we can compute how much each item's price differs from the median price across the whole ranking with the following configuration snippet:
For example, given the following item prices:
p1: price=100
p2: price=200
p3: price=250
p4: price=300
p5: price=220
For the ranking [p1, p2, p3, p4, p5], the median price is 220, so the difference between each item's price and the median is:
p1: price_diff=-120
p2: price_diff=-20
p3: price_diff=30
p4: price_diff=80
p5: price_diff=0
When you have a very long ranking, it's worth limiting the number of items taken into account when computing the median. With top=3, the median for the same ranking is computed over p1, p2 and p3 only and equals 200, while the difference is still computed for every item:
p1: price_diff=-100
p2: price_diff=0
p3: price_diff=50
p4: price_diff=100
p5: price_diff=20
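The two worked examples above can be reproduced with a minimal Python sketch. The function name `price_diff` and its `top` argument are illustrative only and mirror the config option, not the actual implementation. The sketch also assumes the usual median definition (the mean of the two middle values for an even-sized window), which the text above does not specify.

```python
from statistics import median

def price_diff(prices, top=None):
    """Subtract the median of the first `top` prices (all prices if
    `top` is None) from every price in the ranking."""
    window = prices if top is None else prices[:top]
    m = median(window)
    return [p - m for p in prices]

# Prices of p1..p5 in ranking order, taken from the example above.
prices = [100, 200, 250, 300, 220]

print(price_diff(prices))         # [-120, -20, 30, 80, 0]
print(price_diff(prices, top=3))  # [-100, 0, 50, 100, 20]
```

Note that `top` only narrows the window used for the median; the difference is still reported for every item in the ranking.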
Diversification over string fields
This type of diversification can be useful to see how different your items are across low-cardinality fields like tags, colors, sizes and categories. Both string and string[] field types are supported.
When all your inventory items have a color field, as in the example below:
Then for a ranking below:
we can compute how frequently each color appears in the result set with the following configuration snippet:
The difference algorithm builds tag frequencies over the ranking (so color -> count in our example above), and then computes the relative intersection between the tags of each item and those frequencies: the item's score is the sum of the frequencies of the tags it has. An example:
given a frequency of {red: 50%, green: 30%, blue: 20%}
for an item that is only red, the score will be 50%
for a red-blue item, the score will be 50%+20%=70%
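The scoring described above can be sketched in a few lines of Python. The names `tag_frequencies` and `diversity_score` and the sample ranking are hypothetical. Normalizing by the total number of tag occurrences is an assumption, chosen because it yields frequencies that sum to 100% as in the example; the real implementation may normalize differently.

```python
from collections import Counter

def tag_frequencies(ranking_tags, top=None):
    """Share of each tag among all tag occurrences in the first `top`
    items of the ranking (all items if `top` is None)."""
    window = ranking_tags if top is None else ranking_tags[:top]
    counts = Counter(tag for tags in window for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def diversity_score(item_tags, freqs):
    """Sum the frequencies of the tags the item has; unseen tags add 0."""
    return sum(freqs.get(tag, 0.0) for tag in set(item_tags))

# Hypothetical ranking: red occurs 5 times, green 3, blue 2 (10 in total).
ranking = [
    ["red"], ["red"], ["red", "green"],
    ["red", "blue"], ["red", "green"], ["green", "blue"],
]
freqs = tag_frequencies(ranking)

print(f"{diversity_score(['red'], freqs):.0%}")          # 50%
print(f"{diversity_score(['red', 'blue'], freqs):.0%}")  # 70%
```

A plain string field can be treated as a one-element list, so the same sketch covers both string and string[] fields.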
- name: price_diff
  type: diversity
  source: item.price # only item.* fields are accepted
  ttl: 90d # optional, when to expire tracked fields
  top: 10 # optional, take only top-N items to compute the median
- name: color_diff
  type: diversity
  source: item.color # only item.* fields are accepted
  ttl: 90d # optional, when to expire tracked fields
  top: 10 # optional, take only top-N items to compute the histogram