Counters
Event counts are a simple and useful ranking signal, but they are tricky to implement:
counters change constantly over time, so proper model training and backtesting require a historical view of counter values. For example, if you're counting clicks on products, you need to know the counter value at the moment each click happened.
global, always-incrementing counters may work well for user- or session-scoped values, but counting clicks on products requires time windowing: 100 clicks over a product's whole lifetime mean something very different from 100 clicks yesterday.
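To make the "historical view" requirement concrete, here is a minimal Python sketch. It is an illustration only, not how Metarank stores counters; the class and method names are hypothetical. It keeps every event timestamp so that both the lifetime count and a windowed count can be recovered for any past moment.

```python
from bisect import bisect_right, insort


class CounterHistory:
    """Hypothetical helper: stores event timestamps to answer point-in-time queries."""

    def __init__(self):
        self._timestamps = []  # event timestamps in seconds, kept sorted

    def increment(self, ts):
        # insort keeps the list sorted even if events arrive out of order.
        insort(self._timestamps, ts)

    def value_at(self, ts):
        # Lifetime count as it was at time ts (events with timestamp <= ts).
        return bisect_right(self._timestamps, ts)

    def value_in_window(self, ts, window):
        # Count of events in the half-open interval (ts - window, ts].
        return bisect_right(self._timestamps, ts) - bisect_right(self._timestamps, ts - window)
```

Storing every timestamp is the simplest way to get exact answers, but it grows without bound, which is one reason a bucketed design is attractive for production use.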
Metarank implements two types of counters:
interaction_counter: a simple global, always-incrementing counter of interactions. Good for counting the number of clicks within a session, cart size and so on.
window_counter: a counter that time-frames all events. Good for item popularity.
Interaction counter
The interaction counter is configured in the following way:
The refresh field is useful when you don't want to update the value used for inference too often and want to limit the write throughput to the feature store.
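The effect of a refresh interval can be sketched in a few lines of Python. This is an assumption-laden illustration, not Metarank's implementation: the class name, the `published` attribute and the in-memory storage are all hypothetical. Every interaction is counted, but the value exposed for inference is only rewritten once the refresh interval has elapsed.

```python
class RefreshedCounter:
    """Hypothetical counter that publishes its value at most once per refresh interval."""

    def __init__(self, refresh_seconds):
        self.refresh_seconds = refresh_seconds
        self._count = 0            # always up to date
        self.published = 0         # value that would be written to the feature store
        self._last_refresh = None  # timestamp of the last publish

    def increment(self, ts):
        self._count += 1
        # Publish only if nothing was published yet or the interval has passed.
        if self._last_refresh is None or ts - self._last_refresh >= self.refresh_seconds:
            self.published = self._count
            self._last_refresh = ts
```

With a 60-second interval, increments at seconds 0, 10, 59 and 60 result in only two writes: the published value is 1 after the first event and 4 after the last one.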
Windowed counter
The windowed counter has the same semantics as the interaction counter, but is configured differently:
This feature extractor emits the following group of features:
clicks_7: 12
clicks_14: 34
clicks_30: 70
clicks_60: 124
These feature values will be updated at least every 60 seconds.
Window counters are implemented as a circular buffer of counters. There is a separate counter for each time bucket, and there are as many time buckets as the maximum window size. As counters are updated frequently, it can be computationally expensive to refresh them on each interaction, so it's usually worth limiting the refresh rate to something reasonable, like 10 minutes.
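The circular-buffer idea can be sketched as follows. This is a simplified Python illustration under stated assumptions, not Metarank's actual code: the class name, the `periods` and `bucket_seconds` parameters, and the choice to count the current bucket as part of each window are all assumptions made for the example.

```python
class WindowCounter:
    """Hypothetical circular buffer with one counter per time bucket."""

    def __init__(self, periods, bucket_seconds=86400):
        self.periods = sorted(periods)      # window sizes, measured in buckets
        self.bucket_seconds = bucket_seconds
        self.size = max(self.periods)       # as many buckets as the largest window
        self.buckets = [0] * self.size
        self.latest = None                  # absolute index of the newest bucket seen

    def increment(self, ts, amount=1):
        bucket = int(ts // self.bucket_seconds)
        if self.latest is None:
            self.latest = bucket
        elif bucket > self.latest:
            # Clear slots that are being reused (or were skipped) before moving forward.
            last_to_clear = min(bucket, self.latest + self.size)
            for b in range(self.latest + 1, last_to_clear + 1):
                self.buckets[b % self.size] = 0
            self.latest = bucket
        elif bucket <= self.latest - self.size:
            return  # event is older than the largest window, so it is ignored
        self.buckets[bucket % self.size] += amount

    def values(self):
        # Sum the newest p buckets for every configured window size p.
        if self.latest is None:
            return {p: 0 for p in self.periods}
        return {
            p: sum(self.buckets[(self.latest - i) % self.size] for i in range(p))
            for p in self.periods
        }
```

Note that `values()` walks up to the maximum window size on every call, which illustrates why recomputing the windows on each interaction is costly and why a refresh interval helps.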
There is also a way to combine multiple windowed counters into a rate, which makes streaming computation of CTR and conversion rates easier.
Rate
The rate feature extractor is useful for calculating values such as conversion rate and click-through rate, where you need to divide one interaction counter by another. You can configure it in the following way:
In this example, we use the bottom: impression type of interaction. It's a special synthetic interaction event generated by Metarank for the items that were examined by the visitor. See click models for details.
As of the current version (0.5.x), the rate feature can only be scoped to an item.
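As a rough sketch of what dividing one counter by another means per time window, consider the Python function below. The function name, the feature naming scheme and the handling of empty windows are assumptions for illustration; the impression counts in the usage note are made up.

```python
def windowed_rate(top_counts, bottom_counts, name="rate"):
    """Divide per-window counts of one interaction by per-window counts of another.

    Both arguments map a window size to a count, for example
    {7: 12, 30: 70} for clicks and {7: 120, 30: 1400} for impressions.
    """
    features = {}
    for period, top in top_counts.items():
        bottom = bottom_counts.get(period, 0)
        # Assumption: a window with no "bottom" events yields 0.0 instead of an error.
        features[f"{name}_{period}"] = top / bottom if bottom > 0 else 0.0
    return features
```

For example, `windowed_rate({7: 12, 30: 70}, {7: 120, 30: 1400}, name="ctr")` yields a 7-window CTR of 0.1 and a 30-window CTR of 0.05.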
Rate normalization
Rate values can be quite noisy when the number of events is low. Consider these two CTRs:
impressions=100, clicks=50. CTR=50%
impressions=2, clicks=1. CTR=50%
Both of these CTRs have the same value, but completely different confidence. Inspired by the talk "An approach to modelling implicit user feedback" from the HaystackUS22 conference, Metarank can normalize the per-item rate based on a global prior rate:

$$ \text{rate} = \frac{\text{clicks} + a}{\text{impressions} + b} $$

where a and b are normalization constants and a / b is proportional to the global prior CTR.
As an example, imagine that we have the following click statistics:
over all items: 100 impressions, 10 clicks.
over item A: 2 impressions, 1 click.
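The non-normalized CTR for item A is 50%, which is way above the average 10% CTR of the whole inventory. But we can mix the global CTR with the per-item CTR:

$$ \text{CTR} = \frac{\text{item\_clicks} + \text{global\_clicks}}{\text{item\_impressions} + \text{global\_impressions}} $$

To maintain a constant scaling factor w, we can define the normalized per-item CTR in the following way:

$$ \text{CTR}_{norm} = \frac{\text{item\_clicks} + w}{\text{item\_impressions} + w \cdot \frac{\text{global\_impressions}}{\text{global\_clicks}}} $$

With this definition, our example above looks more reasonable: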
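| w  | global impressions | global clicks | item impressions | item clicks | global CTR | item CTR | normalized CTR |
|----|--------------------|---------------|------------------|-------------|------------|----------|----------------|
| 10 | 100                | 10            | 2                | 1           | 10.00%     | 50.00%   | 10.78%         |
| 1  | 100                | 10            | 2                | 1           | 10.00%     | 50.00%   | 16.67%         |
| 10 | 100                | 10            | 10               | 3           | 10.00%     | 30.00%   | 11.82%         |
| 10 | 20                 | 5             | 10               | 3           | 25.00%     | 30.00%   | 26.00%         |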
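The table values can be reproduced with a short Python sketch. It assumes the normalized rate has the form (item_clicks + w) / (item_impressions + w * global_impressions / global_clicks), which matches every row of the table. The function and argument names are illustrative and not part of Metarank's API, and the fallback for a missing prior is an assumption.

```python
def normalized_ctr(item_clicks, item_impressions, global_clicks, global_impressions, w=10.0):
    """Smooth a per-item CTR towards the global CTR with weight w."""
    if global_clicks == 0:
        # Assumption: without a global prior, fall back to the raw per-item rate.
        return item_clicks / item_impressions if item_impressions else 0.0
    # w pseudo-clicks imply this many pseudo-impressions at the global CTR.
    prior_impressions = w * global_impressions / global_clicks
    return (item_clicks + w) / (item_impressions + prior_impressions)


if __name__ == "__main__":
    rows = [
        # (w, global impressions, global clicks, item impressions, item clicks)
        (10, 100, 10, 2, 1),
        (1, 100, 10, 2, 1),
        (10, 100, 10, 10, 3),
        (10, 20, 5, 10, 3),
    ]
    for w, g_imp, g_clicks, i_imp, i_clicks in rows:
        value = normalized_ctr(i_clicks, i_imp, g_clicks, g_imp, w=w)
        print(f"w={w}: {value:.2%}")
```

A larger w pulls the result harder towards the global CTR, which is why the first row (w=10) ends up much closer to 10% than the second row (w=1).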
With this normalization approach, per-item CTR will have far fewer outlier values when the number of clicks is low. Choosing the right value of the w coefficient depends on your dataset:
the typical range is between 5 and 10
the more outlier CTRs you have, the higher w should be
the default value is 10
Field-scoped rates
Grouping by item field
When the scope: item.<field_name> parameter is set, the rate feature can aggregate counters per value of a specific item field.
This helps with the cold-start problem: when there are not enough interactions per item, you can still aggregate them per category, color or brand.
For example, if your inventory has a field brand=cocacola, you can compute the CTR for each brand using the following feature extractor:
normalization is also supported.
the target aggregating field should have low cardinality.
the item field should have a string type; arrays are not supported.
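Conceptually, grouping by an item field means that counters are keyed by the field value instead of the item id. The Python sketch below illustrates this with hypothetical in-memory data structures; it does not reflect Metarank's event format or storage.

```python
from collections import defaultdict


def rate_by_item_field(items, events, field):
    """Compute clicks / impressions per value of an item field.

    items:  dict mapping item id to a dict of item fields (hypothetical shape)
    events: iterable of (item_id, interaction_type) pairs
    """
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for item_id, kind in events:
        value = items.get(item_id, {}).get(field)
        if not isinstance(value, str):
            continue  # only string fields are aggregated; arrays are skipped
        if kind == "click":
            clicks[value] += 1
        elif kind == "impression":
            impressions[value] += 1
    return {value: clicks[value] / n for value, n in impressions.items() if n > 0}
```

A new item with brand=cocacola immediately inherits the brand-level rate, even before it has any interactions of its own.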
Grouping by ranking field
You can go further and scope rates by ranking field and item id together. This is useful when you have a lot of interactions and want to compute CTR within each query.
For example, if all your ranking events have a field query=cola, you can compute item CTR within each query:
normalization is also supported.
the target aggregating field should have low cardinality.
the ranking field should have a string type; arrays are not supported.
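Scoping by ranking field and item id can be pictured as keying counters by a (field value, item id) pair. The Python sketch below uses hypothetical tuples for ranking and click events, and it simplifies impressions by treating every displayed item as examined, whereas Metarank derives impressions from a click model.

```python
from collections import defaultdict


def rate_by_query_and_item(rankings, clicks):
    """Compute per-(query, item) CTR.

    rankings: iterable of (ranking_id, query, [item ids shown]) tuples
    clicks:   iterable of (ranking_id, item_id) pairs
    """
    query_of = {}
    impressions = defaultdict(int)
    click_counts = defaultdict(int)
    for ranking_id, query, item_ids in rankings:
        query_of[ranking_id] = query
        for item_id in item_ids:
            # Simplification: every displayed item counts as one impression.
            impressions[(query, item_id)] += 1
    for ranking_id, item_id in clicks:
        query = query_of.get(ranking_id)
        if query is not None:
            click_counts[(query, item_id)] += 1
    return {key: click_counts[key] / n for key, n in impressions.items()}
```

The number of keys grows with the product of distinct queries and items, which is why a low-cardinality field is recommended.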