Counters

An event count is a simple and useful ranking signal, but it is tricky to implement:
  • counter values change over time, so proper model training and backtesting require a historical view of them. For example, if you count clicks on products, you need to know the counter value at the moment each click happened.
  • global, always-incrementing counters may work well for user- or session-scoped values, but counting clicks on products requires time windowing: 100 clicks over a product's whole lifetime mean something completely different from 100 clicks received yesterday.
Metarank implements two types of counters:
  • interaction_count: a simple global, always-incrementing counter of interactions. Good for counting the number of clicks within a session, cart size and so on.
  • window_count: a counter over time-framed events. Good for item popularity.

Interaction counter

The interaction counter is configured in the following way:
- name: click_count
  type: interaction_count
  scope: item # you can also count interactions per user or session
  interaction: click
  refresh: 60s # optional, how frequently we should update the value, 0s by default
  ttl: 90d # optional, how long should we store this field if there were no updates
The refresh field is useful when the value used for inference does not need to change on every event and you want to limit write throughput to the feature store.
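To illustrate what the refresh setting trades off, here is a minimal Python sketch of a counter that always increments but publishes its value at most once per refresh period. The class name `ThrottledCounter` and its fields are hypothetical and are not part of Metarank; the sketch only mirrors the behavior described above.

```python
from dataclasses import dataclass


@dataclass
class ThrottledCounter:
    """Always-incrementing counter that publishes at most once per refresh period."""
    refresh_seconds: float = 60.0         # mirrors `refresh: 60s` (hypothetical mapping)
    total: int = 0                        # true running count
    published: int = 0                    # value last written for inference
    last_publish: float = float("-inf")   # timestamp of the last write
    writes: int = 0                       # number of writes to the feature store

    def increment(self, ts: float) -> None:
        self.total += 1
        # Only write when the refresh period has elapsed since the last write.
        if ts - self.last_publish >= self.refresh_seconds:
            self.published = self.total
            self.last_publish = ts
            self.writes += 1


counter = ThrottledCounter(refresh_seconds=60.0)
for ts in [0, 10, 20, 59, 60, 61, 100]:
    counter.increment(ts)

print(counter.total, counter.published, counter.writes)  # 7 5 2
```

Seven events cause only two writes, at the cost of the published value lagging behind the true count until the next refresh.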

Windowed counter

A windowed counter has the same semantics as the interaction counter, but is configured differently:
- name: clicks
  type: window_count
  interaction: click
  scope: item
  count: 10 # optional, take only the last 10 clicks performed by the user
  bucket_size: 24h # make a counter for each 24h rolling window
  windows: [7, 14, 30, 60] # on each refresh, aggregate to roughly 1-2-4-8 week counts
  refresh: 60s # optional, how frequently we should update the value, 0s by default
  ttl: 90d # optional, how long should we store this field if there were no updates
This feature extractor emits the following group of features (the values are examples):
  • clicks_7: 12
  • clicks_14: 34
  • clicks_30: 70
  • clicks_60: 124
With refresh: 60s, these feature values are updated at most once every 60 seconds.
Window counters are implemented as a circular buffer of counters. There is a separate counter for each time bucket, and there are as many time buckets as the largest window size. Since counters are updated frequently, refreshing them on every interaction can be computationally expensive, so it is usually worth limiting the refresh rate to something reasonable, such as 10 minutes.
There is also a way to combine multiple windowed counters into a rate, which makes streaming computation of CTR and conversion rates easier.
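The circular buffer idea can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not Metarank's actual implementation: the class `WindowCounter`, its methods, and the decision to drop events older than the buffer are all hypothetical.

```python
DAY = 24 * 60 * 60


class WindowCounter:
    """Ring of per-bucket counters, with one slot per bucket of the largest window."""

    def __init__(self, bucket_seconds, windows):
        self.bucket_seconds = bucket_seconds
        self.windows = windows
        self.size = max(windows)          # as many buckets as the largest window
        self.counts = [0] * self.size
        self.latest_bucket = None         # absolute index of the newest bucket seen

    def _advance(self, bucket):
        if self.latest_bucket is None:
            self.latest_bucket = bucket
            return
        # Zero every slot that gets reused as time moves forward.
        steps = min(bucket - self.latest_bucket, self.size)
        for i in range(1, steps + 1):
            self.counts[(self.latest_bucket + i) % self.size] = 0
        self.latest_bucket = max(self.latest_bucket, bucket)

    def increment(self, ts):
        bucket = int(ts // self.bucket_seconds)
        self._advance(bucket)
        # Assumption: events older than the whole ring are dropped.
        if bucket > self.latest_bucket - self.size:
            self.counts[bucket % self.size] += 1

    def values(self, now):
        # Assumes `now` is not earlier than the newest event seen so far.
        current = int(now // self.bucket_seconds)
        self._advance(current)
        return {
            w: sum(self.counts[(current - i) % self.size] for i in range(w))
            for w in self.windows
        }


counter = WindowCounter(bucket_seconds=DAY, windows=[7, 14, 30, 60])
clicks_per_day = {0: 3, 40: 2, 50: 4, 58: 1}   # made-up click data
for day, n in clicks_per_day.items():
    for _ in range(n):
        counter.increment(day * DAY + 3600)

day_59 = counter.values(59 * DAY)   # {7: 1, 14: 5, 30: 7, 60: 10}
day_60 = counter.values(60 * DAY)   # {7: 1, 14: 5, 30: 7, 60: 7}
```

One day later the three clicks from day 0 fall out of the 60-bucket window because their slot is reused. Computing `values` touches every bucket of every window, which is why throttling refreshes pays off.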

Rate

The rate feature extractor is useful for calculating conversion rate, click-through rate and other values where you need to divide one interaction counter by another. You can configure it in the following way:
- name: ctr
  type: rate
  # name of the interaction used as the numerator
  top: click
  # name of the interaction used as the denominator
  bottom: impression # a special synthetic event generated by Metarank
  bucket: 24h
  periods: [7,30]
  scope: item # optional, default item, options: item, item.<field>, ranking.<field>
  refresh: 1h # optional, how frequently we should update the value, 1h by default
  ttl: 90d # optional, how long should we store this field if there were no updates
  normalize: # optional, disabled by default
    weight: 10
In this example, bottom: impression refers to a special synthetic interaction event that Metarank generates for the items examined by the visitor. See click models for details.
As of the current version 0.5.x, the rate feature can only be scoped to an item.
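As a rough illustration of dividing one windowed counter by another, the Python sketch below computes a rate for each period from per-bucket counts. The function `rates`, the input layout (one count per bucket, newest last), the sample numbers, and the choice to return 0.0 when the denominator is empty are all assumptions made for this example.

```python
def rates(top, bottom, periods):
    """Divide windowed sums of `top` by windowed sums of `bottom`.

    `top` and `bottom` hold one count per bucket, newest bucket last.
    """
    result = {}
    for p in periods:
        numerator = sum(top[-p:])       # e.g. clicks in the last p buckets
        denominator = sum(bottom[-p:])  # e.g. impressions in the last p buckets
        # Assumption: report 0.0 when there were no impressions at all.
        result[p] = numerator / denominator if denominator else 0.0
    return result


# Made-up data: 30 daily buckets, with clicks picking up in the last week.
clicks = [1] * 23 + [2] * 7
impressions = [10] * 30

ctr = rates(clicks, impressions, periods=[7, 30])
# ctr[7] is 14 / 70 and ctr[30] is 37 / 300
```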

Rate normalization

Rate values can be quite noisy when the number of events is low. Consider these two CTRs:
  • impressions=100, clicks=50. CTR=50%
  • impressions=2, clicks=1. CTR=50%
Both CTRs have the same value, but very different confidence. Inspired by the talk "An approach to modelling implicit user feedback" from the HaystackUS22 conference, Metarank can normalize a per-item rate based on a global prior rate:
rate = (a + item_clicks) / (b + item_impressions)
Here a and b are normalization constants chosen so that a / b equals the global prior CTR.
As an example, imagine that we have the following click statistics:
  • over all items: 100 impressions, 10 clicks.
  • over item A: 2 impressions, 1 click.
The non-normalized CTR for item A is 50%, which is well above the average 10% CTR for the whole inventory. But we can mix the global CTR with the per-item CTR:
item_ctr = (10 total clicks + 1 click on item A) / (100 total impressions + 2 impressions on A)
To maintain a constant scaling factor w, we can define normalized per-item CTR in the following way:
item_ctr = (w + item_clicks) / (w * (global_impressions / global_clicks) + item_impressions)
With this formula, the example above produces more reasonable values:
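| weight | global impressions | global clicks | item impressions | item clicks | global CTR | item CTR | normalized item CTR |
|--------|--------------------|---------------|------------------|-------------|------------|----------|---------------------|
| 10     | 100                | 10            | 2                | 1           | 10.00%     | 50.00%   | 10.78%              |
| 1      | 100                | 10            | 2                | 1           | 10.00%     | 50.00%   | 16.67%              |
| 10     | 100                | 10            | 10               | 3           | 10.00%     | 30.00%   | 11.82%              |
| 10     | 20                 | 5             | 10               | 3           | 25.00%     | 30.00%   | 26.00%              |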
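The normalization formula can be checked against the example values with a few lines of Python. The function name `normalized_ctr` is made up for this sketch, and it does not handle the case of zero global clicks.

```python
def normalized_ctr(weight, global_impressions, global_clicks,
                   item_impressions, item_clicks):
    """Per-item CTR smoothed towards the global CTR.

    Implements (w + item_clicks) / (w * global_impressions / global_clicks + item_impressions).
    Assumes global_clicks > 0.
    """
    prior_impressions = weight * (global_impressions / global_clicks)
    return (weight + item_clicks) / (prior_impressions + item_impressions)


# (weight, global impressions, global clicks, item impressions, item clicks)
rows = [
    (10, 100, 10, 2, 1),
    (1, 100, 10, 2, 1),
    (10, 100, 10, 10, 3),
    (10, 20, 5, 10, 3),
]
percentages = [round(100 * normalized_ctr(*row), 2) for row in rows]
print(percentages)  # [10.78, 16.67, 11.82, 26.0]
```

A larger weight pulls the item CTR harder towards the global CTR, while items with many impressions keep a value close to their own observed rate.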
With this normalization approach, the per-item CTR has far fewer outlier values when the number of clicks is low. The right value of the w coefficient depends on your dataset:
  • typical range is between 5 and 10
  • the more outlier CTRs you have, the higher w should be
  • default value is 10

Field-scoped rates

Grouping by item field

When the scope: item.<field_name> parameter is set, the rate feature aggregates counters per value of a specific item field.
This helps with the cold-start problem: when there are not enough interactions per item, you can aggregate them per category, color or brand instead.
For example, if your inventory has a field brand=cocacola, you can compute CTR per brand using the following feature extractor:
- name: ctr
  type: rate
  top: click
  bottom: impression
  bucket: 24h
  periods: [7,30]
  scope: item.brand
  • normalization is also supported
  • the field used for aggregation should have low cardinality.
  • the item field should have a string type; arrays are not supported.
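The effect of grouping by an item field can be sketched in Python: counts are keyed by the field value instead of the item id, so a new item inherits the statistics of its brand. The item ids, the `item_brand` mapping and the event tuples are invented for this example and do not reflect Metarank's event format.

```python
from collections import defaultdict

# Hypothetical item metadata: item id -> brand.
item_brand = {"p1": "cocacola", "p2": "cocacola", "p3": "pepsi"}

# Hypothetical interaction log as (interaction type, item id).
events = (
    [("impression", "p1")] * 4 + [("click", "p1")]
    + [("impression", "p2")]                      # p2 is new and has no clicks yet
    + [("impression", "p3")] * 4 + [("click", "p3")] * 2
)


def ctr_by_field(events, field_of_item):
    """Aggregate clicks and impressions per field value, then divide."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for kind, item in events:
        key = field_of_item.get(item)
        if key is None:
            continue                              # item without the field is skipped
        if kind == "click":
            clicks[key] += 1
        elif kind == "impression":
            impressions[key] += 1
    return {key: clicks[key] / n for key, n in impressions.items() if n > 0}


brand_ctr = ctr_by_field(events, item_brand)
print(brand_ctr)  # {'cocacola': 0.2, 'pepsi': 0.5}
```

Item p2 has a single impression and no clicks of its own, but it shares the 20% CTR of its brand.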

Grouping by ranking field

You can also scope rates per ranking field AND item id. This can be useful when you have a lot of interactions and want to compute CTR within each query.
For example, if all your ranking events have a field query=cola, you can compute item CTR within each query:
- name: ctr
  type: rate
  top: click
  bottom: impression
  bucket: 24h
  periods: [7,30]
  scope: ranking.query
  • normalization is also supported
  • the field used for aggregation should have low cardinality.
  • the ranking field should have a string type; arrays are not supported.
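For ranking-field scoping, the counters are keyed by the pair of ranking field value and item id, so the same item can have a different CTR under each query. The short Python sketch below uses a made-up event layout of (interaction type, query, item id) and is only meant to show the keying.

```python
from collections import defaultdict

# Hypothetical interaction log as (interaction type, query, item id).
query_events = (
    [("impression", "cola", "p1")] * 4 + [("click", "cola", "p1")] * 2
    + [("impression", "zero sugar", "p1")] * 5 + [("click", "zero sugar", "p1")]
)

clicks = defaultdict(int)
impressions = defaultdict(int)
for kind, query, item in query_events:
    key = (query, item)                 # ranking field value AND item id
    if kind == "click":
        clicks[key] += 1
    elif kind == "impression":
        impressions[key] += 1

query_item_ctr = {key: clicks[key] / n for key, n in impressions.items() if n > 0}
print(query_item_ctr)  # {('cola', 'p1'): 0.5, ('zero sugar', 'p1'): 0.2}
```

The number of keys grows with the number of distinct queries times the number of items, which is one reason to keep the cardinality of the ranking field low.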