Generic
Sometimes it can be useful to know the length of a field in words, especially for models designed to personalize different types of content. You can use the `word_count` feature extractor to get the number of words in a string field with the following config:

```yaml
- name: title_length
  type: word_count
  scope: item
  field: item.title # must be a string
```
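As a rough illustration of what the extractor computes, here is a minimal Python sketch. It assumes that words are separated by whitespace; the actual tokenization rules used by `word_count` are not described here and may differ (for example, in how punctuation or hyphenated words are handled).

```python
def word_count(text: str) -> int:
    """Count words by splitting on runs of whitespace (an assumed tokenizer)."""
    # str.split() with no arguments splits on any whitespace and drops empty strings
    return len(text.split())


print(word_count("Red  cotton T-shirt"))  # 3
```

With this assumption, repeated spaces do not produce extra words, and a hyphenated token such as "T-shirt" counts as one word.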
The `relative_number` feature type is more advanced: it scales a numerical feature using one of several methods. Example config for static scaling with predefined min and max values (use the `log_minmax` method instead to also apply a log transformation):

```yaml
- name: price
  type: relative_number
  method:
    type: minmax
    min: 0
    max: 100
  field: price
  source: item
  scope: item
```
Supported methods:
- `minmax`: scales using the `min` and `max` fields.
- `log_minmax`: scales using the `min` and `max` fields, but log-transforms the value first.
- `estimate_minmax`: estimates the min and max values used for scaling from a sample of the latest `pool_size` events (sampled at a rate set by `sample_rate`).
- `estimate_histogram`: applies histogram scaling over `bucket_count` buckets, using a sample of the latest `pool_size` events (sampled at a rate set by `sample_rate`). For the price field from the example above, histogram scaling translates an absolute value into a percentile over the sampled pool of values.
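The two static methods can be sketched in Python as follows. This is illustrative only: clamping out-of-range values so the output stays in [0, 1] and using log(1 + x) as the log transform are assumptions, not documented behavior of `relative_number`.

```python
import math


def minmax(value: float, lo: float, hi: float) -> float:
    """Linearly map value from [lo, hi] to [0, 1], clamping out-of-range inputs (assumed)."""
    if hi <= lo:
        raise ValueError("max must be greater than min")
    clamped = min(max(value, lo), hi)
    return (clamped - lo) / (hi - lo)


def log_minmax(value: float, lo: float, hi: float) -> float:
    """Log-transform value and bounds with log1p (an assumption), then min-max scale."""
    return minmax(math.log1p(value), math.log1p(lo), math.log1p(hi))


print(minmax(25, 0, 100))                # 0.25
print(round(log_minmax(25, 0, 100), 3))  # about 0.706: the log pushes small values up
```

The log variant spreads out the low end of the range, which helps when most prices are small but the configured `max` is large.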
Estimate methods are useful for rough scaling when you cannot easily define min and max:
- `estimate_minmax` should be used when the value can be scaled linearly and there are no outliers.
- `estimate_histogram` can handle skewed distributions and outliers, but its output is quantized: there are only `bucket_count` possible output values.
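To show why the histogram method copes better with outliers, here is a small Python sketch comparing the two estimate methods on one hypothetical sample of prices containing a single outlier. The bucketing scheme (boundaries at equal-frequency quantiles of the sample, output equal to the lower edge of the bucket's percentile range) is an assumption for illustration, not the extractor's documented implementation.

```python
import bisect


def estimate_minmax(sample: list[float], value: float) -> float:
    """Min-max scale using the min and max observed in the sample."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        return 0.0
    clamped = min(max(value, lo), hi)
    return (clamped - lo) / (hi - lo)


def estimate_histogram(sample: list[float], value: float, bucket_count: int) -> float:
    """Map value to the lower edge of its quantile bucket (hypothetical scheme)."""
    ordered = sorted(sample)
    # bucket boundaries at the 1/n, 2/n, ... quantiles of the sample
    bounds = [ordered[len(ordered) * i // bucket_count] for i in range(1, bucket_count)]
    bucket = bisect.bisect_right(bounds, value)  # index in 0 .. bucket_count - 1
    return bucket / bucket_count  # one of bucket_count possible outputs


prices = [10, 12, 15, 18, 20, 22, 25, 30, 35, 1000]  # one outlier at 1000

print(round(estimate_minmax(prices, 20), 3))  # 0.01: squashed by the outlier
print(estimate_histogram(prices, 20, 5))      # 0.4: a mid-range percentile
```

With min-max scaling, the single outlier compresses all typical prices into a tiny slice near 0, while the histogram keeps them spread across percentiles at the cost of only five distinct outputs.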
Example config for `estimate_histogram`:

```yaml
- name: price
  type: relative_number
  method:
    type: estimate_histogram
    pool_size: 100 # keep a pool of the 100 latest sampled values
    sample_rate: 10 # sample every 10th event into the pool
    bucket_count: 5 # values are mapped to 5 percentile buckets: 0-20, 20-40, 40-60, 60-80, 80-100
  field: price
  source: item
  scope: item
```
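The pool behind the estimate methods can be pictured as a bounded buffer of recent sampled values. Below is a hypothetical Python sketch of how `pool_size` and `sample_rate` from the config above interact; the `SampledPool` class is illustrative and not part of the feature extractor's API. It assumes deterministic every-Nth sampling, matching the config comment; the real extractor might sample differently (for example, randomly).

```python
from collections import deque


class SampledPool:
    """Keep every sample_rate-th value, retaining at most pool_size of the latest ones."""

    def __init__(self, pool_size: int, sample_rate: int):
        self.sample_rate = sample_rate
        self.values: deque[float] = deque(maxlen=pool_size)  # oldest values drop out automatically
        self.seen = 0

    def observe(self, value: float) -> None:
        self.seen += 1
        if self.seen % self.sample_rate == 0:  # sample every sample_rate-th event
            self.values.append(value)


pool = SampledPool(pool_size=100, sample_rate=10)
for price in range(1, 2001):  # 2000 hypothetical events with prices 1..2000
    pool.observe(float(price))

print(len(pool.values))                 # 100
print(pool.values[0], pool.values[-1])  # 1010.0 2000.0
```

Out of 2000 events, 200 are sampled, and only the latest 100 of them stay in the pool, so the scaling adapts to recent values rather than the full history.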
The `list_size` feature counts the number of items in a string or numerical list. Example:

```yaml
- name: toggled_filters_count
  type: list_size
  field: filters
  source: item
```