Configuration
The Metarank YAML config file contains the following sections:
Persistence: how feature data is stored
Models: which models should be trained and used in inference
Recommendations: a special section on recommendations serving
Features: how features are computed from events
API: network options for API
Data sources: where to read events from
Core: service options, like anonymous tracking and error reporting.
See the sample-config.yml file for a full working example.
Persistence
The "state" section describes how computed features and models are stored. Check Persistence configuration for more information. An example persistence conf block with comments:
state: # a place to store the feature values for the ML inference and the trained model
# Local memory
# A node-local in-memory storage without any persistence.
# Feature values and the trained model are stored in memory.
# Suitable only for local testing, as a restart will lose all the data.
type: memory
# Remote redis, with persistence.
# Saves the computed features and trained model in a Redis instance.
# You can use remote or local Redis installation.
#type: redis
#host: localhost
#port: 6379
#format: binary # optional, default=binary, possible values: json, binary
# Metarank implements several optimization strategies when using Redis: caching and pipelining
# Check https://docs.metarank.ai/reference/overview/persistence#redis-persistence for more details
#cache: # optional
# maxSize: 4096 # size of in-memory client-side cache for hot keys, optional, default=4096
# ttl: 1h # how long key-values should be cached, optional, default=1h
#pipeline: # optional
# maxSize: 128 # batch write buffer size, optional, default=128
# flushPeriod: 1s # buffer flush interval, optional, default=1s
# enabled: true # toggle pipelining, optional, default=true
# can also be overridden with environment variables, see the
# https://docs.metarank.ai/reference/overview/persistence#redis-persistence for details
#auth: # optional
# user: <username> # optional when Redis ACL is disabled
# password: <password> # required if the Redis server is run with the requirepass argument
# tls: # optional, defaults to disabled
# enabled: true # optional, defaults to false
# ca: <path/to/ca.crt> # optional path to the CA used to generate the cert, defaults to the default keychain
# verify: full # optional, default=full, possible values: full, ca, off
# full - verify both certificate and hostname
# ca - verify only certificate
# off - skip verification
#timeout: # optional, defaults to 1s for all sub-timeouts
# connect: 1s # optional, defaults to 1s
# socket: 1s # optional, defaults to 1s
# command: 1s # optional, defaults to 1s
Training
Metarank also computes a click-through data structure, which contains the following bits of information:
ranking: which items were presented to the visitor
interactions: what visitor did after seeing the ranking (like clicks, purchases and so on)
feature values: the feature values that were used to produce the ranking in the past.
These click-through events are essential for model training, as they're later translated into the implicit judgement lists for the underlying LambdaMART model:

Metarank has multiple ways of storing these click-throughs with different pros and cons:
Redis: no special configuration needed; periodic ML model retraining is possible by reading the latest click-through events back from Redis. But it takes quite a lot of RAM and may be costly when you have millions of click-through events.
Discard: do not store click-through events at all.
Local dir: takes much less RAM (as click-throughs are not stored in Redis), but you need to manage the directory containing the click-through files yourself.
S3: like a local directory, but offloads data to external object storage; suits Kubernetes deployments well.
The S3 click-through store can either use hardcoded AWS credentials from the config (which is not ideal from a security perspective), or fall back to the ones defined in environment variables.
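As an illustration, the storage options above might be laid out in a config block like this (the `train` section name and its keys are a sketch inferred from the storage types listed here, not verified config keys; check the persistence documentation for the exact schema):

```yaml
# Where to store click-through events for training (sketch, key names assumed).
train:
  # Option 1: reuse Redis; no extra keys needed beyond the state config.
  type: redis
  # Option 2: discard click-throughs entirely; no later retraining possible.
  #type: discard
  # Option 3: a local directory that you manage yourself.
  #type: file
  #path: /var/lib/metarank/clickthroughs
  # Option 4: S3; credentials can come from the config or from env variables.
  #type: s3
  #bucket: my-metarank-bucket
  #prefix: clickthroughs
  #region: us-east-1
```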
Features
This section describes how to map your input events into ML features that Metarank understands. See Feature extractors for an overview of supported types.
For inspiration, you can use the ranklens feature configuration used for the Metarank demo site.
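For example, a minimal feature definition might look like the following (the `popularity` feature and the `number` extractor here are illustrative; see the Feature extractors page for the options your version actually supports):

```yaml
features:
  - name: popularity          # arbitrary feature name, referenced from the model config
    type: number              # numeric field extractor (illustrative choice)
    scope: item               # the feature is computed per item
    source: item.popularity   # read from the "popularity" field of item metadata events
```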
Models
The "models" section describes ML models used for personalization. Check Supported ranking models for more information about ranking models. See also recommendations models overview
Inference
The "inference" section describes inference model configuration for search results re-ranking. Check the inference models section for more information.
API
The "api" section describes the Metarank API configuration. This section is optional and by default binds service to port 8080 on all network interfaces.
Data sources
The optional "source" section describes the source of the data, and by default expects you to submit all user feedback using the API. Check Supported data sources for more information.
Core
This optional section contains parameters related to the Metarank service itself. Default setup:
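Based on the defaults described in this section, an explicit core block might look like this (the exact key layout is a sketch; the clickthrough and error-tracking keys come from the parameters documented in this section, while the analytics flag name is an assumption):

```yaml
core:
  tracking:
    analytics: true            # anonymous usage analytics (assumed key name)
    errors: true               # Sentry error reporting (core.tracking.errors)
  clickthrough:
    maxSessionLength: 30m      # session finalization timeout, default 30m
    maxParallelSessions: 10000 # buffered sessions awaiting interactions, default 10k
```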
Click-through joining
Metarank joins ranking and interaction events together into click-through chains, which are later used for machine learning model training.
As interactions are happening some time later than rankings, Metarank needs to keep a set of rankings in the buffer, awaiting all the interactions that may happen later.
This buffer policy is controlled by the following parameters:
core.clickthrough.maxSessionLength: the time period after which a session is considered finalized, so no more interactions are allowed to happen. The default value is 30m, as in Google Analytics.
core.clickthrough.maxParallelSessions: how many parallel sessions may stay in the buffer awaiting interactions. The default is 10k.
Anonymous usage analytics
By default, Metarank collects anonymous usage analytics to improve the tool. No IP addresses are tracked; only simple counters record which parts of the service are being used.
It is very helpful to us, so please leave this enabled. Counters are sent to https://analytics.metarank.ai on each service startup.
We never share collected data with anyone else.
Data is stored for 1 year, and then removed.
The collector code is open-source: github.com/metarank/metarank-lambda-tracker
An example payload:
Error logging
We use Sentry for error collection. This behavior is enabled by default and can be disabled with core.tracking.errors: false. Sentry is configured with the following options:
Breadcrumbs are disabled: so it won't share parts of your console log with us.
PII tracking is disabled: no hostnames or IP addresses are included in the error message.
An example error payload is available in sample-error.json.
Both usage logging and error reporting can also be disabled by setting the environment variable METARANK_TRACKING=false.