This guide shows how to install and run Metarank on a single machine using Docker. We will run the service, feed it with sample data and issue queries.
Prerequisites
Docker: Docker Desktop, or Docker for Linux
Operating system: Linux, macOS, or Windows 10+
Architecture: x86_64. For M1, see
Memory: 2 GB dedicated to Docker
This guide was tested with Docker for Linux v20.10.16 and the Metarank Docker image.
Getting the dataset
For the quickstart, we will use an open dataset and personalize a set of pre-computed movie recommendations based on visitor activity. The dataset is used to build a website and includes the following event types:
ls -la
total 172
drwxr-xr-x 2 user user 4096 Aug 23 14:24 .
drwxr-xr-x 81 user user 16384 Aug 23 14:24 ..
-rw-r--r-- 1 user user 2542 Aug 23 14:24 config.yml
-rw-r--r-- 1 user user 150264 Aug 23 14:24 events.jsonl.gz
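Before importing, it can help to check which event types the downloaded file actually contains. The snippet below is a minimal sketch, assuming that events.jsonl.gz holds one JSON object per line and that each object names its type in an `event` field. Both are assumptions about the file's schema, so inspect a few lines first. The helper name `count_event_types` is ours, not part of Metarank.

```python
import gzip
import json
from collections import Counter


def count_event_types(path):
    """Count records per value of the "event" field in a gzipped JSONL file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Assumption: the event type is stored under the "event" key.
            counts[record.get("event", "<missing>")] += 1
    return counts
```

Calling `count_event_types("events.jsonl.gz")` from the directory shown above returns a `Counter`, and `.most_common()` on the result lists the event types by frequency.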
Run the dataset import process from the current directory.
First query
We're going to send a set of initial candidates for reranking to the /rank endpoint of Metarank's REST API, using the XGBoost model defined in config.yml. Let's take the top 100 popular movies tagged as Sci-Fi and ask Metarank to reorder them to maximize CTR:
The response looks like a diverse set of sci-fi movies with a generic, non-personalized ranking, as we haven't sent any interaction events yet.
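A ranking request of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference client: the payload field names (`event`, `id`, `items`, `user`, `session`, `timestamp`), the `/rank/xgboost` path and the `localhost:8080` address are assumptions based on this guide's setup, and the helper names are hypothetical. Check them against the API reference for your Metarank version.

```python
import json
import time
import urllib.request


def build_ranking_request(request_id, item_ids, user, session, timestamp_ms=None):
    """Build the JSON body of a ranking request for a list of candidate items."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # current time in milliseconds
    return {
        "event": "ranking",
        "id": request_id,
        # Item ids are sent as strings, one object per candidate.
        "items": [{"id": str(item_id)} for item_id in item_ids],
        "user": user,
        "session": session,
        "timestamp": timestamp_ms,
    }


def rank(payload, model="xgboost", base_url="http://localhost:8080"):
    """POST the payload to the (assumed) /rank/<model> endpoint, return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/rank/{model}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `rank(build_ranking_request("id1", candidate_ids, "alice", "alice1"))` would send the candidates in one call. Here `candidate_ids`, `"alice"` and `"alice1"` are placeholders for your own item ids, visitor id and session id.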
Sending visitor feedback
Metarank expects to receive impression events (what was displayed to the visitor) and interaction events (what the visitor did after seeing the listing).
An impression event contains only the items that were displayed to the user, so if your response is paginated, the impression event should include only the items from the current page.
In our case, the impression event is the set of top 12 movies from the previous /rank request, starting with Terminator 2 and ending with MIIB:
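The two feedback events described above can be sketched as plain dictionaries. This is a minimal sketch under assumptions: the field names, the `click` interaction type, and the `/feedback` path on `localhost:8080` are assumptions to verify against the Metarank event schema, and the helper names are hypothetical. The impression builder slices the ranked list so that only the items on the displayed page are reported.

```python
import json
import urllib.request


def build_impression(ranking_id, ranked_item_ids, user, session, timestamp_ms,
                     page=0, page_size=12):
    """Build an impression event holding only the items shown on one page."""
    start = page * page_size
    shown = ranked_item_ids[start:start + page_size]
    return {
        "event": "ranking",
        "id": ranking_id,
        "items": [{"id": str(item_id)} for item_id in shown],
        "user": user,
        "session": session,
        "timestamp": timestamp_ms,
    }


def build_click(event_id, ranking_id, item_id, user, session, timestamp_ms):
    """Build an interaction event that points back to the impression it followed."""
    return {
        "event": "interaction",
        "type": "click",
        "id": event_id,
        "ranking": ranking_id,  # id of the impression the visitor saw
        "item": str(item_id),
        "user": user,
        "session": session,
        "timestamp": timestamp_ms,
    }


def post_feedback(event, base_url="http://localhost:8080"):
    """POST a single event to the (assumed) /feedback endpoint, return the body."""
    request = urllib.request.Request(
        f"{base_url}/feedback",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Sending the impression first and the click second keeps the click's `ranking` reference resolvable. The click timestamp should be later than the impression timestamp.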
Let's send the same ranking request with the top 100 sci-fi movies that we sent before, and see how the response changes after providing some visitor feedback:
The ranking was adjusted to the visitor's taste. We can see that:
- Men in Black 2 went to the top, as it has similar tags and actors.
- Other space-related movies, like Armageddon, also went up.
- Both parts of Back to the Future went significantly down, as they are not related to the visitor's past clicks.
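To quantify shifts like these instead of eyeballing two responses, we can compare item positions before and after the feedback. The helper below is a small sketch that works on two plain lists of item ids in ranked order. How you extract those ids from the /rank responses depends on the response format, which is not shown here, and the name `rank_shifts` is hypothetical.

```python
def rank_shifts(before, after):
    """Map each item to (old position - new position); positive means it moved up."""
    old_position = {item: index for index, item in enumerate(before)}
    shifts = {}
    for new_index, item in enumerate(after):
        if item in old_position:  # ignore items missing from the first ranking
            shifts[item] = old_position[item] - new_index
    return shifts
```

Sorting the result, for example with `sorted(rank_shifts(before, after).items(), key=lambda kv: kv[1], reverse=True)`, puts the items that climbed the most first and the ones that dropped the most last.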
What's next?
- config.yml: Metarank configuration file used in the , describing how to map visitor events to ML features. For your own dataset, you don't always need to write this file from scratch: Metarank can try to automatically deduce the most typical feature mappings from your dataset. See for details.
- events.jsonl.gz: a dump of historical visitor interactions used for ML training.
train the model,
start the API on port 8080.
play with the contents of config.yml, enabling and disabling different features to see how the ranking changes depending on the features used.
generate your own set of , describing your use case.