Snowplow
Metarank can be integrated into an existing Snowplow Analytics setup.
We provide a set of Snowplow-compatible event schemas that you can use to track metadata and interaction events, which Metarank can later read from Snowplow's enriched event stream.
These schemas are used to track Metarank-specific events.
Metarank will use Snowplow's enriched event stream as a source of events.
A typical Snowplow Analytics setup consists of the following parts:
Stream Collector writes all incoming events into the raw stream
Validated and enriched events are written to the enriched stream
Enriched events are delivered to the Analytics DB
Metarank exposes a set of Snowplow-compatible event schemas, and can read events directly from the enriched stream, as shown on the diagram below:
All incoming raw events have a strict JSON schema that consists of the following parts:
unstructured payload with user-defined schema
multiple context payloads with user-defined schemas
There are four different Metarank event types with the corresponding schemas:
stdout: not supported
These events can be generated on either the frontend or the backend, depending on your setup and on which side has the required data.
An example of tracking a ranking event:
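As a rough illustration, the sketch below builds the payload of a ranking event as a self-describing JSON. It is not tied to any particular tracker SDK: sending the payload is left to whichever Snowplow tracker you use. The `jsonschema` segment in the schema URI follows the usual Iglu URI convention, and the field names (`event`, `id`, `timestamp`, `user`, `session`, `items`) are assumptions about the Metarank native event format, so check them against the published schema before relying on them.

```python
import json
import time
import uuid

# Assumed Iglu URI for the ai.metarank/ranking/1-0-0 schema.
RANKING_SCHEMA = "iglu:ai.metarank/ranking/jsonschema/1-0-0"


def ranking_event(user, session, item_ids, now_ms=None):
    """Build a self-describing JSON for a ranking event (hypothetical helper)."""
    ts = now_ms if now_ms is not None else int(time.time() * 1000)
    return {
        "schema": RANKING_SCHEMA,
        "data": {
            "event": "ranking",
            "id": str(uuid.uuid4()),      # unique event identifier
            "timestamp": str(ts),         # epoch milliseconds, as a string
            "user": user,
            "session": session,
            # Items in the order they were displayed to the visitor.
            "items": [{"id": item_id} for item_id in item_ids],
        },
    }


event = ranking_event(
    "user1", "session1", ["item3", "item1", "item2"], now_ms=1599391467000
)
print(json.dumps(event, indent=2))
```

The resulting dictionary is what you would hand to your tracker's self-describing event call.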
It often happens that the frontend doesn't have all the information required to generate events. A good example is the item metadata event: tags, titles and prices are usually altered in a back-office system and are not directly exposed to the frontend.
In this case you can generate such events on the backend side.
Metarank schemas are available on a public Iglu server at https://iglu.metarank.ai. To use it, add the following snippet to the resolver.json snowplow-enrich config file:
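A minimal sketch of what such a repository entry could look like, written as a small Python helper that appends it to an existing resolver configuration. The entry follows the standard Iglu resolver-config layout. The repository name, the priority, the cache size and the sample Iglu Central entry are illustrative assumptions. Only the `https://iglu.metarank.ai` URI and the `ai.metarank` vendor come from this page.

```python
import json

# Illustrative repository entry; "name" and "priority" are assumptions.
METARANK_REPO = {
    "name": "Metarank",
    "priority": 0,
    "vendorPrefixes": ["ai.metarank"],
    "connection": {"http": {"uri": "https://iglu.metarank.ai"}},
}


def add_metarank_repo(resolver):
    """Return a copy of the resolver config with the Metarank registry added once."""
    data = dict(resolver["data"])
    repos = [
        repo for repo in data.get("repositories", [])
        if repo.get("name") != METARANK_REPO["name"]
    ]
    data["repositories"] = repos + [METARANK_REPO]
    return {**resolver, "data": data}


# A sample resolver.json content with a single existing registry.
resolver = {
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
    "data": {
        "cacheSize": 500,
        "repositories": [
            {
                "name": "Iglu Central",
                "priority": 0,
                "vendorPrefixes": ["com.snowplowanalytics"],
                "connection": {"http": {"uri": "http://iglucentral.com"}},
            }
        ],
    },
}

updated = add_metarank_repo(resolver)
print(json.dumps(updated, indent=2))
```

The printed JSON can be written back as resolver.json. Calling the helper again does not duplicate the entry.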
Both http and https URL schemes are supported, but https is recommended.
An example loader integration diagram is shown below:
Snowplow is flexible enough to use different data loading destinations (Redshift, Postgres, Snowflake, S3, etc.), but to access both live and historical enriched event data, Metarank needs access to:
Enriched event stream
Historical enriched stream dumps done with S3 Loader
At the moment Metarank supports loading historical events only from S3 Loader.
Snowplow Enrich is usually configured with three destination streams (output, pii and bad) that share the same HOCON definition:
All the supported Metarank sources have an optional format field, which defines the underlying format of the payload in this stream. Valid options are:
json: default value, Metarank native format
snowplow, snowplow:tsv: Snowplow default TSV stream format
snowplow:json: Snowplow optional JSON stream format
With the format: snowplow:tsv, Metarank will read TSV events and transform them into native format automatically.
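To give an intuition for what this transformation involves, here is a conceptual sketch of unwrapping a Snowplow self-describing event into a native payload. It is not Metarank's actual implementation. It assumes the standard Snowplow `unstruct_event` envelope, an Iglu URI with a `jsonschema` segment, and illustrative native field names. Locating the `unstruct_event` column inside the TSV record is deliberately left out.

```python
import json


def extract_metarank_event(unstruct_event_json):
    """Return the inner Metarank payload of an unstruct_event field, or None."""
    envelope = json.loads(unstruct_event_json)
    inner = envelope.get("data") or {}
    schema = inner.get("schema", "")
    # Only payloads described by an ai.metarank schema are of interest.
    if not schema.startswith("iglu:ai.metarank/"):
        return None
    return inner.get("data")


# A Metarank item event wrapped in the Snowplow envelope (sample data).
wrapped = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
    "data": {
        "schema": "iglu:ai.metarank/item/jsonschema/1-0-0",
        "data": {"event": "item", "id": "e1", "item": "product1"},
    },
})

# An unrelated event with a hypothetical schema, which should be skipped.
other = json.dumps({
    "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
    "data": {
        "schema": "iglu:com.example/page_scroll/jsonschema/1-0-0",
        "data": {"depth": 42},
    },
})

native = extract_metarank_event(wrapped)
print(native)
print(extract_metarank_event(other))
```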
Using a Snowplow tracker, your application emits clickstream telemetry to the Stream Collector
Snowplow Enrich validates these events according to the predefined schemas from the Schema Registry
predefined fields according to the
These user-defined schemas are pulled from the Schema Registry and are standard JSON Schema definitions describing the payload structure.
ai.metarank/item/1-0-0: item metadata events
ai.metarank/user/1-0-0: user metadata events
ai.metarank/ranking/1-0-0: ranking events
ai.metarank/interaction/1-0-0: interaction events
These schemas describe the native Metarank event format without any modifications.
Check out for more details about Metarank schemas.
Snowplow supports multiple streaming platforms for event delivery:
: supported by Metarank
: supported by Metarank
: support is
: not supported
: not supported
Metarank needs to receive four types of events, describing items, users and how users interact with items:
Item metadata: like titles, inventory, tags
User metadata: country, age, location
Ranking: what items and in what order were displayed to a visitor
Interaction: how the visitor interacted with the ranking
Using Snowplow trackers, you can track self-describing events, which are JSONs with attached schema references.
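The "JSON with an attached schema reference" structure can be sketched with a small helper that pairs a payload with the matching Metarank schema URI. The `jsonschema` segment follows the Iglu URI convention, and the helper names and the item payload fields are illustrative assumptions, not part of any Metarank or Snowplow API.

```python
import json

# The four Metarank event types listed on this page.
EVENT_TYPES = ("item", "user", "ranking", "interaction")


def metarank_schema_uri(event_type, version="1-0-0"):
    """Build the Iglu URI for a Metarank schema (hypothetical helper)."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown Metarank event type: {event_type}")
    return f"iglu:ai.metarank/{event_type}/jsonschema/{version}"


def self_describing(event_type, data):
    """Attach the schema reference to an event payload."""
    return {"schema": metarank_schema_uri(event_type), "data": data}


# Sample item update; the field names are assumed, check the schema itself.
item_update = self_describing("item", {
    "event": "item",
    "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
    "timestamp": "1599391467000",
    "item": "product1",
    "fields": [
        {"name": "title", "value": "Nice jeans"},
        {"name": "price", "value": 25.0},
    ],
})
print(json.dumps(item_update, indent=2))
```

Rejecting unknown event types early keeps typos in schema names from surfacing later as validation failures in the enrichment step.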
Check out the and articles for details on fields, event types and their meaning.
Metarank schemas are language-agnostic, and you can instrument your app using any supported Snowplow tracker SDK for your language or framework of choice.
For a sample Java backend application, you can track an item update event with the following code, using the Snowplow Java tracker:
Snowplow Enrich emits processed records in TSV format into the downstream Kinesis/Pubsub/etc. topic. This topic is usually then monitored by "Loaders", such as the Snowflake Loader or the S3 Loader.
To make Metarank connect to this stream, configure the Metarank source in the following way:
S3 Loader offloads real-time enriched events to gzip/lzo-compressed files on S3. Given the following S3 Loader configuration:
You can configure Metarank to load these GZIP-compressed event dumps for the bootstrapping process in the following way:
With Metarank configured to pick up live events from the enriched stream and historical events from the offloaded files in S3, it should be straightforward to follow the usual routine of setting up and running Metarank.