# Diversification

## diversity

Computes how different the current item is from the other items in the same ranking. Both numeric and string fields are supported.

### Diversification over numeric fields

Suppose all items in your inventory have a numeric `price` field:

```json
{
  "event": "item",
  "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
  "item": "item1",
  "timestamp": "1599391467000",
  "fields": [{"name": "price", "value": 69.0}]
}
```

Then, for the ranking below,

```json
{
  "event": "ranking",
  "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
  "timestamp": "1599391467000",
  "user": "user1",
  "session": "session1",
  "items": [
    {"id": "item1"},
    {"id": "item2"},
    {"id": "item3"} 
  ]
}
```

we can compute how much each item's price differs from the median price across the whole ranking with the following configuration snippet:

```yaml
- name: price_diff
  type: diversity
  source: item.price # only item.* fields are accepted
  ttl: 90d # optional, when to expire tracked fields
  top: 10 # optional, take only top-N items to compute the median
```

For example, given the following item prices:

* p1: price=100
* p2: price=200
* p3: price=250
* p4: price=300
* p5: price=220

For the ranking `[p1, p2, p3, p4, p5]`, the median price is 220, and the differences are:

* p1: price\_diff=-120
* p2: price\_diff=-20
* p3: price\_diff=30
* p4: price\_diff=80
* p5: price\_diff=0
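The median-difference arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch, not Metarank's implementation: the hypothetical `price_diff` function takes the field values already looked up for each ranked item, in ranking order, and ignores field tracking and `ttl` entirely.

```python
from statistics import median
from typing import List


def price_diff(prices: List[float]) -> List[float]:
    """Difference between each item's value and the median over the ranking.

    Hypothetical helper: `prices` holds the field value of every ranked item,
    in ranking order. Field lookup and `ttl` expiration are not modelled.
    """
    m = median(prices)
    return [p - m for p in prices]


# Prices of p1..p5 from the example above, in ranking order.
ranking_prices = [100, 200, 250, 300, 220]
print(price_diff(ranking_prices))  # [-120, -20, 30, 80, 0]
```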

When you have a very long ranking, it's worth limiting the number of items taken into account when computing the median. With `top: 3`, only the first three items of the ranking above (p1, p2 and p3) are used, so the median becomes 200:

* p1: price\_diff=-100
* p2: price\_diff=0
* p3: price\_diff=50
* p4: price\_diff=100
* p5: price\_diff=20
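A minimal sketch of the `top` behaviour, assuming (as the example implies) that `top` selects the first N positions of the ranking rather than, say, the N highest prices. The hypothetical `price_diff_top` function still scores every item; only the median baseline is truncated. It uses Python's `statistics.median`, which averages the two middle values when the baseline has an even number of items; whether Metarank does the same for an even `top` is an assumption.

```python
from statistics import median
from typing import List, Optional


def price_diff_top(prices: List[float], top: Optional[int] = None) -> List[float]:
    """Median difference where only the first `top` positions form the baseline.

    Hypothetical helper. Assumption: `top` is positional (first N items in
    ranking order). Every item still receives a score.
    """
    baseline = prices if top is None else prices[:top]
    m = median(baseline)  # even-sized baseline: mean of the two middle values
    return [p - m for p in prices]


ranking_prices = [100, 200, 250, 300, 220]  # p1..p5 in ranking order
print(price_diff_top(ranking_prices, top=3))  # [-100, 0, 50, 100, 20]
```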

### Diversification over string fields

This type of diversification is useful for measuring how different your items are across low-cardinality fields such as tags, colors, sizes and categories. Both `string` and `string[]` field types are supported.

Suppose all your inventory items have a `color` field, as in the example below:

```json
{
  "event": "item",
  "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
  "item": "item1",
  "timestamp": "1599391467000",
  "fields": [{"name": "color", "value": "red"}]
}
```

Then, for the ranking below,

```json
{
  "event": "ranking",
  "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
  "timestamp": "1599391467000",
  "user": "user1",
  "session": "session1",
  "items": [
    {"id": "item1"},
    {"id": "item2"},
    {"id": "item3"} 
  ]
}
```

we can compute how frequently each item's color appears in the ranking with the following configuration snippet:

```yaml
- name: color_diff
  type: diversity
  source: item.color # only item.* fields are accepted
  ttl: 90d # optional, when to expire tracked fields
  top: 10 # optional, take only top-N items to compute the histogram
```

The algorithm builds a frequency histogram of field values over the ranking (`color -> count` in the example above) and then scores each item by summing the relative frequencies of the values it has. For example:

* given the frequencies {red: 50%, green: 30%, blue: 20%},
* an item that is only red scores 50%;
* a red-blue item scores 50% + 20% = 70%.
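The scoring above can be sketched in Python as follows. This is an illustration under stated assumptions, not Metarank's code: the hypothetical `tag_frequencies` normalises counts by the total number of tag occurrences (so the frequencies sum to 100%, as in the example), and the hypothetical `tag_score` counts each distinct tag of an item once. For illustration, the red-blue item is scored against the histogram of a ten-item ranking; in a real ranking it would also contribute to that histogram.

```python
from collections import Counter
from typing import Dict, List, Optional


def tag_frequencies(ranking: List[List[str]], top: Optional[int] = None) -> Dict[str, float]:
    """Relative frequency of every tag over the (optionally truncated) ranking.

    Each element of `ranking` is one item's tag list, in ranking order; a plain
    `string` field is passed as a one-element list.
    """
    baseline = ranking if top is None else ranking[:top]
    counts = Counter(tag for tags in baseline for tag in tags)
    total = sum(counts.values())
    if total == 0:
        return {}  # no tags at all: nothing to compare against
    # Assumption: normalise by total tag occurrences so frequencies sum to 1.
    return {tag: n / total for tag, n in counts.items()}


def tag_score(tags: List[str], freq: Dict[str, float]) -> float:
    """Sum of the frequencies of an item's distinct tags (unseen tags add 0)."""
    return sum(freq.get(tag, 0.0) for tag in dict.fromkeys(tags))


# A ranking of ten single-color items: 5 red, 3 green, 2 blue.
ranking = [["red"]] * 5 + [["green"]] * 3 + [["blue"]] * 2
freq = tag_frequencies(ranking)
print(freq)                                        # {'red': 0.5, 'green': 0.3, 'blue': 0.2}
print(tag_score(["red"], freq))                    # 0.5
print(round(tag_score(["red", "blue"], freq), 2))  # 0.7
```

A high score means the item shares values that dominate the ranking; a low score marks an item that stands out, which is the signal a diversity feature is meant to expose.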

