# Reranking with cross-encoders

In this guide we will set up Metarank as a simple inference server for cross-encoder LLMs (Large Language Models). In other words, we will use an open-source cross-encoder model to reorder your search results in a zero-shot manner, without collecting any visitor feedback data. We will use a pre-trained `MS-MARCO MiniLM-L6-v2` cross-encoder from the [sentence-transformers](https://sbert.net) package.

## What are cross-encoders?

Cross-encoder LLM is a way to leverage the semantic power of neural network to reorder top-N matching documents for your query. You can think about it as asking ChatGPT the following question:

```
For a search query "crocs", reorder the following documents in the decreasing relevance order:
1. Crocs Jibbitz 5-Pack Alien Shoe Charms
2. Crocs Specialist Vent Work
3. Crocs Kids' Baya Clog
```

TODO: add ChatGPT answer

For typical reranking scenarios, cross-encoders (even in zero-shot modes) are much more precise compared to bi-encoders.

## Initial setup

Let's imagine you already have a traditional search engine running (like Elasticsearch, OpenSearch or SOLR), which already has a good [recall](https://en.wikipedia.org/wiki/Precision_and_recall) level - it retrieves all the relevant products, but it sometimes struggles with precision: there can be some false-positives, and the ranking is not perfect.

In this guide we will take top-N matching documents from your search engine, and re-rank them according to their semantic similarity with the query.

![reranking flow](https://754461178-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FP51TUyWn10Vg5Y0r7pvt%2Fuploads%2Fgit-blob-0bb1ad6b16931819627ca959e730e0f1579bb48c%2Freranking.png?alt=media)

### Importing data

We will use an [Amazon ESCI](https://github.com/amazon-science/esci-data) e-commerce dataset as a toy but realistic example: it has 1.7M real products that we can easily query with Elasticsearch. You can download the JSON-encoded version here: <https://github.com/shuttie/esci-s>.

Assuming that you have Elasticsearch service running on `http://localhost:9200`, you can import the complete dataset with the following Python script:

```python
import json
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="http://localhost:9200")

with open('esci.json', 'r') as f:
  for line in f.readlines():
    doc = json.loads(line.rstrip())
      if 'title' in doc:
        # index only title and asin fields
        es.index(index="esci", document={'title': doc['title'], 'asin': doc['asin']})
```

And then you can perform simple keyword searches over the data. For example, you can search for "crocs":

```bash
curl -XPOST -d @search.json -H "Content-Type: application/json" http://localhost:9200/esci/_search
```

Where the `search.json` looks like this:

```json
{
  "query": {
    "multi_match": {
      "query" : "crocs", "fields": ["title"]
    }
  },
  "fields": ["asin","title"]
}
```

For this search query, Elasticsearch returned 30 matching products, but we will take only top-10 of them for further reranking:

```json
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 30,
      "relation": "eq"
    },
    "max_score": 10.184343,
    "hits": [
      {
        "_index": "esci",
        "_id": "17qoxocBFbzZBgn-7-iy",
        "_score": 10.184343,
        "_source": {
          "title": "Crocs Jibbitz 5-Pack Alien Shoe Charms | Jibbitz for Crocs",
          "asin": "B089YD2KK5"
        },
        "fields": {
          "asin": [
            "B089YD2KK5"
          ],
          "title": [
            "Crocs Jibbitz 5-Pack Alien Shoe Charms | Jibbitz for Crocs"
          ]
        }
      }
    ]
  }
}
```

## Metarank as an inference server

As we got our top-10 search results for reranking, we're now going to configure Metarank in inference mode for cross-encoders. This can be done with the following configuration file:

```yaml
inference:
  msmarco:
    type: cross-encoder
    model: metarank/ce-msmarco-MiniLM-L6-v2
```

After start-up, Metarank will expose it's HTTP API and you can query the `/inference` API endpoint to perform the reranking. See the [API Reference](https://docs.metarank.ai/reference/api#inference-with-llms) for details about payload format:

```bash
curl -XPOST -d @rerank.json -H "Content-Type: application/json" http://metarank:8080/inference/cross/msmarco
```

Where `rerank.json` request looks like:

```json
{"input": [
  {"query": "crocs", "text": "Crocs Jibbitz 5-Pack Alien Shoe Charms"},
  {"query": "crocs", "text": "Crocs Specialist Vent Work"},
  {"query": "crocs", "text": "Crocs Kids' Baya Clog"}
]}
```

Metarank will respond with a set of scores, corresponding to the similarity of each query-document pair:

```json
{"scores": [0.756001, 0.52617, 0.193747]}
```

## Cross-encoder latency

Note that due to the LLM inference happening for all document-query pairs, cross-encoders can be quite slow for large reranking windows:

```
Benchmark (batch)                  (model)  Mode  Cnt    Score    Error  Units
encode          1  ce-msmarco-MiniLM-L6-v2  avgt   30   12.298 ±  0.581  ms/op
encode         10  ce-msmarco-MiniLM-L6-v2  avgt   30   58.664 ±  2.064  ms/op
encode        100  ce-msmarco-MiniLM-L6-v2  avgt   30  740.422 ± 13.369  ms/op
```

As it can be seen from the benchmark above, windows of top-100 products may incur a noticeable latency, so try to keep this reranking window reasonably small.
