Changelog
This changelog is in a human-readable format. For a technical changelog for robots, see the GitHub releases page. Check our blog for more detailed updates.
- a minor bugfix release
- fixed an important bug in dataset preparation (symptom: the NDCG reported by the booster was higher than the NDCG computed after training); prediction quality should improve significantly.
- NDCG is now printed before and after reranking
- memory usage statistics are printed after training
- fixed a crash when using the file-based clickthrough store
Upgrading: note that the Redis state format has a non-backwards-compatible change, so you need to re-import your data when upgrading.
- Local caching for state; the import should be 2x-3x faster.
- fixed a bug when a click arrives for a non-existent item.
- setting cache.maxSize for Redis now disables client-side caching altogether, making Metarank compatible with GCP Memorystore for Redis.
- fixed a memory leak in the clickthrough joining buffer.
- lowered memory allocation pressure in the interacted_with feature.
- the interacted_with feature now supports string[] fields.
- fixed a notorious bug in the local-file clickthrough store.
- the XGBoost LambdaMART implementation now supports categorical encoding.
- linux/aarch64 support, so Docker images now run natively on Apple M1.
- a ton of bugfixes
- the interacted_with feature now has much less overhead in Redis, and supports multiple fields in a single visitor profile.
- click-through events can now be stored in a file instead of inside Redis, also reducing the overall cost of running Metarank.
- it is now possible to export LightGBM/XGBoost-compatible datasets for further hyperparameter optimization.
- bugfix: added an explicit sync on the /feedback API call
- bugfix: config decoding error for field_match over terms
- bugfix: version detection within Docker was broken
- bugfix: fixed an improper network interface being bound in Docker
Notable features:
- Rate normalization support, so 1 click over 2 impressions no longer results in a 50% CTR.
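The idea behind normalization is to blend observed counts with a global prior, so items with few impressions are pulled toward the average rate instead of jumping to extreme values. A minimal sketch of a rate feature with normalization enabled; the exact field names here (top, bottom, bucket, periods, and especially the normalize block and its weight) are our assumption of the schema, not a verified reference:
- name: ctr
  type: rate
  top: click          # assumed: numerator event type
  bottom: impression  # assumed: denominator event type
  bucket: 24h
  periods: [7, 30]
  normalize:          # assumed key names for the new normalization option
    weight: 10        # assumed: pseudo-count weight of the global prior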
Highlights of this release are:
- added core.clickthrough.maxSessionLength and core.clickthrough.maxParallelSessions parameters to reduce memory consumption
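A minimal sketch of how these limits might look in the config; the parameter names come from the release note above, but the values are placeholders, not recommendations:
core:
  clickthrough:
    maxSessionLength: 4096      # placeholder: cap on events buffered per session
    maxParallelSessions: 10000  # placeholder: cap on sessions joined concurrently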
Highlights of this release are:
- We have updated the validate mode of the CLI, so you can use it to validate your data and configuration.
Highlights of this release are:
- Kubernetes support: it's now possible to have a production-ready Metarank deployment in K8s
- Kinesis source: on par with Kafka and Pulsar
- Custom connector options pass-through
Metarank is a multi-stage and multi-component system, and now it's possible to get it deployed in minutes inside a Kubernetes cluster:
- Bootstrap, Upload and Update jobs can be run both locally (to simplify things for small datasets) and inside the cluster in a distributed mode.
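For illustration, a bare-bones Deployment for the Metarank API could look like the sketch below; the image tag, port and replica count are assumptions for the example, not the manifests we ship:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metarank-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metarank-api
  template:
    metadata:
      labels:
        app: metarank-api
    spec:
      containers:
        - name: metarank
          image: metarank/metarank:latest  # assumed image tag
          ports:
            - containerPort: 8080          # assumed API port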
Kinesis Streams can also be used as an event source. It still has a couple of drawbacks compared with Kafka/Pulsar: for example, due to its maximum 7-day data retention it cannot effectively serve as permanent storage of historical events. But it's still possible to pair it with AWS Firehose writing events to S3, so realtime events come from Kinesis while historical events are offloaded to S3.
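As a hypothetical sketch of such a split setup, historical events are bootstrapped from files that Firehose wrote to S3, while realtime events are read from Kinesis. The bootstrap/inference section names and the file source path below are illustrative assumptions, not the exact config schema; the kinesis fields match the example further down:
bootstrap:                          # illustrative section: offline/historical events
  source:
    type: file
    path: s3://my-bucket/events/    # assumed: the Firehose delivery prefix
inference:                          # illustrative section: realtime events
  source:
    type: kinesis
    topic: events
    region: us-east-1
    offset: latest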
As we're using Flink's connector library for pulling data from Kafka/Kinesis/Pulsar, there is a ton of custom options you can tune for each connector. It's impossible to expose all of them directly, so the connector config now has a free-form options section, allowing you to set any supported option for the underlying connector. Example forcing Kinesis to use the EFO consumer:
type: kinesis
topic: events
region: us-east-1
offset: latest
options:
flink.stream.recordpublisher: EFO
flink.stream.efo.consumername: metarank