Welcome to ir-metrics

ir-metrics is a Python package that contains definitions of the most common information retrieval (IR) metrics. The main goal of this project is to provide a common interface for various IR tasks.

Documentation

This part of the documentation shows the intended ways to use the package.

Basic Usage

The metrics work with scalar labels (such as strings or integers) and array-like predictions:

>>> from irmetrics.topk import rr
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> rr(y_true, y_pred)
0.5
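The reciprocal rank is 1/r, where r is the 1-based position of the first relevant item in y_pred; by the usual convention it is 0 when no item is relevant. The following minimal sketch spells out that definition in plain Python for a single query. It is written for illustration and is not the irmetrics implementation; the name `reciprocal_rank` is hypothetical.

```python
def reciprocal_rank(y_true, y_pred):
    """Return 1/rank of the first prediction equal to y_true, or 0.0 if absent.

    Hypothetical helper for illustration; not part of irmetrics.
    """
    # enumerate from 1 so that the top position has rank 1
    for rank, item in enumerate(y_pred, start=1):
        if item == y_true:
            return 1.0 / rank
    # no relevant item among the predictions
    return 0.0


print(reciprocal_rank("apple", ["banana", "apple", "grapes"]))  # 0.5
```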

The same function also works for matrix-like structures, with one query per row:

>>> import numpy as np
>>> from irmetrics.topk import rr
>>> y_trues = np.repeat(y_true, 128)
>>> y_preds = np.repeat([y_pred], 128, axis=0)
>>> # Calculate the Mean Reciprocal Rank
>>> rr(y_trues, y_preds).mean()
0.5
>>> # Calculate the standard deviation for Reciprocal Ranks
>>> rr(y_trues, y_preds).std()
0.0
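For matrix inputs the same definition can be applied to every row at once. This NumPy sketch, assuming one label per query and a 2-D array of ranked predictions, shows one way to vectorize it; `batch_reciprocal_rank` is a hypothetical helper, not part of irmetrics. Averaging its output gives the Mean Reciprocal Rank in the same way as `rr(...).mean()` above.

```python
import numpy as np


def batch_reciprocal_rank(y_true, y_pred):
    """Row-wise reciprocal rank (hypothetical helper, not the irmetrics code).

    y_true: 1-D array with one label per query.
    y_pred: 2-D array of ranked predictions, one row per query.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # boolean matrix: True where a prediction matches the row's label
    hits = y_pred == y_true[:, None]
    # 0-based index of the first True in each row (0 for all-False rows)
    first = hits.argmax(axis=1)
    # rows without any hit get a reciprocal rank of 0
    found = hits.any(axis=1)
    return np.where(found, 1.0 / (first + 1), 0.0)


y_trues = np.repeat("apple", 4)
y_preds = np.repeat([["banana", "apple", "grapes"]], 4, axis=0)
scores = batch_reciprocal_rank(y_trues, y_preds)
print(scores.mean(), scores.std())  # 0.5 0.0
```

The `hits.any(axis=1)` check is needed because `argmax` returns 0 for rows that contain no match, which would otherwise be scored as a hit at rank 1.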

Using custom relevance judgements

All top-k metrics accept a relevance function as a parameter, which makes it possible to modify their behavior. If each query has only a single positive label, one can use irmetrics.relevance.unilabel to be more explicit:

>>> from irmetrics.topk import rr
>>> from irmetrics.relevance import unilabel
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> rr(y_true, y_pred, relevance=unilabel)
0.5
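One way to picture the relevance parameter is as a function that turns y_true and y_pred into a boolean mask over y_pred, which the metric then scans for the first True value. The sketch below assumes that contract; `single_label`, `multi_label` and `rr_with_relevance` are hypothetical stand-ins and not the actual irmetrics.relevance.unilabel and irmetrics.relevance.multilabel implementations.

```python
import numpy as np


def single_label(y_true, y_pred):
    # hypothetical: one positive label, so a plain element-wise equality check
    return np.asarray(y_pred) == y_true


def multi_label(y_true, y_pred):
    # hypothetical: membership test against the set of true labels
    return np.isin(y_pred, np.atleast_1d(y_true))


def rr_with_relevance(y_true, y_pred, relevance=multi_label):
    # the metric only sees the boolean mask produced by the relevance function
    mask = relevance(y_true, y_pred)
    # 0-based positions of all relevant items
    hits = np.flatnonzero(mask)
    return 1.0 / (hits[0] + 1) if hits.size else 0.0


y_pred = ["banana", "apple", "grapes"]
print(rr_with_relevance("apple", y_pred, relevance=single_label))  # 0.5
print(rr_with_relevance("apple", y_pred, relevance=multi_label))   # 0.5
print(rr_with_relevance(["grapes", "apple"], y_pred))              # 0.5
```

In this sketch the single-label check is a plain equality, while the multi-label check performs a membership test against all true labels and therefore also accepts a list of positives.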

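This gives the same results as the default relevance function but is a tiny bit faster. More generally, this mechanism allows adding arbitrary logic to the evaluation. For example, inverting the multilabel judgements makes the first non-matching item count as relevant: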

>>> from irmetrics.topk import rr
>>> from irmetrics.relevance import multilabel
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> def irrelevance(y_true, y_pred):
...     return ~multilabel(y_true, y_pred)
>>> rr(y_true, y_pred, relevance=irrelevance)
1.0

The same approach can be adapted for inputs with multiple queries.
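For several queries, a custom relevance function can return a 2-D mask with one row per query, and the reciprocal rank is then read off each row. This is a sketch under that assumption; `rr_rows`, `multilabel_rows` and `irrelevance_rows` are hypothetical names used only for illustration, and the inversion mirrors the single-query irrelevance example above.

```python
import numpy as np


def rr_rows(mask):
    # mask: 2-D boolean array, one row per query, columns in ranked order
    first = mask.argmax(axis=1)
    # rows with no relevant item score 0
    return np.where(mask.any(axis=1), 1.0 / (first + 1), 0.0)


def multilabel_rows(y_true, y_pred):
    # hypothetical: row-wise equality between each query's label and its predictions
    return np.asarray(y_pred) == np.asarray(y_true)[:, None]


def irrelevance_rows(y_true, y_pred):
    # invert the mask, as in the single-query irrelevance example
    return ~multilabel_rows(y_true, y_pred)


y_trues = np.array(["apple", "grapes"])
y_preds = np.array([["banana", "apple", "grapes"],
                    ["grapes", "apple", "banana"]])
print(rr_rows(multilabel_rows(y_trues, y_preds)).tolist())   # [0.5, 1.0]
print(rr_rows(irrelevance_rows(y_trues, y_preds)).tolist())  # [1.0, 0.5]
```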

Using with pandas

The metrics also work with pandas DataFrames:

>>> import numpy as np
>>> import pandas as pd
>>> from irmetrics.topk import rr
>>> # basic data
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> n = 10
>>> # create the example dataframe by repeating entries n times
>>> df = pd.DataFrame({"y_true": [y_true] * n, "y_pred": [y_pred] * n})
>>> # calculate the MRR
>>> rr(df["y_true"], np.vstack(df["y_pred"])).mean()
0.5

Note that np.vstack is required here to convert the y_pred column, a Series of lists, into a 2-D matrix. Data is often represented in long (or flat) format, with only a relevance judgement provided for each entry. The dedicated irmetrics.flat module handles this case:

>>> import numpy as np
>>> import pandas as pd
>>> from irmetrics.flat import flat
>>> # example data
>>> df = pd.DataFrame({
...    "click": [0, 1, 0, 1, 0, 0],
...    "label": ["banana", "apple", "grapes", "bob", "rob", "don"],
...    "query_id": [0, 0, 0, 1, 1, 1]
... })
>>> df
   click   label  query_id
0      0  banana         0
1      1   apple         0
2      0  grapes         0
3      1     bob         1
4      0     rob         1
5      0     don         1
>>> # calculate the MRR
>>> flat(df, query_col="query_id", relevance_col="click", measure=rr)
query_id
0    0.5
1    1.0
Name: click, dtype: float64

In the example above, the “label” column is included for illustration only and is ignored. Currently, ir-metrics defines only two measures compatible with the flat format: ndcg and rr.
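The idea behind the flat format can be sketched with a plain pandas groupby. This is not how irmetrics.flat is implemented: it assumes the rows are already in ranked order within each query, `flat_rr` and `flat_ndcg` are hypothetical helpers, and the NDCG uses linear gains with log2 discounts, which is one common variant.

```python
import numpy as np
import pandas as pd


def flat_rr(relevance):
    # relevance: judgements of one query, assumed to be in ranked order
    hits = np.flatnonzero(np.asarray(relevance) > 0)
    return 1.0 / (hits[0] + 1) if hits.size else 0.0


def flat_ndcg(relevance):
    rel = np.asarray(relevance, dtype=float)
    # log2 discounts for ranks 1..n: log2(2), log2(3), ...
    discounts = np.log2(np.arange(2, rel.size + 2))
    dcg = (rel / discounts).sum()
    # ideal DCG: the same judgements sorted in decreasing order
    idcg = (np.sort(rel)[::-1] / discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0


df = pd.DataFrame({
    "click": [0, 1, 0, 1, 0, 0],
    "query_id": [0, 0, 0, 1, 1, 1],
})
# one score per query_id
rr_per_query = df.groupby("query_id")["click"].apply(flat_rr)
ndcg_per_query = df.groupby("query_id")["click"].apply(flat_ndcg)
print(rr_per_query)    # query 0 -> 0.5, query 1 -> 1.0
print(ndcg_per_query)  # query 0 -> 1 / log2(3), query 1 -> 1.0
```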

Using with pyspark

The metrics also work with PySpark DataFrames when wrapped in a UDF:

>>> import pandas as pd
>>> import pyspark.sql.functions as F
>>> from pyspark.sql import SparkSession
>>> from irmetrics.topk import rr
>>> # basic data
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> n = 10
>>> # create the example dataframe by repeating entries n times
>>> df = pd.DataFrame({
...     "y_true": [y_true] * n,
...     "y_pred": [y_pred] * n
... })
>>> # create the spark session and the spark dataframe
>>> spark = SparkSession.builder.getOrCreate()
>>> sdf = spark.createDataFrame(df)
>>> # wrap the metric in a UDF; without a return type the column would be a string
>>> rr_udf = F.udf(lambda t, p: float(rr(t, p)), "double")
>>> sdf.withColumn("rr", rr_udf("y_true", "y_pred")).show(5, False)
+------+-----------------------+---+
|y_true|y_pred                 |rr |
+------+-----------------------+---+
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
+------+-----------------------+---+
only showing top 5 rows

Note that ir-metrics must be installed on all worker nodes in your cluster. The flat module should also work with pandas UDFs.

API Reference

Information about specific functions.