Welcome to ir-metrics
ir-metrics is a Python package that contains definitions of the most common information retrieval (IR) metrics. The main goal of this project is to provide a common interface for various IR tasks.
Documentation
This part of the documentation shows the intended ways to use the package.
Basic Usage
The metrics are designed to work with array-like structures and scalar labels such as integers or strings:
>>> from irmetrics.topk import rr
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> rr(y_true, y_pred)
0.5
The same function also works for matrix-like structures:
>>> import numpy as np
>>> from irmetrics.topk import rr
>>> y_trues = np.repeat(y_true, 128)
>>> y_preds = np.repeat([y_pred], 128, axis=0)
>>> # Calculate the Mean Reciprocal Rank
>>> rr(y_trues, y_preds).mean()
0.5
>>> # Calculate the standard deviation for Reciprocal Ranks
>>> rr(y_trues, y_preds).std()
0.0
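For intuition, the reciprocal rank of a single query is 1 divided by the 1-based position of the first relevant prediction, and it is conventionally 0 when nothing relevant is retrieved. The following is a minimal pure-Python sketch of that definition, not the implementation used by ir-metrics; the helper name reciprocal_rank is hypothetical.

```python
def reciprocal_rank(y_true, y_pred):
    """Return 1 / rank of the first prediction equal to y_true, else 0.0."""
    for rank, item in enumerate(y_pred, start=1):
        if item == y_true:
            return 1.0 / rank
    return 0.0


# "apple" is the second prediction, so the reciprocal rank is 1 / 2
single = reciprocal_rank("apple", ["banana", "apple", "grapes"])

# The Mean Reciprocal Rank is the plain average over queries
queries = [
    ("apple", ["banana", "apple", "grapes"]),  # first hit at rank 2
    ("apple", ["apple", "banana", "grapes"]),  # first hit at rank 1
    ("apple", ["banana", "grapes", "kiwi"]),   # no hit at all
]
scores = [reciprocal_rank(t, p) for t, p in queries]
mrr = sum(scores) / len(scores)
```

Here scores holds one value per query, which mirrors how the matrix-like call above returns an array that can then be reduced with mean or std.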
Using custom relevance judgements
All top-k metrics accept the relevance function as a parameter, which makes it possible to modify the behavior of the metrics. If each query has only a single positive label, one can use irmetrics.relevance.unilabel to be more explicit:
>>> from irmetrics.topk import rr
>>> from irmetrics.relevance import unilabel
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> rr(y_true, y_pred, relevance=unilabel)
0.5
This gives the same results as the default relevance function but is a (tiny) bit faster. Similarly, this mechanism allows adding arbitrary logic to the evaluation:
>>> from irmetrics.topk import rr
>>> from irmetrics.relevance import multilabel
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> def irrelevance(y_true, y_pred):
...     return ~multilabel(y_true, y_pred)
>>> rr(y_true, y_pred, relevance=irrelevance)
1.0
This code can likewise be adapted for inputs with multiple queries.
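The value 1.0 in the irrelevance example follows from the inverted judgements: every item that is not "apple" now counts as relevant, and the first prediction, "banana", is one of them. The sketch below illustrates how a pluggable relevance function changes the outcome. It assumes that a relevance function maps (y_true, y_pred) to one boolean per prediction; reciprocal_rank, exact_match and inverted are hypothetical names and are not part of ir-metrics.

```python
def exact_match(y_true, y_pred):
    """One boolean per prediction: True where it equals the true label."""
    return [item == y_true for item in y_pred]


def inverted(y_true, y_pred):
    """Flip the judgements, so every non-matching item counts as relevant."""
    return [not flag for flag in exact_match(y_true, y_pred)]


def reciprocal_rank(y_true, y_pred, relevance=exact_match):
    """Score the first position that the relevance function marks as True."""
    flags = relevance(y_true, y_pred)
    for rank, flag in enumerate(flags, start=1):
        if flag:
            return 1.0 / rank
    return 0.0


y_true = "apple"
y_pred = ["banana", "apple", "grapes"]

# Default judgements: "apple" is found at rank 2
default_score = reciprocal_rank(y_true, y_pred)

# Inverted judgements: "banana" at rank 1 is now the first relevant item
inverted_score = reciprocal_rank(y_true, y_pred, relevance=inverted)
```

Keeping the relevance logic in a separate callable means the ranking code stays unchanged while the notion of a relevant item varies.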
Using with pandas
The metrics are also designed to work with pandas DataFrames:
>>> import numpy as np
>>> import pandas as pd
>>> from irmetrics.topk import rr
>>> # basic data
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> n = 10
>>> # create the example dataframe by repeating entries n times
>>> df = pd.DataFrame({"y_true": [y_true] * n, "y_pred": [y_pred] * n})
>>> # calculate the MRR
>>> rr(df["y_true"], np.vstack(df["y_pred"])).mean()
0.5
Note that np.vstack is required here to convert y_pred to a matrix. Quite often, data is represented in long (or flat) format, and only the relevance judgements are provided for each entry. The dedicated irmetrics.flat module covers this case:
>>> import numpy as np
>>> import pandas as pd
>>> from irmetrics.flat import flat
>>> # example data
>>> df = pd.DataFrame({
... "click": [0, 1, 0, 1, 0, 0],
... "label": ["banana", "apple", "grapes", "bob", "rob", "don"],
... "query_id": [0, 0, 0, 1, 1, 1]
... })
>>> df
   click   label  query_id
0      0  banana         0
1      1   apple         0
2      0  grapes         0
3      1     bob         1
4      0     rob         1
5      0     don         1
>>> # calculate the MRR
>>> flat(df, query_col="query_id", relevance_col="click", measure=rr)
query_id
0    0.5
1    1.0
Name: click, dtype: float64
In the example above, the “label” column is provided only for illustration and is ignored. Currently, ndcg and rr are the only measures defined in ir-metrics that are compatible with the flat format.
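To see where the values 0.5 and 1.0 come from, the flat computation can be reproduced by grouping rows by query and scoring the relevance column of each group in row order. This is a pure-Python sketch under that assumption, not the irmetrics.flat implementation; rr_from_relevance and flat_rr are hypothetical helpers.

```python
from collections import defaultdict


def rr_from_relevance(relevance):
    """Reciprocal rank from an ordered list of 0/1 relevance judgements."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def flat_rr(rows, query_col, relevance_col):
    """Group rows by query (keeping row order) and score each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[query_col]].append(row[relevance_col])
    return {query: rr_from_relevance(rels) for query, rels in groups.items()}


# Same data as the DataFrame above, one dict per row
rows = [
    {"click": 0, "label": "banana", "query_id": 0},
    {"click": 1, "label": "apple", "query_id": 0},
    {"click": 0, "label": "grapes", "query_id": 0},
    {"click": 1, "label": "bob", "query_id": 1},
    {"click": 0, "label": "rob", "query_id": 1},
    {"click": 0, "label": "don", "query_id": 1},
]

# The "label" values are never read; only the query id and the click matter
per_query = flat_rr(rows, query_col="query_id", relevance_col="click")
```

Query 0 has its first click at position 2 and query 1 at position 1, which matches the per-query scores shown above.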
Using with pyspark
The metrics are also designed to work with pyspark DataFrames:
>>> import pandas as pd
>>> import pyspark.sql.functions as F
>>> from pyspark import SparkContext
>>> from pyspark.sql import SQLContext
>>> from irmetrics.topk import rr
>>> # basic data
>>> y_true = "apple"
>>> y_pred = ["banana", "apple", "grapes"]
>>> n = 10
>>> # create the example dataframe by repeating entries n times
>>> df = pd.DataFrame({
... "y_true": [y_true] * n,
... "y_pred": [y_pred] * n
... })
>>> # Create spark datasets
>>> spark = SQLContext(SparkContext())
>>> sdf = spark.createDataFrame(df)
>>> # apply the metrics
>>> sdf.withColumn("rr", F.udf(rr)("y_true", "y_pred")).show(5, False)
+------+-----------------------+---+
|y_true|y_pred                 |rr |
+------+-----------------------+---+
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
|apple |[banana, apple, grapes]|0.5|
+------+-----------------------+---+
only showing top 5 rows
Please note that ir-metrics must be installed on all workers in your cluster. Similarly, the flat module should also work with pandas UDFs.