API¶
This is the full API reference of all public methods.
Metrics¶
-
irmetrics.topk.ap(y_true, y_pred, k=None, relevance=<function multilabel>)¶ Compute Average Precision score(s). AP is an approximation of the integral over the PR curve.
- Parameters
- y_true: scalar, iterable or ndarray of shape (n_samples, n_labels)
True labels of entities to be ranked. In case of scalars, y_pred should be of shape (1, n_labels).
- y_pred: iterable, ndarray of shape (n_samples, n_labels)
Target labels sorted by relevance (as returned by an IR system).
- k: int, default=None
Only consider the highest k scores in the ranking. If None, use all outputs. The minimum of the number of correct answers and k will be used to compute the score.
- relevance: callable, default=topk.relevance.multilabel
A function that calculates relevance judgements based on input y_pred and y_true.
- Returns
- ap: float
The average precision for a given sample.
References
Wikipedia entry for Mean Average Precision
Examples
>>> from irmetrics.topk import ap
for ground-truth labels related to a query:
>>> y_true = 1
and the predicted labels by an IR system:
>>> y_pred = [1, 0, 0]
>>> ap(y_true, y_pred)
0.3333333333333333
>>> # This should be fixed
>>> y_true = [1, 4, 5]
and the predicted labels by an IR system:
>>> y_pred = [1, 2, 3, 4, 5]
>>> ap(y_true, y_pred)
array([0.2, 0. , 0. ])
-
irmetrics.topk.dcg_score(relevance, k=None, weights=1.0)¶ Compute Discounted Cumulative Gain score(s) based on relevance judgements provided.
This is provided as the internal implementation for ndcg. For this reason the API of this function differs slightly: it always accepts and outputs np.arrays, unlike the other methods in this module.
- Parameters
- relevance: iterable or ndarray of shape (n_samples, n_labels) or simply (n_labels,)
The relevance judgements provided by experts. The last dimension of the parameter is used as position.
- weights: scalar, iterable or ndarray of shape (n_samples,), default=1.0
Takes into account the importance of each sample, if relevant.
- k: int, default=None
Only consider the highest k scores in the ranking. If None, use all outputs.
- relevance: callable, default=topk.relevance.multilabel
A function that calculates relevance judgements based on input y_pred and y_true.
- Returns
- dcg: np.array
The discounted cumulative gains for samples (or a single sample).
References
Wikipedia entry for Discounted cumulative gain
Examples
>>> import numpy as np
>>> from irmetrics.topk import dcg_score
for ground-truth labels related to a query:
>>> relevance_judgements = np.array([[1, 0, 0, 0]])
>>> dcg_score(relevance_judgements)
array([1.])
>>> relevance_judgements = np.array([[True, False, False, False]])
>>> dcg_score(relevance_judgements)
array([1.])
>>> relevance_judgements = np.array([[False, True, False, False]])
>>> dcg_score(relevance_judgements)
array([0.63092975])
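The values in the examples above follow from the usual logarithmic discount. The sketch below is a standalone illustration in plain Python, not the irmetrics implementation: the helper name `dcg` is hypothetical, it handles a single query only, and it ignores `weights`.

```python
import math


def dcg(relevance, k=None):
    """Discounted cumulative gain for one ranked list of relevance judgements.

    Hypothetical helper for illustration; not part of irmetrics.
    """
    top = relevance[:k] if k is not None else relevance
    # The item at 0-based position i is discounted by log2(i + 2),
    # so the first position has a discount of log2(2) = 1.
    return sum(float(rel) / math.log2(i + 2) for i, rel in enumerate(top))


print(dcg([1, 0, 0, 0]))
print(dcg([False, True, False, False]))
```

A relevant item in the second position contributes 1 / log2(3), which is the 0.63092975 shown in the last example.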
-
irmetrics.topk.ndcg(y_true, y_pred, k=None, relevance=<function multilabel>, weights=1.0)¶ Compute Normalized Discounted Cumulative Gain score(s) based on relevance judgements provided.
- Parameters
- y_true: iterable or ndarray of shape (n_samples, n_labels) or simply (n_labels,)
The last dimension of the parameter is used as position.
- y_pred: iterable, ndarray of shape (n_samples, n_labels)
Target labels sorted by relevance (as returned by an IR system).
- k: int, default=None
Only consider the highest k scores in the ranking. If None, use all outputs.
- weights: float, iterable, ndarray, default=1.0
Represents the weights of each sample.
- relevance: callable, default=topk.relevance.multilabel
A function that calculates relevance judgements based on input y_pred and y_true.
- Returns
- ndcg: np.array
The normalized discounted cumulative gains for samples (or a single sample).
References
Wikipedia entry for normalized discounted cumulative gain
Examples
>>> from irmetrics.topk import ndcg
for ground-truth labels related to a query:
>>> y_true = [1, 2]
>>> y_pred = [0, 1, 0, 0]
>>> ndcg(y_true, y_pred)
0.6309297535714575
>>> # the order of y_true labels doesn't matter
>>> y_true = [2, 1]
>>> y_pred = [0, 1, 0, 0]
>>> ndcg(y_true, y_pred)
0.6309297535714575
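As a rough illustration of the normalization step, the sketch below divides the DCG of a ranking by the DCG of the same judgements sorted from most to least relevant. This is the textbook construction and an assumption about the normalization, not code taken from irmetrics. The names `dcg` and `ndcg_from_relevance` are hypothetical, and the relevance judgements are derived by simple membership in `y_true`.

```python
import math


def dcg(relevance):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(float(rel) / math.log2(i + 2) for i, rel in enumerate(relevance))


def ndcg_from_relevance(relevance, k=None):
    """Normalize DCG by the DCG of the ideal (descending) ordering.

    Hypothetical helper; the choice of ideal ordering is an assumption.
    """
    top = list(relevance[:k]) if k is not None else list(relevance)
    ideal = sorted(relevance, reverse=True)[:len(top)]
    best = dcg(ideal)
    # A query without any relevant item gets a score of 0.
    return dcg(top) / best if best > 0 else 0.0


y_true = [1, 2]
y_pred = [0, 1, 0, 0]
relevance = [label in y_true for label in y_pred]  # [False, True, False, False]
print(ndcg_from_relevance(relevance))
```

Under this assumption the single relevant item in the second position yields roughly 0.63, in line with the example above.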
-
irmetrics.topk.precision(y_true, y_pred=None, k=None, relevance=<function multilabel>)¶ Compute Precision score(s). Check which fraction of y_pred is in y_true. NB: When passing y_pred of shape [n_samples, n_outputs] the result is equivalent to recall(y_pred, y_true) / n_outputs.
- Parameters
- y_true: scalar, iterable or ndarray of shape (n_samples, n_labels)
True labels of entities to be ranked. In case of scalars, y_pred should be of shape (1, n_labels).
- y_pred: iterable, ndarray of shape (n_samples, n_labels)
Target labels sorted by relevance (as returned by an IR system).
- k: int, default=None
Only consider the highest k scores in the ranking. If None, use all outputs.
- relevance: callable, default=topk.relevance.multilabel
A function that calculates relevance judgements based on input y_pred and y_true.
- Returns
- precision: float in [0., 1.]
The precision for all samples.
References
Wikipedia entry for precision and recall
Examples
>>> from irmetrics.topk import precision
for ground-truth labels related to a query:
>>> y_true = 1
and the predicted labels by an IR system:
>>> y_pred = [0, 1, 4, 3]
>>> precision(y_true, y_pred)
0.25
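The "fraction of y_pred that is in y_true" can be written out for a single query as follows. This is a minimal sketch in plain Python, assuming set membership as the relevance judgement; `precision_at_k` is a hypothetical name and not part of irmetrics.

```python
def precision_at_k(y_true, y_pred, k=None):
    """Fraction of the top-k predicted labels that appear in y_true.

    Hypothetical single-query helper for illustration.
    """
    true_labels = set(y_true)
    top = y_pred[:k] if k is not None else y_pred
    if not top:
        return 0.0
    hits = sum(1 for label in top if label in true_labels)
    return hits / len(top)


print(precision_at_k([1], [0, 1, 4, 3]))
```

With one relevant label among four predictions the result is 1 / 4, matching the example above.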
-
irmetrics.topk.recall(y_true, y_pred=None, k=None, relevance=<function multilabel>)¶ Compute Recall(s). Check if at least one label proposed in y_pred is in y_true. This is a binary score: 0 if all predictions are irrelevant and 1 otherwise. This definition of recall is equivalent to accuracy@k.
- Parameters
- y_true: scalar, iterable or ndarray of shape (n_samples, n_labels)
True labels of entities to be ranked. In case of scalars, y_pred should be of shape (1, n_labels).
- y_pred: iterable, ndarray of shape (n_samples, n_labels)
Target labels sorted by relevance (as returned by an IR system).
- k: int, default=None
Only consider the highest k scores in the ranking. If None, use all outputs.
- relevance: callable, default=topk.relevance.multilabel
A function that calculates relevance judgements based on input y_pred and y_true.
- Returns
- recall: float in {0., 1.}
The binary recall scores for all samples.
References
Wikipedia entry for precision and recall
Examples
>>> from irmetrics.topk import recall
for ground-truth labels related to a query:
>>> y_true = 1
and the predicted labels by an IR system:
>>> y_pred = [0, 1, 4]
>>> recall(y_true, y_pred)
1.0
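The binary definition described above (1 if any of the top-k predictions is relevant, 0 otherwise) can be sketched for one query as below. The helper name `hit_at_k` is hypothetical and the code is an illustration in plain Python, not the irmetrics implementation.

```python
def hit_at_k(y_true, y_pred, k=None):
    """Return 1.0 if any of the top-k predictions is in y_true, else 0.0.

    Hypothetical single-query helper for illustration.
    """
    true_labels = set(y_true)
    top = y_pred[:k] if k is not None else y_pred
    return 1.0 if any(label in true_labels for label in top) else 0.0


print(hit_at_k([1], [0, 1, 4]))
print(hit_at_k([1], [0, 1, 4], k=1))
```

Restricting the ranking to k=1 drops the relevant label from consideration, so the score falls to 0.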
-
irmetrics.topk.rr(y_true, y_pred, k=None, relevance=<function multilabel>)¶ Compute Reciprocal Rank(s). Calculate the reciprocal of the index of the first matched item in y_pred. The score is between 0 and 1.
This ranking metric yields a high value if true labels are ranked high by y_pred.
- Parameters
- y_true: scalar, iterable or ndarray of shape (n_samples, n_labels)
True labels of entities to be ranked. In case of scalars, y_pred should be of shape (1, n_labels).
- y_pred: iterable, ndarray of shape (n_samples, n_labels)
Target labels sorted by relevance (as returned by an IR system).
- k: int, default=None
Only consider the highest k scores in the ranking. If None, use all outputs.
- relevance: callable, default=topk.relevance.multilabel
A function that calculates relevance judgements based on input y_pred and y_true.
- Returns
- rr: float in [0., 1.]
The reciprocal ranks for all samples.
References
Wikipedia entry for Mean reciprocal rank
Examples
>>> from irmetrics.topk import rr
>>> y_true = 1
and the predicted labels by an IR system:
>>> y_pred = [0, 1, 4]
>>> rr(y_true, y_pred)
0.5
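For a single query, the reciprocal rank can be computed by scanning the ranking for the first relevant label. The sketch below is a plain-Python illustration with a hypothetical helper name, `reciprocal_rank`. It assumes ranks are counted from 1 and that a query without any match scores 0.

```python
def reciprocal_rank(y_true, y_pred, k=None):
    """Reciprocal of the 1-based rank of the first relevant prediction.

    Hypothetical single-query helper for illustration.
    """
    true_labels = set(y_true)
    top = y_pred[:k] if k is not None else y_pred
    for position, label in enumerate(top, start=1):
        if label in true_labels:
            return 1.0 / position
    # No relevant label among the considered predictions.
    return 0.0


print(reciprocal_rank([1], [0, 1, 4]))
```

The true label sits at rank 2, so the score is 1 / 2, matching the example above.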
-
irmetrics.coverage.coverage(y_pred, padding=None)¶ Compute Coverage(s). Check if y_pred contains any nontrivial results.
- Parameters
- y_pred: iterable, ndarray of shape (n_samples, n_labels)
Target labels sorted by relevance (as returned by an IR system).
- padding: scalar, str, default=None
The value that was used to pad the predictions to get the same length.
- Returns
- coverage: int in [0, 1]
The coverage is 1 if y_pred contains any results different from padding and 0 otherwise.
Examples
>>> from irmetrics.coverage import coverage
for the predicted labels by an IR system:
>>> y_pred = [0, 1, 4]
>>> coverage(y_pred)
1
>>> y_pred = [0, None]
>>> coverage(y_pred)
1
>>> coverage([-1], padding=-1)
0
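The rule stated under Returns (1 if any prediction differs from the padding value, 0 otherwise) can be sketched for one query as below. `coverage_single` is a hypothetical name used only for this illustration and is not part of irmetrics.

```python
def coverage_single(y_pred, padding=None):
    """Return 1 if y_pred holds at least one non-padding entry, else 0.

    Hypothetical single-query helper for illustration.
    """
    return int(any(label != padding for label in y_pred))


print(coverage_single([0, 1, 4]))
print(coverage_single([0, None]))
print(coverage_single([-1], padding=-1))
```

A list made up entirely of padding values is treated as an empty result, which is the only case that scores 0.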
-
irmetrics.coverage.iou(y_true, y_pred, k=None, relevance=<function multilabel>, n_uniq=<function relevant_counts>)¶ Compute the approximate version of Intersection over Union. The approximation assumes that y_true and y_pred contain only unique values.
- Parameters
- y_true: scalar, iterable or ndarray of shape (n_samples, n_labels)
True labels of entities to be ranked. In case of scalars, y_pred should be of shape (1, n_labels).
- y_pred: iterable, ndarray of shape (n_samples, n_labels)
Target labels sorted by relevance (as returned by an IR system).
- k: int, default=None
Has no effect; provided only for API compatibility.
- relevance: callable, default=topk.relevance.multilabel
A function that calculates relevance judgements based on input y_pred and y_true.
- n_uniq: callable, default=topk.relevance.relevant_counts
A function that calculates the number of unique labels per query.
- Returns
- ioufloat in [0., 1.]
The ratio of relevant retrieved entries to the union of relevant and retrieved entries.
References
Wikipedia entry for Jaccard Index
Examples
>>> from irmetrics.coverage import iou
for ground-truth labels related to a query:
>>> y_true = 1
and the predicted labels by an IR system:
>>> y_pred = [0, 1, 4]
>>> iou(y_true, y_pred)
0.3333333333333333
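For comparison, the exact Jaccard index on sets can be written in a few lines. This is the textbook set-based definition and not the approximate computation that irmetrics uses. The two agree when y_true and y_pred contain only unique values, as assumed above. The helper name `jaccard` is hypothetical.

```python
def jaccard(y_true, y_pred):
    """Exact intersection over union of two label collections.

    Hypothetical single-query helper for illustration.
    """
    true_set, pred_set = set(y_true), set(y_pred)
    union = true_set | pred_set
    if not union:
        return 0.0
    return len(true_set & pred_set) / len(union)


print(jaccard([1], [0, 1, 4]))
```

One shared label out of three distinct labels in total gives 1 / 3, matching the example above.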
Utilities¶
-
irmetrics.relevance.multilabel(y_true, y_pred)¶ Compute relevance(s) of predicted labels.
- Parameters
- y_true: ndarray of shape (n_samples, n_true), where n_samples >= 1
Ground-truth labels for a given query.
- y_pred: ndarray of shape (n_samples, n_labels), where n_samples >= 1
Target labels sorted by relevance (as returned by an IR system). n_labels and n_true need not be the same.
- Returns
- relevance: boolean ndarray
The relevance judgements for y_pred of shape (n_samples, n_labels).
Examples
>>> import numpy as np
>>> from irmetrics.relevance import multilabel
>>> # ground-truth label of some answers to a query:
>>> y_true = np.array([[1]]) # (1, 1)
and the predicted labels by an IR system:
>>> y_pred = np.array([[0, 1, 4]]) # (1, 3)
>>> multilabel(y_true, y_pred)
array([[False,  True, False]])
>>> y_true = np.array([[1], [2]]) # (2, 1)
>>> y_pred = np.array([[0, 1, 4], [5, 6, 7]]) # (2, 3)
>>> multilabel(y_true, y_pred)
array([[False,  True, False],
       [False, False, False]])
>>> # Now the multilabel case:
>>> y_true = np.array([[1, 4]]) # (1, 2)
>>> y_pred = np.array([[0, 1, 4]]) # (1, 3)
>>> multilabel(y_true, y_pred)
array([[False,  True,  True]])
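The relevance judgement shown above is a row-wise membership test: a predicted label is relevant when it appears among the true labels of the same query. The sketch below illustrates this on nested lists. The helper name `multilabel_relevance` is hypothetical and the code is not the irmetrics implementation, which operates on ndarrays.

```python
def multilabel_relevance(y_true, y_pred):
    """For each query, flag the predicted labels that are in the true labels.

    Hypothetical helper; inputs are lists of rows, one row per query.
    """
    result = []
    for true_row, pred_row in zip(y_true, y_pred):
        true_labels = set(true_row)
        result.append([label in true_labels for label in pred_row])
    return result


print(multilabel_relevance([[1, 4]], [[0, 1, 4]]))
```

The output has one row per query and one entry per predicted label, regardless of how many true labels each query has.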
-
irmetrics.relevance.relevant_counts(y_pred, y_true)¶ Calculate the total number of relevant items.
- Parameters
- y_true: ndarray of shape (n_samples, n_true), where n_samples >= 1
Ground-truth labels for a given query.
- y_pred: ndarray of shape (n_samples, n_labels), where n_samples >= 1
Target labels sorted by relevance (as returned by an IR system). n_labels and n_true need not be the same.
- Returns
- relevance_counts: ndarray
The number of true relevance judgements for y_pred.
Examples
>>> import numpy as np
>>> from irmetrics.relevance import relevant_counts
>>> # ground-truth label of some answers to a query:
>>> y_true = np.array([[1]]) # (1, 1)
and the predicted labels by an IR system:
>>> y_pred = np.array([[0, 1, 4]]) # (1, 3)
>>> relevant_counts(y_true, y_pred)
array([[1]])
>>> y_true = np.array([[1], [2]]) # (2, 1)
>>> y_pred = np.array([[0, 1, 4], [5, 6, 7]]) # (2, 3)
>>> relevant_counts(y_true, y_pred)
array([[1],
       [1]])
>>> # Now the multilabel case:
>>> y_true = np.array([[1, 4]]) # (1, 2)
>>> y_pred = np.array([[0, 1, 4]]) # (1, 3)
>>> relevant_counts(y_true, y_pred)
array([[1, 1]])
-
irmetrics.relevance.unilabel(y_true, y_pred)¶ Compute relevance(s) of predicted labels. This version of the relevance function works only for queries (problems) with a single ground-truth label.
It is provided mainly for two reasons: it gives a slight speedup (on the order of seconds for large n_samples) and it adds expressivity if needed.
- Parameters
- y_true: ndarray of shape (n_samples, 1), where n_samples >= 1
Ground-truth labels for a given query.
- y_pred: ndarray of shape (n_samples, n_labels), where n_samples >= 1
Target labels sorted by relevance (as returned by an IR system).
- Returns
- relevance: boolean ndarray
The relevance judgements for y_pred of shape (n_samples, n_labels).
- Raises
- ValueError
If y_true has last dimension larger than 1 (multilabel case).
Examples
>>> import numpy as np
>>> from irmetrics.relevance import unilabel
>>> # ground-truth label of some answers to a query:
>>> y_true = np.array([[1]]) # (1, 1)
and the predicted labels by an IR system:
>>> y_pred = np.array([[0, 1, 4]]) # (1, 3)
>>> unilabel(y_true, y_pred)
array([[False,  True, False]])
>>> y_true = np.array([[1], [2]]) # (2, 1)
>>> y_pred = np.array([[0, 1, 4], [5, 6, 7]]) # (2, 3)
>>> unilabel(y_true, y_pred)
array([[False,  True, False],
       [False, False, False]])
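With a single ground-truth label per query, the membership test reduces to an equality check, and the multilabel case can be rejected up front. The sketch below illustrates this on nested lists. `unilabel_relevance` is a hypothetical name, and rejecting empty rows as well is an assumption made here for safety, not documented behaviour.

```python
def unilabel_relevance(y_true, y_pred):
    """Compare every predicted label with the single true label of its query.

    Hypothetical helper; inputs are lists of rows, one row per query.
    """
    result = []
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != 1:
            # Assumption: anything other than exactly one label is rejected.
            raise ValueError("expected exactly one ground-truth label per query")
        target = true_row[0]
        result.append([label == target for label in pred_row])
    return result


print(unilabel_relevance([[1], [2]], [[0, 1, 4], [5, 6, 7]]))
```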
-
irmetrics.flat.flat(df, query_col, relevance_col, measure, k=None)¶ Calculate the corresponding measure for the data in flat format, with precalculated relevance judgements:
query_col  relevance_col  weights_col
1          0              1.0
1          1              2.0
1          0              3.0
2          0              4.0
2          1              5.0
2          1              6.0
2          1              7.0
- Parameters
- df: pandas.DataFrame
Dataset in the flat form: each row corresponds to a sample with the given query_id and relevance judgement (higher is better).
- query_col: str
The column that corresponds to the query identifier.
- relevance_col: str
The column that corresponds to relevance judgements.
- measure: callable
The desired measure to be calculated (one from irmetrics.topk). Currently, only topk.ndcg and topk.rr are supported.
- k: int, default=None
Only consider the highest k scores in the ranking. If None, use all outputs.
- Returns
- measures: pandas.core.series.Series
The values of the corresponding measure calculated for each query.
Examples
>>> import pandas as pd
>>> from irmetrics.topk import rr
>>> from irmetrics.flat import flat
>>> df = pd.DataFrame({"quid": [1, 1, 2, 2], "rel": [1, 0, 0, 1]})
>>> flat(df, query_col="quid", relevance_col="rel", measure=rr)
quid
1    1.0
2    0.5
Name: rel, dtype: float64
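Conceptually, the flat format is evaluated by grouping rows by query and applying the measure to each group's relevance judgements in row order. The sketch below shows that idea with the standard library only. It is not how irmetrics implements `flat`, it assumes rows are already sorted by rank within each query, and both helper names are hypothetical.

```python
from collections import defaultdict


def rr_from_relevance(relevance):
    """Reciprocal rank of the first truthy relevance judgement (0.0 if none)."""
    for position, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / position
    return 0.0


def flat_measure(query_ids, relevances, measure):
    """Group relevance judgements by query id and score every group.

    Hypothetical helper; rows are assumed to be in ranking order per query.
    """
    grouped = defaultdict(list)
    for query_id, rel in zip(query_ids, relevances):
        grouped[query_id].append(rel)
    return {query_id: measure(rels) for query_id, rels in grouped.items()}


print(flat_measure([1, 1, 2, 2], [1, 0, 0, 1], rr_from_relevance))
```

For the data in the example above, query 1 has its relevant row first and query 2 has it second, giving scores of 1.0 and 0.5.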