[`feat`] Integrate NanoBeIR datasets #2966

ArthurCamara · 2024-09-27T14:54:05Z

As discussed in #2848 (comment), this PR adds a new evaluator based on the NanoBEIR collection of datasets.

It creates one InformationRetrievalEvaluator per dataset and aggregates their results across datasets.
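
A rough sketch of the aggregation step, assuming it is an unweighted arithmetic mean of each metric over the selected datasets (the "Average Queries" line and the `"mean"` key in the example suggest this, but the PR text does not spell it out). The helper name `aggregate_mean` and the numbers are hypothetical and are not taken from the PR code:

```python
from statistics import mean


def aggregate_mean(per_dataset: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each metric over all datasets (unweighted; an assumption)."""
    if not per_dataset:
        return {}
    # Keep only the metrics that every dataset reports.
    shared = set.intersection(*(set(metrics) for metrics in per_dataset.values()))
    return {
        key: mean(metrics[key] for metrics in per_dataset.values())
        for key in sorted(shared)
    }


# Hypothetical per-dataset scores, standing in for the output of one
# InformationRetrievalEvaluator per dataset.
per_dataset = {
    "QuoraRetrieval": {"cosine_ndcg@10": 0.5, "cosine_mrr@10": 0.25},
    "MSMARCO": {"cosine_ndcg@10": 0.75, "cosine_mrr@10": 0.5},
}

aggregated = aggregate_mean(per_dataset)
print(aggregated)
```

With an unweighted mean every dataset counts equally, regardless of how many queries it contains.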

Example:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Load a model
model = SentenceTransformer('all-mpnet-base-v2')

datasets = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: "
}

evaluator = NanoBEIREvaluator(
    dataset_names=datasets,
    name="NanoBEIR",
    query_prompts=query_prompts,
)

results = evaluator(model)
'''
NanoBEeIR Evaluation of the model on ['QuoraRetrieval', 'MSMARCO'] dataset:
Evaluating NanoBeIRNanoQuoraRetrieval
Evaluating NanoBeIRNanoMSMARCO

Average Queries: 50.0
Average Corpus: 5044.5

Aggregated for Score Function: cosine
Accuracy@1: 39.00%
Accuracy@3: 57.00%
Accuracy@5: 66.00%
Accuracy@10: 77.00%
Precision@1: 39.00%
Recall@1: 34.03%
Precision@3: 20.67%
Recall@3: 54.07%
Precision@5: 15.00%
Recall@5: 64.27%
Precision@10: 8.90%
Recall@10: 75.97%
MRR@10: 0.5004
NDCG@10: 0.5513
Aggregated for Score Function: dot
Accuracy@1: 39.00%
Accuracy@3: 57.00%
Accuracy@5: 66.00%
Accuracy@10: 77.00%
Precision@1: 39.00%
Recall@1: 34.03%
Precision@3: 20.67%
Recall@3: 54.07%
Precision@5: 15.00%
Recall@5: 64.27%
Precision@10: 8.90%
Recall@10: 75.97%
MRR@10: 0.5004
NDCG@10: 0.5513
'''
# Print the primary metric name and its aggregated value
print(evaluator.primary_metric)
# => "cosine_ndcg@10"
print(results["mean"][evaluator.primary_metric])
# => 0.5512516989358924

(Note that this depends on #2951)

ArthurCamara and others added 15 commits September 23, 2024 07:55

Added the possibility of masking the prompts if the tokenizer is left-padded.

7dc7990

Simplify code

8d7b88b

Remove unrelated changes

c92e334

Add separate query and corpus prompts for IREvaluator

6419121

Add query and corpus prompt_name

c0ae3f6

Merge branch 'UKPLab:master' into Integrate-NanoBEIR-datasets

84063e8

Added NanoBEIREvaluator

f27c918

Rename, example and better logging

e35d454

Fix for all datasets

fec088e

Merge branch 'UKPLab:master' into Integrate-NanoBEIR-datasets

4869ea5

Remove unrelated changes

4a82531

Remove unrelated changes

8944de0

Remove unrelated changes

c018084

Remove unrelated changes

657d1a5

Remove wrong function call to InformationRetrievalEvaluator

8460cfc
