[ColBERT] idea: return top_n best scoring tokens #488

Open
fdurant opened this issue Jun 13, 2024 · 3 comments

Comments

fdurant commented Jun 13, 2024

I'm experimenting with RAGStack ColBERT and have a feature request.

In order to produce a query-passage scoring interpretability visualization like this, it would be handy if the result of ColbertVectorStore.add_texts also included a top-n list of the most contributing tokens, each with a normalized score that would be trivial to color-code in a UI. This could be achieved via an extra parameter such as include_token_scores: int = 0; a sketch of the output I have in mind is below.
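
To make the idea concrete, here is a rough sketch of the kind of output I have in mind. The parameter name and payload shape are purely illustrative, not an existing API:

```python
# Purely illustrative: neither the parameter nor this payload exists today.

# Imagined extra argument on the scoring/search call:
#   include_token_scores: int = 0   # 0 = off, n = return the top-n tokens

# Imagined per-result payload next to the usual chunk + score:
result = {
    "chunk_id": "doc-42#3",
    "score": 17.8,                 # usual chunk-level score
    "token_scores": [              # top-n contributing tokens, normalized to
        ("colbert", 1.00),         # [0, 1] so they can be color-coded in a UI
        ("late", 0.87),
        ("interaction", 0.81),
    ],
}
```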

cbornet (Collaborator) commented Jun 18, 2024

There is no scoring when adding documents; it only happens at retrieval time.
Chunk scores are part of ColbertRetriever::text_search. Does that answer your need?
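
Roughly like this (I'm writing the return shape from memory, so the exact signature and types may differ from the current code):

```python
# Rough usage sketch; the exact return type of text_search may differ.

def show_chunk_scores(retriever, query: str, k: int = 5) -> None:
    """retriever: an existing ColbertRetriever instance."""
    results = retriever.text_search(query, k=k)
    for chunk, score in results:
        # `score` is the chunk-level score computed at retrieval time;
        # there is currently no per-token breakdown exposed.
        print(f"{score:.3f}  {chunk.text[:80]}")
```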

cbornet (Collaborator) commented Jun 18, 2024

Also, I don't think we keep track of the tokens, only their embeddings. And the chunk score is the max of the embedding scores, which are not exposed either.
@zzzming can you confirm?
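
For context, textbook ColBERT derives the chunk score purely from the token embeddings (sum of per-query-token MaxSim values), so per-token contributions do exist internally even if we don't expose them. Sketch of that formula below; it may not match our exact aggregation:

```python
import numpy as np

def maxsim(query_emb: np.ndarray, chunk_emb: np.ndarray):
    """Textbook ColBERT late-interaction scoring.

    query_emb: (num_query_tokens, dim), chunk_emb: (num_chunk_tokens, dim),
    both with L2-normalized rows. Returns the chunk score plus the
    per-query-token contributions a top-n token feature would need.
    """
    sim = query_emb @ chunk_emb.T          # cosine similarity matrix
    per_token = sim.max(axis=1)            # best chunk token per query token
    return per_token.sum(), per_token      # textbook score = sum of MaxSims
```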

epinzur (Collaborator) commented Jun 19, 2024

@cbornet you are correct. We don't store the tokens for the text... only the embeddings of the tokens.
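
So to get token strings for a UI, the caller would currently have to re-tokenize the chunk text with the same tokenizer that produced the embeddings. A rough workaround sketch; the checkpoint name and the alignment assumption are mine, not guaranteed by the library:

```python
# Workaround sketch only: re-tokenize the stored chunk text at display time.
# The checkpoint name is an assumption; use whatever checkpoint built the index.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("colbert-ir/colbertv2.0")
tokens = tokenizer.tokenize("ColBERT uses late interaction over token embeddings.")

# tokens[i] only lines up with the i-th stored embedding if the indexing
# pipeline adds no extra marker/special tokens and filters nothing, which
# would need to be verified against the actual indexing code.
print(tokens)
```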
