[ColBERT] idea: return top_n best scoring tokens #488

Open
fdurant opened this issue Jun 13, 2024 · 3 comments

Comments

fdurant commented Jun 13, 2024

I'm experimenting with RAGStack ColBERT and have a feature request.

In order to produce a query-passage scoring interpretability visualization like this, it would be handy if the result of ColbertVectorStore.add_texts also included a top-n list of the most contributing tokens, each with a normalized score that would be trivial to color-code in a UI. This could be achieved via an extra parameter such as include_token_scores: int = 0; a sketch of the output I have in mind is below.
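
To make the idea concrete, here is a rough sketch of the kind of output I have in mind. The parameter name and payload shape are purely illustrative, not an existing API:

```python
# Purely illustrative: neither the parameter nor this payload exists today.

# Imagined extra argument on the scoring/search call:
#   include_token_scores: int = 0   # 0 = off, n = return the top-n tokens

# Imagined per-result payload next to the usual chunk + score:
result = {
    "chunk_id": "doc-42#3",
    "score": 17.8,                 # usual chunk-level score
    "token_scores": [              # top-n contributing tokens, normalized to
        ("colbert", 1.00),         # [0, 1] so they can be color-coded in a UI
        ("late", 0.87),
        ("interaction", 0.81),
    ],
}
```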

cbornet (Collaborator) commented Jun 18, 2024

There is no scoring when adding documents; it only happens at retrieval time.
Chunk scores are part of ColbertRetriever::text_search. Does that answer your need?
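
Roughly like this (I'm writing the return shape from memory, so the exact signature and types may differ from the current code):

```python
# Rough usage sketch; the exact return type of text_search may differ.

def show_chunk_scores(retriever, query: str, k: int = 5) -> None:
    """retriever: an existing ColbertRetriever instance."""
    results = retriever.text_search(query, k=k)
    for chunk, score in results:
        # `score` is the chunk-level score computed at retrieval time;
        # there is currently no per-token breakdown exposed.
        print(f"{score:.3f}  {chunk.text[:80]}")
```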

cbornet (Collaborator) commented Jun 18, 2024

Also, I don't think we keep track of the tokens, only their embeddings. And the chunk score is the max of the embedding scores, which are not exposed either.
@zzzming can you confirm?
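
For context, textbook ColBERT derives the chunk score purely from the token embeddings (sum of per-query-token MaxSim values), so per-token contributions do exist internally even if we don't expose them. Sketch of that formula below; it may not match our exact aggregation:

```python
import numpy as np

def maxsim(query_emb: np.ndarray, chunk_emb: np.ndarray):
    """Textbook ColBERT late-interaction scoring.

    query_emb: (num_query_tokens, dim), chunk_emb: (num_chunk_tokens, dim),
    both with L2-normalized rows. Returns the chunk score plus the
    per-query-token contributions a top-n token feature would need.
    """
    sim = query_emb @ chunk_emb.T          # cosine similarity matrix
    per_token = sim.max(axis=1)            # best chunk token per query token
    return per_token.sum(), per_token      # textbook score = sum of MaxSims
```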

epinzur (Collaborator) commented Jun 19, 2024

@cbornet you are correct. We don't store the tokens for the text... only the embeddings of the tokens.
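
So to get token strings for a UI, the caller would currently have to re-tokenize the chunk text with the same tokenizer that produced the embeddings. A rough workaround sketch; the checkpoint name and the alignment assumption are mine, not guaranteed by the library:

```python
# Workaround sketch only: re-tokenize the stored chunk text at display time.
# The checkpoint name is an assumption; use whatever checkpoint built the index.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("colbert-ir/colbertv2.0")
tokens = tokenizer.tokenize("ColBERT uses late interaction over token embeddings.")

# tokens[i] only lines up with the i-th stored embedding if the indexing
# pipeline adds no extra marker/special tokens and filters nothing, which
# would need to be verified against the actual indexing code.
print(tokens)
```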
