Support for MongoDB Vector Search #147

Open, 7 commits to merge into base: master

filipecosta90
Contributor

MongoDB vector search should reach the community edition later this year

Following up on the announcement (link) from MongoDB's CEO:

So, I'm excited to share that we will be introducing full-text search and vector search in MongoDB Community Edition later this year

We can already prepare this benchmark to support MongoDB vector search.

Reasoning behind the configuration file

Currently, MongoDB Atlas Vector Search does not offer a way to configure EF_CONSTRUCT and M during index creation. The only configurable option that can improve precision is set at query time via the numCandidates parameter (i.e. the equivalent of EF_RUNTIME).

This design choice simplifies the user experience but limits customization and result quality for use cases that require higher precision.
Therefore, to be as fair as possible, we're following MongoDB's official documentation:

"We recommend that you specify a number higher than the number of documents to return (limit) to increase accuracy although this might impact latency. For example, we recommend a ratio of ten to twenty nearest neighbors for a limit of only one document."

and using numCandidates values of up to 20x the expected reply limit (meaning our max EF_RUNTIME is 2000).
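As a rough illustration of the 20x rule above, here is a minimal sketch of how a `$vectorSearch` aggregation pipeline could be built with `numCandidates` derived from `limit`. The helper name, the index name `vector_index`, and the field name `vector` are hypothetical and are not taken from this PR; the cap of 10000 reflects the documented upper bound for `numCandidates`.

```python
from typing import Any, Dict, List, Sequence

# Documented upper bound for numCandidates in $vectorSearch.
MAX_NUM_CANDIDATES = 10_000


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    limit: int,
    ratio: int = 20,                   # 20x the reply limit, per the reasoning above
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "vector",              # hypothetical field holding the embedding
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline with numCandidates = ratio * limit."""
    if not 1 <= limit <= MAX_NUM_CANDIDATES:
        raise ValueError("limit must be between 1 and MAX_NUM_CANDIDATES")
    if ratio < 1:
        raise ValueError("ratio must be >= 1 so numCandidates >= limit")

    # numCandidates may not be lower than limit nor exceed the documented cap.
    num_candidates = min(limit * ratio, MAX_NUM_CANDIDATES)

    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Expose the (normalized) similarity score alongside each document id.
        {"$project": {"_id": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=100)
```

With a client library such as pymongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`; for `limit=100` it requests 2000 candidates.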

Reasoning behind cosine and dotProduct similarities de-normalization

In MongoDB Atlas, for the cosine and dotProduct similarities [check link], the score is normalized using the following formula:

score = (1 + sim(v1, v2)) / 2, where sim is cosine(v1, v2) or dot_product(v1, v2)

To be able to calculate the expected precision, we undo the score normalization (similarity = 2 * score - 1) before checking the scores.
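The de-normalization is simply the inverse of the formula above. A minimal sketch, with hypothetical function names that are not necessarily the ones used in this PR:

```python
def normalize_score(similarity: float) -> float:
    """Map a raw cosine/dot-product similarity to the normalized score."""
    return (1.0 + similarity) / 2.0


def denormalize_score(score: float) -> float:
    """Invert score = (1 + similarity) / 2 to recover the raw similarity."""
    return 2.0 * score - 1.0


# A normalized score of 0.5 corresponds to a raw similarity of 0.0.
raw = denormalize_score(0.5)
```

Reverting the scores this way lets them be compared directly against ground-truth similarities computed without normalization.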

@filipecosta90 changed the title from "[Draft] Initial support for MongoDB Vector Search" to "Support for MongoDB Vector Search" on Jun 26, 2024