Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: moved link storage to a pure metadata-based method #650

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

epinzur
Copy link
Collaborator

@epinzur epinzur commented Sep 23, 2024

This makes CassandraGraphStore compatible with existing cassio.

Note:

  • only traversal_search() has been updated to use the new link storage.

@@ -482,39 +476,33 @@ def fetch_initial_candidates() -> None:
# If the next nodes would not exceed the depth limit, find the
# adjacent nodes.
#
# TODO: For a big performance win, we should track which tags we've
# TODO: For a big performance win, we should track which links we've
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this may be a stale comment. Specifically, the difference_update on line 487 seems to be doing this.

else:
# don't add link search to original metadata dict
metadata = metadata.copy()
metadata[_metadata_s_link_key(link=outgoing_link)] = _metadata_s_link_value()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? It looks like it is doing equality. But, there may be multiple outgoing links, and we need to find the nodes with one of those as an incoming link. It seems like this is maybe going the wrong direction, and also likely missing the set-equality. Are we sure this works the same?

Copy link
Collaborator Author

@epinzur epinzur Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we add_nodes(), we are storing all the incoming_links for each chunk as dictionary keys in the metadata_s column. There is an arbitrary, static value set with each key, so that it can be stored in the MAP<text,text> type.

Here, when we search, we are finding all the chunks that have a matching outgoing-link key.

We are essentially doing a hybrid query on metadata keys... the values do not matter.

The code does a separate query for each outgoing link. This the same way the code in main operates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. That's the part I missed. Could you add a comment to that effect where we store / query? It also seems like there is then a risk that the generated key collides with the user defined key? Is there a prefix or something we should tell people not to use in their metadata?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants