DRAFT: moved link storage to a pure metadata-based method #650

epinzur · 2024-09-23T12:10:56Z

This makes CassandraGraphStore compatible with existing cassio.

Note: only traversal_search() has been updated to use the new link storage.

bjchambers · 2024-09-23T17:08:22Z

libs/knowledge-store/ragstack_knowledge_store/graph_store.py

@@ -482,39 +476,33 @@ def fetch_initial_candidates() -> None:
                # If the next nodes would not exceed the depth limit, find the
                # adjacent nodes.
                #
-                # TODO: For a big performance win, we should track which tags we've
+                # TODO: For a big performance win, we should track which links we've


It seems like this may be a stale comment. Specifically, the difference_update on line 487 seems to be doing this.
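For reference, a minimal sketch of the "track which links we've already visited" idea that the TODO describes, using set.difference_update. This is not the code in graph_store.py: the function name, the adjacency dict, and the link strings are hypothetical stand-ins used only to show the pattern.

```python
from typing import Dict, Iterable, List, Set


def traverse_links(
    adjacency: Dict[str, Iterable[str]],  # hypothetical: link -> links reachable from it
    start_links: Iterable[str],
    max_depth: int,
) -> List[str]:
    """Walk links breadth-first, querying each link at most once."""
    visited_links: Set[str] = set()
    frontier: Set[str] = set(start_links)
    queried: List[str] = []

    for _ in range(max_depth):
        # Drop every link that an earlier round already handled.
        frontier.difference_update(visited_links)
        if not frontier:
            break
        visited_links.update(frontier)

        next_frontier: Set[str] = set()
        for link in sorted(frontier):  # sorted only to make the order deterministic
            queried.append(link)  # stands in for one database query
            next_frontier.update(adjacency.get(link, ()))
        frontier = next_frontier

    return queried
```

If the existing difference_update already does the equivalent of the first statement in the loop, the TODO is indeed stale.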

bjchambers · 2024-09-23T17:13:53Z

libs/knowledge-store/ragstack_knowledge_store/graph_store.py

+            else:
+                # don't add link search to original metadata dict
+                metadata = metadata.copy()
+                metadata[_metadata_s_link_key(link=outgoing_link)] = _metadata_s_link_value()


How does this work? It looks like it is doing equality. But, there may be multiple outgoing links, and we need to find the nodes with one of those as an incoming link. It seems like this is maybe going the wrong direction, and also likely missing the set-equality. Are we sure this works the same?

epinzur · 2024-09-24 (edited)

When we call add_nodes(), we store all of the incoming_links for each chunk as dictionary keys in the metadata_s column. Each key gets an arbitrary, static value so that it can be stored in the MAP<text,text> type.

Here, when we search, we find all the chunks that have a key matching the outgoing link.

We are essentially doing a hybrid query on metadata keys; the values do not matter.

The code does a separate query for each outgoing link. This is the same way the code in main operates.
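A rough in-memory sketch of the scheme described above, assuming a plain dict stands in for the metadata_s MAP<text,text> column. The key format, LINK_PREFIX, LINK_VALUE, and both function names are hypothetical; the real helpers are _metadata_s_link_key() and _metadata_s_link_value(), whose output is not shown in this thread.

```python
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str]  # (kind, tag), a hypothetical simplification of a link

LINK_PREFIX = "link:"  # hypothetical prefix for generated keys
LINK_VALUE = "link"    # hypothetical static value; only the key carries meaning


def link_key(kind: str, tag: str) -> str:
    """Build the metadata key that marks a link (illustrative format only)."""
    return f"{LINK_PREFIX}{kind}:{tag}"


def add_node(
    store: Dict[str, Dict[str, str]],
    node_id: str,
    metadata: Dict[str, str],
    incoming_links: Iterable[Link],
) -> None:
    """Store a node, writing each incoming link as a metadata key."""
    row = dict(metadata)  # copy so the caller's dict is not modified
    for kind, tag in incoming_links:
        row[link_key(kind, tag)] = LINK_VALUE
    store[node_id] = row


def find_targets(
    store: Dict[str, Dict[str, str]],
    outgoing_links: Iterable[Link],
) -> Set[str]:
    """Return ids of nodes that have any of the outgoing links as an incoming link."""
    found: Set[str] = set()
    for kind, tag in outgoing_links:
        key = link_key(kind, tag)
        # One lookup per outgoing link, mirroring the one-query-per-link approach.
        found.update(
            node_id for node_id, row in store.items() if row.get(key) == LINK_VALUE
        )
    return found
```

Because the results of the per-link lookups are unioned, a node matches when it shares at least one link, which covers the "one of those" case raised in the question without needing set-equality in a single query.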

bjchambers · 2024-09-24

Ah, that's the part I missed. Could you add a comment to that effect where we store and query the links? It also seems like there is a risk that a generated key collides with a user-defined key. Is there a prefix or something similar that we should tell people not to use in their metadata?
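One possible way to address the collision concern, sketched under the assumption that generated link keys share a fixed prefix. RESERVED_PREFIX and check_user_metadata() are hypothetical names, not part of this PR or of cassio.

```python
from typing import Dict

RESERVED_PREFIX = "link:"  # hypothetical prefix reserved for generated link keys


def check_user_metadata(metadata: Dict[str, str]) -> Dict[str, str]:
    """Reject user metadata whose keys fall in the reserved link namespace."""
    clashes = sorted(key for key in metadata if key.startswith(RESERVED_PREFIX))
    if clashes:
        raise ValueError(
            f"metadata keys must not start with {RESERVED_PREFIX!r}: {clashes}"
        )
    return dict(metadata)  # return a copy so later link keys do not leak back
```

Calling a check like this in add_nodes() before the link keys are written would fail fast instead of silently treating a user key as a link.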

moved link storage to metadata

738692c

epinzur added the DO NOT MERGE label Sep 23, 2024

epinzur added 3 commits September 23, 2024 16:48

updated mmr traversal

524ba22

minor tweaks

db98996

more tweaks

1f0c232

bjchambers reviewed Sep 23, 2024


more updates

228f5e6


epinzur commented Sep 23, 2024

bjchambers Sep 23, 2024

bjchambers Sep 23, 2024

epinzur Sep 24, 2024 • edited

bjchambers Sep 24, 2024
