size of sqlite database #66

Open
rokroskar opened this issue Jun 19, 2020 · 3 comments
Comments

@rokroskar

Hi, thanks for this very useful rdflib plugin! I am running some tests and comparisons and noticing that using sqlite results in very large db sizes. I have a graph of ~10k triples that serializes to ~2MB on disk as RDF/XML, but the sqlite db is almost 14MB. Is this expected? Or is there some setup step I'm missing that would make the db size more reasonable? Thanks!

@mwatts15
Collaborator

It's hard to say for sure why your database file is as large as it is without your source data. I did a test of adding exactly 10,000 triples with 10,000 distinct subjects, 100 distinct predicates, and 1000 distinct objects and got a DB file size of 4.7MB. Repeating the test with all distinct subjects, predicates, and objects only increased that to 4.8MB. I'm able to increase it to 19MB+ just by using longer URIs.
sqlite3 version: 3.32.2
Python version: 3.7.4
rdflib-sqlalchemy version: 0.4.0
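
For anyone who wants to see the URI-length effect in isolation, here is a rough sketch using only Python's standard `sqlite3` module. It is not the script used for the numbers above and it does not use rdflib-sqlalchemy's actual schema: the `triples` table, the index names, and the `triple_db_size` helper are assumptions made up for illustration. The only point is that when full URI strings are stored in every row and repeated in every index, the file size tracks the URI length.

```python
import os
import sqlite3
import tempfile


def triple_db_size(n_triples, uri_prefix):
    """Write n_triples rows to a fresh SQLite file and return its size in bytes."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        # Hypothetical simplified schema: the full URI text is stored in each column.
        conn.execute(
            "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
        )
        # Each index keeps its own copy of the indexed strings.
        conn.execute("CREATE INDEX idx_subject ON triples (subject)")
        conn.execute("CREATE INDEX idx_predicate ON triples (predicate)")
        conn.execute("CREATE INDEX idx_object ON triples (object)")
        rows = (
            (
                f"{uri_prefix}s/{i}",         # 10,000 distinct subjects
                f"{uri_prefix}p/{i % 100}",   # 100 distinct predicates
                f"{uri_prefix}o/{i % 1000}",  # 1000 distinct objects
            )
            for i in range(n_triples)
        )
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
        conn.commit()
        conn.close()
        return os.path.getsize(path)
    finally:
        os.remove(path)


short_size = triple_db_size(10_000, "http://ex.org/")
long_size = triple_db_size(
    10_000, "http://example.org/" + "very/long/path/segment/" * 8
)
print(f"short URIs: {short_size / 1e6:.1f} MB")
print(f"long URIs:  {long_size / 1e6:.1f} MB")
```

The absolute numbers will differ from a real rdflib-sqlalchemy store, which uses its own tables and indexes, but the relative growth with longer URIs should be visible.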

@rokroskar
Author

Thanks for the quick response @mwatts15 - there certainly might be some long(ish) URIs in my data. I'm wondering if there are any indexing options available to mitigate this problem?

@mwatts15
Collaborator

If by options you mean a flag you can specify that will create a table mapping strings to more compact identifiers, there is no such thing in rdflib-sqlalchemy, nor, as far as I have seen, is there any sqlite extension that does something similar. If you would like to implement such a feature, I would certainly be open to merging it.
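
To make the idea concrete, a table mapping strings to compact identifiers could look roughly like the sketch below. This is plain `sqlite3`, not rdflib-sqlalchemy code, and nothing here exists in the plugin today: the `terms` and `triples` tables and the `open_store`, `term_id`, `add_triple`, and `all_triples` helpers are hypothetical names chosen for illustration. Each distinct URI is stored once in `terms`, and the triple rows hold only integer ids.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database with a term dictionary and an integer-only triple table."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS terms (
            id    INTEGER PRIMARY KEY,
            value TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS triples (
            s INTEGER NOT NULL REFERENCES terms(id),
            p INTEGER NOT NULL REFERENCES terms(id),
            o INTEGER NOT NULL REFERENCES terms(id),
            PRIMARY KEY (s, p, o)
        );
        """
    )
    return conn


def term_id(conn, value):
    """Return the integer id for a string, inserting it on first use."""
    conn.execute("INSERT OR IGNORE INTO terms (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM terms WHERE value = ?", (value,)).fetchone()
    return row[0]


def add_triple(conn, s, p, o):
    """Store a triple as three integer ids; duplicates are ignored."""
    ids = (term_id(conn, s), term_id(conn, p), term_id(conn, o))
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", ids)


def all_triples(conn):
    """Join the ids back to their strings, sorted for stable output."""
    return conn.execute(
        """
        SELECT ts.value, tp.value, tobj.value
        FROM triples AS t
        JOIN terms AS ts   ON ts.id = t.s
        JOIN terms AS tp   ON tp.id = t.p
        JOIN terms AS tobj ON tobj.id = t.o
        ORDER BY ts.value, tp.value, tobj.value
        """
    ).fetchall()
```

The trade-off is an extra lookup on every insert and a join on every read, in exchange for storing each long URI only once. How much that would save in practice depends on how often terms repeat in the data.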