Incremental Synchronization Issue with Bandersnatch #1663
Comments
You're in luck. bandersnatch does not delete unless you run a [...]. We do not have a feature to only take new packages created/added on PyPI today, but I am not sure that is what you mean. I would take a PR to do so, but I don't know the cleanest way. I guess you could pull down the full mirror list via the XML-RPC call we do, save all the package names, and use that as your starting point. From there, compare against the original list and maybe make the difference an allow list? This would need to be some sort of filter plugin to be accepted.
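A minimal sketch of the "compare against a saved list" idea from the comment above, not an existing bandersnatch feature. The helper names (`normalize`, `new_packages`, `render_allowlist`) are hypothetical, and the two name lists are assumed to have been fetched and saved separately. The `[plugins]` / `[allowlist]` layout follows my understanding of the `allowlist_project` filter plugin and should be checked against the bandersnatch documentation before use.

```python
import re


def normalize(name):
    """PEP 503 normalization, so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def new_packages(baseline, current):
    """Return normalized names present in `current` but not in `baseline`."""
    seen = {normalize(n) for n in baseline}
    return sorted({normalize(n) for n in current} - seen)


def render_allowlist(names):
    """Render a config fragment listing `names` (assumed section/key names)."""
    lines = [
        "[plugins]",
        "enabled =",
        "    allowlist_project",
        "",
        "[allowlist]",
        "packages =",
    ]
    lines += [f"    {n}" for n in names]
    return "\n".join(lines) + "\n"


# Hypothetical data: a saved baseline and a freshly fetched package list.
baseline = ["Foo_Bar", "requests"]
current = ["foo-bar", "requests", "NewPkg"]
print(render_allowlist(new_packages(baseline, current)))
```

Normalizing before the comparison matters because the same project can appear under different spellings in different listings.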
Thank you very much. I only want to mirror all packages from pypi.org. My goal is to build a comprehensive dataset of the Python package registry for research.
@lxyeternal and @cooperlees, another approach to consider is using a local SQLite database to track package metadata. During each sync, compare PyPI's current metadata with the database to identify new or updated packages. Download only those packages and update the database without deleting any local packages. This method simplifies incremental synchronization and ensures no historical data is lost. Let me know what you both think.
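The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not bandersnatch code: the table layout and the helpers `open_state`, `plan_sync`, and `record_synced` are hypothetical, and upstream metadata is assumed to be available as a plain mapping of package name to an increasing serial number.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the hypothetical state database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages ("
        "name TEXT PRIMARY KEY, serial INTEGER NOT NULL)"
    )
    return conn


def plan_sync(conn, current):
    """Return names that are new, or whose serial is higher than recorded.

    Packages missing from `current` (removed upstream) are ignored, so
    nothing is ever scheduled for deletion.
    """
    known = dict(conn.execute("SELECT name, serial FROM packages"))
    return sorted(n for n, s in current.items() if known.get(n, -1) < s)


def record_synced(conn, name, serial):
    """Record one package after its download succeeded (one transaction)."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO packages(name, serial) VALUES(?, ?)",
            (name, serial),
        )


# Hypothetical run: two packages already mirrored, one later removed upstream.
conn = open_state()
record_synced(conn, "requests", 100)
record_synced(conn, "removed-upstream", 7)
print(plan_sync(conn, {"requests": 120, "newpkg": 1}))  # ['newpkg', 'requests']
```

Recording a package only after its download completes means an interrupted run leaves that package's old serial in place, so it is simply planned again on the next run.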
I'd need more information here on the implementation and the goals, with this being off by default, as most use cases would not benefit from this addition. Also, how would you detect bad data from failed runs (crashes, etc.) and re-sync the SQLite database if that happened? This opens up a new data store to keep clean and up to date. State is hard.
I appreciate the feedback and acknowledge the valid concerns raised regarding the implementation and goals for incremental synchronization with bandersnatch. I must clarify that I misspoke earlier regarding dirsync: upon reviewing its documentation, it appears it may not be suitable for our needs.
I am currently using bandersnatch for mirroring PyPI and have encountered an issue regarding incremental synchronization. I want to set up my bandersnatch mirror to only sync new packages added to pypi.org. Packages that have been removed from pypi.org should not be deleted from the local mirror during synchronization. In short, I only want incremental backups, without deleting any packages. How do I configure bandersnatch.conf to achieve this?