Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update internal db storage #1306

Merged
merged 2 commits into from
Aug 15, 2024
Merged

Update internal db storage #1306

merged 2 commits into from
Aug 15, 2024

Conversation

kremnik
Copy link
Contributor

@kremnik kremnik commented Aug 13, 2024

What has been done

With this PR, the performance of the in and update operations has been improved by replacing list with set in the recognition module.

How to test

make lint && make test

@@ -194,8 +192,8 @@ def find(
)

# append replaced images into both old and new images. these will be dropped and re-added.
new_images = new_images + replaced_images
old_images = old_images + replaced_images
new_images.update(replaced_images)
Copy link
Owner

@serengil serengil Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suppose that refresh_database = False, new_images and old_images are not initialized anymore in line 163 and 164. Line 195 and 196 will throw expception in that case.

IMO initializing new_images and old_images as empty sets in line 163, 164 will sort the problem

Additionally, we should have an unit test for refresh_database = False

@serengil
Copy link
Owner

serengil commented Aug 14, 2024

Secondly, do we have any stats about the improvement? The question is that will it be worth?

TLDR:

Why use a set instead of a list?

Time Complexity:

  • list: Membership checks in a list have a time complexity of O(n), where n is the length of the list. This is because the list must potentially scan through all of its elements to determine if the item is present.

  • set: Membership checks in a set have a time complexity of O(1) on average. This is because sets are implemented as hash tables, which allow for much faster lookups.

@kremnik
Copy link
Contributor Author

kremnik commented Aug 14, 2024

@serengil There are three reasons:

  1. Converting set to list is time consumption, but not neccessary operation;
  2. There are a few places when membership checking occurs: here and here. If old_images includes a huge amount of images (n), as well as representations includes another huge amount of images (m), total time complexity will be O(nm) if old_images is a list. Instead, if old_images is a set, total time complexity will be O(m);
  3. It is guaranteed that there is no re-processing of the same images.

@serengil
Copy link
Owner

LGTM

@serengil serengil merged commit f3d1809 into serengil:master Aug 15, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants