Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new check B910 to suggest using Counter() instead of defaultdict(int) #489

Merged

Conversation

dejvidq
Copy link
Contributor

@dejvidq dejvidq commented Aug 15, 2024

Implementing #323.

I'm not sure if the number B910 is OK, but it's just the next available number (not counting B950).

README.rst Outdated
@@ -273,6 +273,8 @@ on the first line and urls or paths that are on their own line::
"https://some-super-long-domain-name.com/with/some/very/long/paths"
)

**B910**: Use Counter() instead of defaultdict(int) to avoid memory leaks. Using the latter can result in using as much as ~1000 MB of memory even for small and simple dicts. Refactoring the code to use the former can reduce memory usage from mentioned ~1000 MB to flat ~20 MB.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this number come from? I would expect it depends on the size of the data; i.e., you can construct a Counter that is a 1000 MB and also a defaultdict that is 20 MB.

It appears the difference is that when you access a key that does not exist, Counter simply returns 0, but defaultdict sets the value to 0 in the dict as well. How much that affects your memory usage obviously depends on how often you access keys that don't exist. We should mention the pattern that leads to the difference, not the exact number, because the exact number is going to depend on your use case.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the numbers were based on the test case I provided in #323, and yes they will depend on the number of missing keys accessed.

How about: "Use Counter() instead of defaultdict(int) to avoid excessive memory use as the default dict will record missing keys with the default value when accessed."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds good.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - Thanks all for discussing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I agree that it wasn't the best decision to mention any numbers there. I changed that.

bugbear.py Outdated
@@ -2380,6 +2391,9 @@ def visit_Lambda(self, node) -> None:
"B909 editing a loop's mutable iterable often leads to unexpected results/bugs"
)
)
B910 = Error(
message="B910 use Counter() instead of defaultdict(int) to avoid memory leaks"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change "memory leaks" to "memory usage", although that is perhaps a matter of perspective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Fixed.

@@ -273,6 +273,8 @@ on the first line and urls or paths that are on their own line::
"https://some-super-long-domain-name.com/with/some/very/long/paths"
)

**B910**: Use Counter() instead of defaultdict(int) to avoid excessive memory use as the default dict will record missing keys with the default value when accessed."

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an unwanted trailing double quote?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 😕
Fixed

@dejvidq dejvidq force-pushed the suggest_counter_instead_of_defaultdict_int branch from 3caa1b9 to a5db163 Compare August 17, 2024 20:05
Copy link
Collaborator

@cooperlees cooperlees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for eventually implementing it with tests. Appreciate it.

@cooperlees cooperlees merged commit 5939910 into PyCQA:main Aug 20, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants