
Complexity lint against some_string.as_bytes().len() #13434

Closed
LikeLakers2 opened this issue Sep 21, 2024 · 4 comments · Fixed by #13437
Labels
A-lint Area: New lints

Comments

@LikeLakers2

LikeLakers2 commented Sep 21, 2024

What it does

This lint would recommend changing instances of some_string.as_bytes().len() to some_string.len().

Explanation: str::len() already returns a string's length in bytes. Converting the string to &[u8] with str::as_bytes() and then taking the length of that adds complexity without changing the result.
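A minimal sketch confirming the equivalence: for any &str, both expressions return the same byte count, including for multi-byte UTF-8 text. The helper name `lengths` and the sample strings are illustrative only.

```rust
/// Illustrative helper: returns (s.len(), s.as_bytes().len()).
fn lengths(s: &str) -> (usize, usize) {
    // str::len already counts the bytes of the UTF-8 encoding...
    let direct = s.len();
    // ...so going through the &[u8] view yields exactly the same number.
    let via_bytes = s.as_bytes().len();
    (direct, via_bytes)
}

fn main() {
    // Arbitrary samples, including multi-byte characters.
    for s in ["", "hello", "héllo", "日本語", "🦀"] {
        let (direct, via_bytes) = lengths(s);
        assert_eq!(direct, via_bytes);
        println!("{:?}: {} bytes", s, direct);
    }
}
```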

Advantage

Simpler-looking code with identical behavior.

Drawbacks

Applying this lint may make it less obvious that the length is in bytes, because our intuition is that a string's length is its number of characters. Yes, the Rust documentation addresses this, but it's still easy to forget, especially for us English speakers, who probably expect to deal only with ASCII (where every character is exactly one byte).
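To make the drawback concrete, here is a small sketch contrasting byte length with character count. Note that `chars().count()` counts Unicode scalar values, not user-perceived characters (grapheme clusters): "e" followed by a combining accent is two `char`s. The helper name `byte_and_char_counts` is hypothetical.

```rust
/// Hypothetical helper: returns (bytes, chars) for a string slice.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // len() counts UTF-8 bytes; chars().count() counts Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    for s in ["hello", "héllo", "日本語", "🦀"] {
        let (bytes, chars) = byte_and_char_counts(s);
        println!("{:?}: {} bytes, {} chars", s, bytes, chars);
    }
}
```

For ASCII-only text the two numbers agree, which is exactly why the difference is easy to forget.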

Example

let string_len_in_bytes = some_string.as_bytes().len();

Could be written as:

let string_len_in_bytes = some_string.len();
@samueltardieu
Contributor

@rustbot claim

@cgwalters

Applying this lint may make it less obvious that the length is in bytes, because our intuition is that a string's length is its number of characters. Yes, the Rust documentation addresses this, but it's still easy to forget, especially for us English speakers, who probably expect to deal only with ASCII (where every character is exactly one byte).

I was recently bitten by forgetting that str::len counts bytes. I went to check whether there was a Clippy lint warning on str::len, and was a bit surprised to find one for the opposite.

I understand where folks are coming from on this, but still... I suspect there are two kinds of uses of str::len: those comparing against the lengths of other strings, which are fine, and those that are probably buggy and should be counting characters.
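One way the "probably buggy" kind shows up: a limit meant in characters gets applied as a byte offset, and slicing at that offset panics if it lands inside a multi-byte character. A minimal sketch, assuming a hypothetical helper `truncate_chars` (not part of std) that finds the byte offset of the n-th character with `char_indices`:

```rust
/// Hypothetical helper: returns the prefix of `s` holding at most `max_chars`
/// chars (Unicode scalar values). Slicing with `&s[..max_chars]` would instead
/// treat the limit as a byte offset and panic off a character boundary.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // Byte offset where the (max_chars + 1)-th char starts, if there is one.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // the string is already short enough
    }
}

fn main() {
    let s = "日本語のテキスト";
    // Each of these characters is 3 bytes, so byte 4 is mid-character
    // and `&s[..4]` would panic.
    assert!(!s.is_char_boundary(4));
    println!("{}", truncate_chars(s, 2));
}
```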

Would it be possible to add the opposite lint as an opt-in, and have enabling it disable this lint?

@y21
Member

y21 commented Nov 18, 2024

The lint's category was discussed at length on Zulip, and a poll was held to settle it; see the thread: https://rust-lang.zulipchat.com/#narrow/channel/257328-clippy/topic/FCP.3A.20needless_as_bytes


If you want the opposite of this lint, you can disallow the use of str::len in your clippy.toml:

disallowed-methods = [
  { path = "str::len", reason = "use <str>.as_bytes().len() instead" }
]

I don't know if it's worth having an opt-in lint for something that disallowed_methods can do by itself (you would need to manually enable it for your project in any case).

@cgwalters

Thank you! I was not aware of disallowed-methods. I tried it in one of my projects, and for what it's worth, every use of str::len was either a bug or a place where we really wanted to validate more strictly that only ASCII was in use. It's not that we actually wanted .as_bytes().len() either. I guess this is where https://lib.rs/crates/ascii could be used.
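For the "validate that only ASCII is in use" case, the standard library alone may be enough. A minimal sketch using std's `str::is_ascii` (not the ascii crate), with a hypothetical helper `ascii_len` that returns a length only when bytes and characters are guaranteed to coincide:

```rust
/// Hypothetical helper: returns the length only if `s` is pure ASCII,
/// in which case the byte count equals the character count.
fn ascii_len(s: &str) -> Option<usize> {
    if s.is_ascii() {
        Some(s.len())
    } else {
        None
    }
}

fn main() {
    for s in ["hello", "héllo"] {
        match ascii_len(s) {
            Some(n) => println!("{:?}: {} ASCII characters", s, n),
            None => println!("{:?}: rejected, contains non-ASCII", s),
        }
    }
}
```

Returning `Option` forces the caller to handle non-ASCII input explicitly instead of silently treating bytes as characters.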
