Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hex_codes_in_unicode_sequences: restrictions around f-strings are too strict #4523

Open
MeGaGiGaGon opened this issue Dec 4, 2024 · 5 comments
Labels
F: strings Related to our handling of strings T: style What do we want Blackened code to look like?

Comments

@MeGaGiGaGon
Copy link
Contributor

I had this idea while reading #4522. This probably shouldn't block hex_codes_in_unicode_sequences from being stabilized since it is an edge case, but would be nice to have.

playground link Currently, f"{'\xFF'}" does not get formatted. It should be formattable to f"{'\xff'}".

Note that any changes here have to be careful of f-string debug statements. This formatting would not have any effect since \xXX is resolved in the string literal before the formatting output happens

>>> f"{'\xFF'=}"
"'ÿ'='ÿ'"
>>> f"{'\xff'=}"
"'ÿ'='ÿ'"

but if there is ever a situation where the formatting is applied when the escape isn't resolved, then the behavior would change observably.

@MeGaGiGaGon MeGaGiGaGon added the T: style What do we want Blackened code to look like? label Dec 4, 2024
@JelleZijlstra
Copy link
Collaborator

Thanks, good catch. I think this doesn't need to block the stabilization; we can tweak it next year. Though I'd accept a PR fixing it before the end of the year.

@JelleZijlstra JelleZijlstra added the F: strings Related to our handling of strings label Dec 4, 2024
@MeGaGiGaGon
Copy link
Contributor Author

I have found why this happens, but not how to fix it.

The issue was introduced by #4401. The test used for if an f-string can be formatted like a normal string is if any of the f-string {}s contain a \, which is too broad of a test.
You can see this by the fact that this code f"\xFF{"\a"}" also doesn't have the hex value formatted.

Note that this does also accidentally fix a possible bug, since if this f-string f"{r"\xFF"}" was formatted as a normal string to f"{r"\xff"}" it would change program behavior (but be caught by the inequivalent to source code sanity check).

Since fixing this naively both introduces a bug and, in my attempt, breaks a ton of tests, I'm not sure what to do/what approach to take. Thoughts @JelleZijlstra? Also cc @tusharsadhwani

This is also could be considered a byproduct of the actual f-string formatting code being commented out with # TODO: Uncomment Implementation to format f-string children, but I couldn't find any context from #3822 why this is the case.

@JelleZijlstra
Copy link
Collaborator

Thanks for looking into this!

This is also could be considered a byproduct of the actual f-string formatting code being commented out with # TODO: Uncomment Implementation to format f-string children, but I couldn't find any context from #3822 why this is the case.

#3822 was a big change by itself to add parsing support for new f-strings. I didn't want to make the change even more complicated by adding formatting changes, especially because (as I recall) the initial version made some questionable formatting choices. Ideally we should format code inside f-strings, and that might provide a principled way to fix this issue too. However, it's a bigger project.

Since fixing this naively both introduces a bug and, in my attempt, breaks a ton of tests, I'm not sure what to do/what approach to take.

Since this is a fairly obscure bug, the best approach might be to leave this simple bug around for now, and investigate broader changes to format f-strings in a principled way, like I discussed above.

@tusharsadhwani
Copy link
Contributor

It wasn't (and still hasn't afaik) been decided how we want to approach nested string formatting, so #4401 simply restored the pre-3.12 behaviour.

We could for example adopt Ruff's decisions on how to format them. i.e., continue this discussion: astral-sh/ruff#9785

@MeGaGiGaGon
Copy link
Contributor Author

Update from the corresponding Ruff issues I've opened: The current python behavior might be a bug in the parser, in which case the current formatting would be correct. python/cpython#124363

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F: strings Related to our handling of strings T: style What do we want Blackened code to look like?
Projects
None yet
Development

No branches or pull requests

3 participants