-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flow-remove-types for input with multi-byte characters #7781
base: main
Are you sure you want to change the base?
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at [email protected]. Thanks! |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
@mroch this is ready for review. While the performance didn't regress, there's no point in optimizing the code much because almost all time is spent in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@motiz88 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mourner for catching this and working on a fix. I have a concern regarding source maps and some other minor comments, but this looks good in principle.
packages/flow-remove-types/index.js
Outdated
@@ -397,6 +407,7 @@ function getTrailingLineNode(context, node) { | |||
// Creates a zero-width "node" with a value to splice at that position. | |||
// WARNING: This is only safe to use when source maps are off! | |||
function getSpliceNodeAtPos(context, pos, loc, value) { | |||
context.bytesAdded += value.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically mixing JS string length (UTF-16 code units) and UTF-8 bytes. Not a problem yet because the only call to getSpliceNodeAtPos
uses a plain ASCII string (' =>'
), but maybe we should future-proof this?
return (result += source.slice(lastPos)); | ||
offset += sourceBuffer.copy(buf, offset, lastPos, sourceBuffer.length); | ||
|
||
return buf.toString('utf8', 0, offset); | ||
}, | ||
generateMap: function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this issue translate to source maps? I'm suddenly worried that the source map spec is silent on what "columns" mean, but I'd imagine it's meant to be interpreted more like "code points" than "bytes". So we might need to do some Unicode bookkeeping in generateSourceMappings
as well.
@@ -1,6 +1,8 @@ | |||
/* @flow */ | |||
// @nolint | |||
|
|||
// multi-byte chars: Гарного дня, котики! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some non-BMP (multi UTF-16 code unit) characters to the test? e.g. emoji: '🐈'.length == 2
, Buffer.from('🐈').length == 4
Also, I'd add a test case with non-ASCII characters in the actual code that we process, e.g.
var lambda: λ = (α: number): number => α;
@mourner have you still got time to look into this and address the comments? |
@thehogfather apologies — can't get to it at the moment but hopefully I'll find some time in a week or two. The only comment left unaddressed is this one:
If anyone's up to helping out on the source maps part, I'd appreciate it! |
Closes #7779 by implementing the approach described by @mroch in #7779 (comment).
Remaining work:
Try eliminating all string manipulation andNot worth it — bottleneck is in parsing.buffer.write
calls in favor of purely byte-based operations for performance.