Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow translating from javascript files #15612

Open
wants to merge 69 commits into
base: main
Choose a base branch
from

Conversation

cofiem
Copy link
Contributor

@cofiem cofiem commented Mar 17, 2024

This PR enables translating strings in javascript.

It demonstrates the js translation in use by applying it to the timeago.js js util.

Edit: added overview of the two mutually-exclusive approaches.

Approach: webpack plugin the generates js message bundles

This is the current state of this PR.

I attempted an approach similar to https://www.npmjs.com/package/@zainulbr/i18n-webpack-plugin
i.e. embed the translations into separate js bundles, one for each language.

Unfortunately, the existing webpack plugins were not suitable (issues with webpack 5 and/or did not work for the warehouse setup).
However, I did manage to create a webpack plugin that does the job (in webpack.plugin.localize.js).
The plugin checks the plural forms and embeds them into the generated js bundles.

So, the development process with these changes:

  • Use gettext or ngettext as appropriate in js files
  • Run make translations: pybabel extracts messages from js files, then a python script generates messages.json files for only the KNOWN_LOCALES and only includes translations that are needed in js
  • Run make static_pipeline: the webpack plugin embeds the translated data from the messages.json files
  • At runtime, the matching js bundle for the current locale is set in base.html, and gettext.js is initialised with the translation data. It can determine plurals and do string templating.

These are the benefits of this approach:

  • Uses the existing translation and js tooling - make, pybabel, locale files, webpack
  • Enables plurals and template placeholders
  • Limits or does not use the hard to cache routes nor additional http requests
  • Limits the number of unmaintained packages required, and works with webpack 5
  • Straightforward to maintain and use as part of contributions

Approach: request from client to server to get translation for a given message code

A js util fetch-gettext.js that requests the translation code returns a promise, which contains the translation text from the server, using the locale set on the server-side.

view the changes for this approach.

Changes:

  • enables extracting strings from javascript using pybabel,
  • adds a route that renders the appropriate translation using the existing machinery, and
  • adds js util for obtaining the translated string.

Considerations:

  • may make many http requests for translations

Notes

  • Pybabel seems to require particular js function names and the config to change them does not work, similar to this pybabel issue. This is ok, I've used the names expected by pybabel.

Relates to issues:

@di
Copy link
Member

di commented Mar 19, 2024

Very cool! I like this approach a lot, I think this will definitely work.

One thing I'm wondering: since we probably won't have a ton of JS translations, would it make sense to load them all for a given locale in a bundle via a single request, as opposed to making N different requests?

@cofiem
Copy link
Contributor Author

cofiem commented Mar 19, 2024

One thing I'm wondering: since we probably won't have a ton of JS translations, would it make sense to load them all for a given locale in a bundle via a single request, as opposed to making N different requests?

I'm not sure I understand this.

I found a way to use the server-side rendering of translations because:

  • I can't see that there's a way to get just the js translation strings to make a bundle of translation strings
  • each translation can have up to 6 variants, which are chosen based on the num parameter
  • some translations have placeholders in the string, and I didn't want to have to find or make a way to fill the placeholders in js

If you're concerned about the server load or frequency of requests, the browser cache seems to work well, and there's likely some additional caching to be done in the server-side view that could reduce the need to execute the localiser.pluralize or localizer.translate.

Please let me know if I've misunderstood what you're asking.

@cofiem
Copy link
Contributor Author

cofiem commented Mar 19, 2024

I managed to get the plurals extracted.

It looks like python-babel expects particular function names.

TODO:

  • I'll include some screenshots of the js translations in action.
  • I may also have time to go through and make the rest of the text that's been identified in the other issues as translatable.
  • fix the couple of failing tests

Question:

  • Is there an existing way to cache the output of a view that depends on the request.params? That would work well for caching the translation view on the server side.
    Maybe origin_cache?

@di
Copy link
Member

di commented Mar 20, 2024

If you're concerned about the server load or frequency of requests, the browser cache seems to work well, and there's likely some additional caching to be done in the server-side view that could reduce the need to execute the localiser.pluralize or localizer.translate.

Yes, I was thinking about balancing the number of requests with the size of the requests. It's probably an over-optimization though, our CDN will cache these responses by default so it's not the load on our backend that I was concerned with, just the number of requests the user's browser would have to make after the page has loaded. I think it's fine to move forward as-is, just wanted to think through it a bit.

Is there an existing way to cache the output of a view that depends on the request.params? That would work well for caching the translation view on the server side.

By default, we cache all requests, but strip all URL parameters when caching a given view (with some exceptions). We can add an exception for this view. This happens here:

https://github.com/pypi/infra/blob/9568346c6781a6d9e754e9edc3883dd34febd97b/terraform/warehouse/vcl/main.vcl#L103-L126

@cofiem
Copy link
Contributor Author

cofiem commented Mar 24, 2024

The only remaining question is what @di was asking - is the number of requests made by the browser a problem?

Some example screenshots of the js translations in use.

Screenshot 2024-03-24 181338
Screenshot 2024-03-24 181556
Screenshot 2024-03-24 181952
Screenshot 2024-03-24 182303

@cofiem
Copy link
Contributor Author

cofiem commented Mar 24, 2024

Test errors:

> msg += " " + " ".join(results["feedback"]["suggestions"])
E TypeError: unsupported operand type(s) for +=: 'LazyString' and 'str'

This one could be fixed by adding str(msg), but there might be a better way?

- msg += " " + " ".join(results["feedback"]["suggestions"])
+ msg = str(msg) + " " + " ".join(results["feedback"]["suggestions"])

> return request.localizer.translate(_translation_factory(message, **kwargs))
E AttributeError: 'NoneType' object has no attribute 'localizer'

I'm not sure how to fix this error. Other form tests seem to be able to set request=pretend.stub(), but the Validator directly uses from warehouse.i18n import localize as _, which errors at request.localizer.

@cofiem cofiem marked this pull request as ready for review March 24, 2024 10:40
Copy link
Member

@di di left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took one last look and I have one outstanding issue here before we can merge.

bin/translations Outdated Show resolved Hide resolved
cofiem and others added 3 commits August 19, 2024 16:52
This removes a step and allows translation updates (static_pipeline) to generate the js bundles.
The messages.json files are not required when translation data is bundled like this.
Copy link
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - I did some basic testing and it looks like it creates a full JS bundle per language, so once a local has changed, the only JS resource loaded is the one for that locale.

I put a bunch of comments inline - let me know if they don't make sense.

I'm also a little concerned about a JS injection attack from externally-supplied values - is there something I'm overlooking on how these are sanitized?

Comment on lines 124 to 125
const denied = new RegExp(
"[~`^$_\\[\\]{}\\\\'\"\\.#@\\f\\n\\r\\t\\v\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff]+");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: add a comment to denote where this regex originated, and what it is denying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 8e2e3f5

compressionOptions: { level: 9, memLevel: 9 },
compressionOptions: {level: 9, memLevel: 9},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: does our linter not check and enforce these kinds of changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(might be?) addressed in 7d1e8cb

Makefile Outdated Show resolved Hide resolved
warehouse/admin/static/js/warehouse.js Outdated Show resolved Hide resolved
Comment on lines 69 to 80
const data = {
"": {"language": "fr", "plural-forms": "nplurals=2; plural=n > 1;"},
"My default message.": "My translated message.",
"My %2 message with placeholders %1.": "My translated %1 message with placeholders %2",
"My message with plurals": ["My translated message 1.", "My translated messages 2."],
"My message with plurals %1 again": ["My translated message 1 %1.", "My translated message 2 %1"],
};
const pluralForms = function (n) {
let nplurals, plural;
nplurals = 2; plural = n > 1;
return {total: nplurals, index: ((nplurals > 1 && plural === true) ? 1 : (plural ? plural : 0))};
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: this looks a lot like a parametrized test, is that the case here? If so, would it make more sense to use jest's .each to express the individual tests to reduce the amount of logic baked in to the test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in b6fc1e6

import timeAgo from "../../warehouse/static/js/warehouse/utils/timeago";
import {delay} from "./utils";

describe("time ago util", () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, we have tests now!

}

export function ngettextCustom(singular, plural, num, extras, data, pluralForms) {
// this function allows for testing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is this comment still accurate, if ngettext is basically a wrapper for this, with some defaults?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in d32b447

Comment on lines 16 to 20
var messagesAccessLocaleData = {"": {"language": "en", "plural-forms": "nplurals = 2; plural = (n != 1)"}};

// the value for 'messagesAccessPluralFormFunction' is set by webpack.plugin.localize.js
// default is en
var messagesAccessPluralFormFunction = function (n) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: should these be var or const/let? (again, I wonder what's happening with the linter...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in d32b447

return file;
};
// Refs: https://github.com/shellscape/webpack-manifest-plugin/issues/229#issuecomment-737617994
// publicPath: "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why is this previously-set value commented out? If it's no longer valid, let's remove it, along with the comment above as to why it's there, or move it to where it's now being set, or extract it to a constant that can be easier commented and reused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 0f13ec1

if (refs.every(refLine => !refLine.includes(".js:"))) {
continue;
}
result[value.msgid] = value.msgstr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to sanitize the user input at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(partly?) addressed in fd0dd80

webpack.plugin.localize.js Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
webpack.plugin.localize.js Outdated Show resolved Hide resolved
Copy link
Member

@di di left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@di di requested a review from miketheman September 23, 2024 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n Internationalization javascript requires change to JavaScript files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants