Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Bulk Download from GlotPress #401

Draft
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

AliSoftware
Copy link
Contributor

@AliSoftware AliSoftware commented Aug 16, 2022

Warning: This is a Work in Progress branch. I've only commited this to the remote and opened this draft PR to upload and keep a backup of my experiment while I refine the implementation, but this is still early stage of the implementation

🚧 Next Steps / TODO

  • Fix rubocop violations
  • Implement robust Unit Tests for all the new classes and stages
  • Implement the fastlane actions that will use those new implementations
  • Undraft this PR

Why?

With us encountering the 429 Too Many Requests quota limitation more and more often while downloading translations from GlotPress during our release duties, I've recently asked our Team Global and Orbit (ref: pxLjZ-72D-p2) to install the gp-import-export plugin on our GlotPress instances (ref: 307-gh-Automattic/Orbit), so that we can bulk-download the translations for all the locales at once (as a ZIP).

This will allow us to only make a single network request to export them instead of one request per locale (which ends up being a lot of requests especially for WordPress and the 40+ locales it supports)

How?

I've reimplemented the logic for the download of GlotPress exports into a more modern infrastructure / code architecture in our code base. I intend to deprecate glotpress_helper and gp_downloadmetadata_action once this new implementation is validated as more stable, and refactoring android_download_translations_action / ios_downlaod_strings_files_from_glotpress to use the new helpers and models (or deprecating those actions and creating new ones to replace them).

The new implementation/architecture is structured like this:

Models to manipulate locales easily

This will supersede my #296 draft PR (which I opened… one year ago!)

  • models/locale: is a Struct which represents a single locale, exposing the various locale code representations for that locale — since there are multiple ISO standards for locale codes needed/used in various places of the codebase (ios folder names, android folder names, glotpress locale codes…)
  • models/locales: represents a set of Locale objects. That model allows to build a set of locales based on their locale codes (to have an object representing all the locales supported by a project/app, typically), build convenience sets (e.g. Mag16), easily add/remove/filter locales from sets, and find a known locale from the set of all known locales

A class dedicated to download exports from GlotPress

The helper/glotpress_downloader class knows how to exports from GlotPress in android, ios or json formats.

  • You start by initializing an instance with a GlotPress URL host and URL project path

  • You can then call methods to download exports from that project:

    • Either for a single, given locale, one by one
    • Or — provided the GlotPress host has the gp-import-export plugin installed and supports it — as a bulk download of all the locales at once (as a ZIP stream that will be decompressed by the downloader class on the fly)
  • That class also knows how to parse the odd format used by GlotPress for its JSON exports (where the JSON keys are actually the copy key + \0004 + the original copy, and the JSON values are an array with mostly only one entry at most) and convert it to a usable Hash.

This class is supposed to only have the knowledge of the GlotPress side of things. Once it downloaded the export from GlotPress in memory (File/IO instance) and parsed it, it is not responsible for doing anything with the data except handing it over to the next piece in that new architecture (typically a *_file_writer).
That way, if we want to support alternative Localization backends in the future (e.g. OneSky or Weblate, see 298-gh-Automattic/i18n-issues), this would be nicely decoupled.

File writers — to write the downloaded translations for each locale to disk

  • helper/ios/ios_strings_file_writer knows how to take a single .strings file export, for a single locale, and write it to disk(†).
  • helper/android/android_strings_file_writer knows how to take a single strings.xml file export, for a single locale, and write it to disk(†).
  • helper/fastlane_metadata_file_writer knows how to take a Hash of exported translations (key => translation) for a single locale, and write it as .txt files to disk (typically in fastlane/metadata/…/{locale}/ folders)
    • This class includes a subclass MetadataRule which describes the rules to apply to each key, especially its max_len and the name/path of the corresponding .txt file to associate with it.
    • The FastlaneMetadataFileWrite then knows how to use a list of MetadataRule objects to extract the proper keys from the translations, check their length, try the shorter alternatives (entries for #{original_key}_short, by convention), and write the one that fits the limit into the proper .txt file (or delete the file if no fitting translation was found).
    • The class provides predefined sets of MetadataRule lists for Android and iOS projects, with predefined keys for typical App Store / Play Store metadata.
    • The writer still offers a way to dynamically provide a MetadataRule for any unknown key, allowing us to describe how to handle unexpected/unusual keys (e.g. screenshots) as we encounter them.

(†) Those two *_strings_file_writer helpers are currently very simple, just taking the IO stream and writing it to disk verbatim, with the only logic being to know the relative path to use for the subfolder based on the locale passed.
But in the future I'm hoping we can improve those classes to optimize the way files are written to disk, especially reorder the XML nodes in strings.xml files to make sure they are always sorted in the same order, making PR diffs for them easier to review. This is not implemented in this PR yet though.

To Test

🚧 There is currently no unit tests / specs. This is the next step on my TODO list.
In the meantime, I've written some "demo code" of how those helpers are supposed to interact with each other and assembled together.
This demo code is likely the scaffold of what the future new fastlane actions (the ones using this new implementaion) will look like, but for now they are quite drafty and only serve the purpose of brain-dumping my idea of how that new API of mine would work together at call site.

@AliSoftware AliSoftware self-assigned this Aug 16, 2022
@AliSoftware
Copy link
Contributor Author

cc @mokagio in case you're curious (squirrel!) about this early implementation and my vision/attempt for the new code modular architecture for all those GlotPress-export-related code (see PR description) 😃


To be clear, this is still WIP and not fully ready for review yet (I'll add you as reviewer once I got unit tests and it's ready); though the new architecture and its separation of concerns used for the new classes (separating the download part, the post-process part, and the writing part, to be ready to potentially support a platform other than GlotPress one day) is already where I want it to be.

But since I'm not sure when I will have time to come back on this code/project in the near future (hence why I decided to push it as draft anyway, to avoid losing my work so far) to finish it, I figured that, knowing you, you might be interested/curious to take a pick at my brain dump and architecture idea here 😉

]
end

# The common standardized set of Metadata rules for an Android project
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The common standardized set of Metadata rules for an Android project
# The common standardized set of Metadata rules for an iOS project

# Note: If the translation for `key` exceeds the specified `max_len`, we will try to find an alternate key named `#{key}_short` by convention.
# @param [String] filename The (relative) path to the `.txt` file to write that metadata to
#
MetadataRule = Struct.new(:key, :max_len, :filename) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add keyword_init: true for the Struct so we can update all the MetadataRule.new call sites to use named parameters?

# Visit each key/value pair of a translations Hash, and yield keys and matching translations from it based on the passed `MetadataRules`,
# trying any potential fallback key if the translation exceeds the max limit, and yielding each found and valid entry to the caller.
#
# @param [#read] io
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# @param [#read] io
# @param [Hash<String,String>] translations The hash of key => translations for a single locale

next if rule.nil?

if rule.max_len != nil && value.length > rule.max_len
UI.warning "Translation for #{key} is too long (#{value.length}), trying shorter alternate #{key}."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UI.warning "Translation for #{key} is too long (#{value.length}), trying shorter alternate #{key}."
UI.warning "Translation for #{key} is too long (#{value.length} > #{rule.max_len}), trying shorter alternate #{key}."

if rule.max_len != nil && value.length > rule.max_len
UI.warning "Translation for #{key} is too long (#{value.length}), trying shorter alternate #{key}."
short_key = "#{key}_short"
value = json[short_key]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value = json[short_key]
value = translations[short_key]

query_params = filters.transform_keys { |k| "filters[#{k}]" }.merge('export-format': format)
uri = URI::HTTPS.build(host: host, path: File.join('/', 'exporter', project, '-do'), query: URI.encode_www_form(query_params))
UI.message "Downloading #{uri}"
zip_stream = uri.open(REQUEST_HEADERS)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap this in begin…rescue…end block

zip_file.each do |entry|
next if entry.name.end_with?('/') && entry.size.zero?

prefix = File.dirname(entry.name).gsub(/[0-9-]*$/, '') + '-'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prefix = File.dirname(entry.name).gsub(/[0-9-]*$/, '') + '-'
# Each entry in the ZIP looks like `apps-android-release-notes-current-2022-11-08-1951/apps-android-release-notes-current-zh-cn`
# So to get the locale, we use the `dirname`, minus the `2022-11-08-1951` timestamp at the end, as a prefix to strip from the `basename` to only have `zh-cn` left.
prefix = File.dirname(entry.name).gsub(/[0-9-]*$/, '') + '-'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant