-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Website #2653
base: main
Are you sure you want to change the base?
Upgrade Website #2653
Conversation
I moved the old website directory from `website/` to `website-old/` and ran `npx @docusaurus/migrate migrate website-old website` Various things are still broken, fixes coming in next commits
The migration script completely ignored our previous landing page. Here we migrate it ourselves manually and drop most of our custom styling in favor of the new docusaurus styling.
These are not yet configured for this fork
I noticed this hidden docusaurus folder is new after the upgrade
`yarn upgrade @docusaurus/core@latest ││ @docusaurus/preset-classic@latest` And then manually upgrading other dependencies. Docusaurus v3 by default interprets all markdown files as mdx. Our md docs are not compatible with mdx yet and require some hand editing to fix. I don t think we're planning to use mdx in our manually handwritten md docs so I've added an option to the docusaurus config to treat .md files as pure markdown.
This controls the theme for codeblocks. The specified themes are closer to the colors used in the old website
The old `<!--DOCUSAURUS_CODE_TABS-->` comments are deprecated, use the new `<Tabs>` component and convert this .md file to .mdx to utilize it.
Remove unused tags and components, improve styles
Based on old footer
Created this project under my personal algolia account specifically for the version of this site that will be hosted under my personal website.
Docusaurus has moved from mathjax to katex. We also make some small tweaks to some of the math in our md to match updated synatx for katex. I spot checked each of our md docs pages to find offenders.
to match the CLI of docusaurus v3. See https://docusaurus.io/docs/cli
Upgrade Docusaurus to v3
This was heavily modified as a result of the docusaurus upgrade
Dramatically simplify the process of deploying and versioning. We remove the shell script and perform the fewer steps from within github actions. A big change here is the move to using docusaurus's built-in versioning system. Before, we were abusing docusaurus's system to take a snapshot of the entire website. For now we are only handling our md docs, but in future commits we will handle versioning tutorials and sphinx.
This is necessary to host from my personal website.
[Actions] Update publish website workflow
The Ax and Botorch repos have this same script but it's named slightly differently T_T
This reverts commit 45dc6f3.
Provides a consistent structure such that notebooks are always nested within a directory regardless of whether the notebook has additional data/assets.
This is a direct copy/paste from the Ax fork. https://github.com/CristianLara/Ax/blob/15242657d3fe19f5655566c881c4156a0b64591e/scripts/convert_ipynb_to_mdx.py It doesn't fully work with the tutorials in this repo yet, fixes in the next commits
The automated migration script left this behind
The automated migration script didn't update this reference
KaTeX requires these to be wrapped by $$ to be parsed as math. We also update our regex for unescaping braces in math to support math wrapped by double dollar signs and containing newlines.
<br> tags are normal in ipynb but fail mdx parsing.
- \begin{}...\end{} blocks don't necessarily have to be {align} blocks. Update the regex to handle any kind. But make sure that the end and begin blocks are of the same type to handle nested blocks. - Make sure $$ symbols are not escaped and include line breaks before and after. I encountered a few examples in our notebooks that didn't match this formatting expected by mdx.
This page is not autogenerated so it's the only file in the docs/tutorials/ directory that we track in github. I manually converted this to MDX from our old page which was a react component because files in the docs/ directory must be either md or mdx.
This variable was used for the reusable_website.yml workflow but is no longer needed.
in pip install command
To test on local fork
The tutorial structure was modified to have each one in a subdirectory. This provides a more consistent file structure when there are tutorials that contain supporting data such as plots.
These were created using the deploy on release workflow in github to test the versioned deploy functionality.
sphinx-rtd-theme v3.0.0 supports sphinx v8
This comment was marked as resolved.
This comment was marked as resolved.
@CristianLara has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2653 +/- ##
=======================================
Coverage 99.98% 99.98%
=======================================
Files 200 200
Lines 18155 18155
=======================================
Hits 18153 18153
Misses 2 2 ☔ View full report in Codecov by Sentry. |
This comment was marked as outdated.
This comment was marked as outdated.
@CristianLara has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
||
git add --all | ||
git commit -m "Create version ${{ inputs.new_version }} of site in Docusaurus" | ||
git push --force origin HEAD:main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this bot pushing to? main branch? If so, that'll get blocked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct it pushes to the main branch... and you're also right that the GH bot respects branch protection rules. Thanks for pointing this out, I didn't have branch protections in my fork.
There's a discussion about this with some workarounds here: https://github.com/orgs/community/discussions/25305#discussioncomment-10728028
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to be pushing to the main branch to publish the website? Previously, we would've simply pushed the updated website to gh-pages branch, which would automatically trigger the deploy action there. Would changing this to push to gh-pages work here (gh-pages is not protected)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The blessed Docusaurus way of versioning is to make versioned copies of the docs directly in the main branch. Running yarn build
then builds all the versions together such that all versions are properly aware of each other.
Previously we were handling our own versioning by always building a single version of the website (two actually if latest and vX.Y.Z for the different path handling), merging with the previously built versions in gh-pages, and performing quite a bit of surgery so that previous versions can point to the new one.
I was initially concerned that storing all versions in the main branch would result in a lot of bloat and disk space, but I don't think this will be an issue because:
- We aren't storing duplicate ipynb files. We're essentially converting them to markdown files which are very small. We still provide download links to the ipynb files by linking directly to the tagged/release version stored in github history.
- Going forward we will no longer create new versions of the website for patch releases, only major and minor releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some concrete numbers: in commit f13759a I have two test versions of the botorch website and each copy of the versioned docs is around 5MB
(venv) ➜ website git:(f13759a0b) du -hs versioned_docs/* git:(f13759a0b|⚑3)
4.9M versioned_docs/version-v0.13.0
4.9M versioned_docs/version-v0.14.0
(venv) ➜ website git:(f13759a0b) du -hs versioned_sidebars/* git:(f13759a0b|⚑3)
8.0K versioned_sidebars/version-v0.13.0-sidebars.json
8.0K versioned_sidebars/version-v0.14.0-sidebars.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that merging a PR to main in OSS without importing to fbcode is either not possible or will cause problems?
Yes, I believe this will cause issues with ShipIt since the repos won't be in sync. Unless there is a way to exclude some of the files/directories in the GitHub repo, but I'm not sure that's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I've read through the ShipIt documentation and understand the issue now.
Something we can do is have a branch docusaurus-versions
that is only for keeping track of docusaurus versions. It would be a fork of main
but include autogenerated docusaurus versioned files. The behavior would be:
- We always build the website from the
docusaurus-versions
branch - Before building the website we always sync
docusaurus-versions
withmain
- For new releases our github workflow will create the new docusaurus version in the
docusaurus-versions
branch
The main
branch would then only be responsible for tracking the latest documentation, and our github workflows can continue to automate the release process from the docusaurus-versions
branch without touching main
.
I'll make a PR in my fork to test this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented my proposal above. There is now a release branch docusaurus-versions
in my fork which is used by our workflows to deploy the website and keep track of docusaurus versions. If this approach is accepted I'll create this branch for this repo before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example of the updated workflow running to deploy a test version in my Ax fork using the release branch: https://github.com/CristianLara/Ax/actions/runs/12397040263/job/34606475835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the bot to commit to a separate docusaurus-versions
branch seems reasonable to me.
Re the file size issues: I think I mixed together parts of the old setup with the new setup in my head. If we're only adding a commit for the new website versions but not for the nightly builds, this shouldn't be an issue.
git config --global user.name "github-actions[bot]" | ||
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this bot require access to some secrets? How does it get permissions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses the GITHUB_TOKEN
provided directly by the github actions workflow: https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#about-the-github_token-secret
The granted permissions are configurable with the defaults listed here: https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#permissions-for-the-github_token
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the solution I choose for #2653 (comment) I may use a different approach for auth though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the GH actions bot in this way for creating a commit is an example provided by the official actions/checkout
documentation: https://github.com/actions/checkout?tab=readme-ov-file#push-a-commit-to-a-pr-using-the-built-in-token
uses: JamesIves/github-pages-deploy-action@releases/v4 | ||
with: | ||
branch: gh-pages # The branch the action should deploy to. | ||
folder: website/build # The folder the action should deploy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a third party action necessary for deployment? Is this to avoid directly pushing to the gh-pages branch (though we seem to be pushing somewhere above)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's not necessary. There's actually a more official one that I can switch to: https://github.com/actions/deploy-pages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same action that's triggered when we push to gh-pages branch, right? If so, that'd be my preferred solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly, when we push to the gh-pages branch github will build an artifact from that repo and deploy it using the https://github.com/actions/deploy-pages action
Here's a recent gh-pages deploy in this repo that shows this action being used: https://github.com/pytorch/botorch/actions/runs/12371594841/job/34528079206
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly deploying the build artifact (instead of committing it to gh-pages
branch) is also what the OSS team recommended after reviewing our workflow.
I'll update this to to use https://github.com/actions/upload-artifact + https://github.com/actions/deploy-pages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 99196eb. See the PR for test plan details: CristianLara#10.
tl;dr We will now deploy to github pages directly via the official github actions by directly deploying the build output. The gh-pages
branch is no longer used at all and can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to modify my fork's GH Pages settings to build from github actions instead of build from branch, will need to make the same settings change in this main repo after merging.
).resolve() | ||
print(f"{path.stem}") | ||
mdx = transform_notebook(path, metadata) | ||
print("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we still need 1000 lines of custom conversion code for the notebooks :( We can't use a third party tool like nbconvert to avoid the need for maintaining our own scripts for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docusaurus only natively accepts MD and MDX for docs. There's a hacky way to force html into the page and that's what we were previously doing but it's much less flexible and doesn't get picked up by docusaurus's sidebar or versioning system.
There is an MD output option in nbconvert but we would lose the interactive plots and html/react features of MDX. It might be possible to shave down some of this script by pre-processing using nbconvert but I think the loss of ipynb metadata might actually make some parts more difficult to parse/convert.
This file is mostly a copy of a script written by the BeanMachine team so there may be pieces of it that are not relevant to our tutorials and can trim. Here's the original: https://github.com/facebookresearch/beanmachine/blob/main/website/scripts/convert_ipynb_to_mdx.py
And here's the PR where I originally added support for it in the Ax repo first with the most context: CristianLara/Ax#13
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely the most complex part of the new website and I also would prefer to avoid it if there is a better alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is mostly a copy of a script written by the BeanMachine team so there may be pieces of it that are not relevant to our tutorials and can trim
Yeah at the very least we'd want to cut this down to only the parts that are essential to our setup.
...ls/GIBBON_for_efficient_batch_entropy_search/GIBBON_for_efficient_batch_entropy_search.ipynb
Outdated
Show resolved
Hide resolved
Co-authored-by: Sait Cakmak <[email protected]>
This was introduced for testing purposed. We can still manually trigger the workflows from the github UI
This is the preferred method of deploying a website to gh-pages since it won't blow up the webiste build git history. These are also the exact actions and steps that github itself uses when we push to the gh-pages branch.
[Actions] Deploy website using artifact instead of commit
This is not used in our repositories. Remove it to simplify the script.
The botorch tutorials are currently configured to output plots as images but our conversion script supports plotly. Let's enable plotly support to keep the conversion consistent between the botorch and ax repos since they use the same script. We can update the tutorials to output plotly in the future.
For a variety of reasons we cannot have the github actions bot commit changes directly to the `main` branch, but we still want the process of creating new docusaurus versions to be automated in our release process. The approach here is to have a release branch called `docusaurus-versions` which will contain all of the docusaurus data for versioning. The behavior is: 1. We always build the website from the docusaurus-versions branch 2. Before building the website we always sync docusaurus-versions with main 3. For new releases our github workflow will create the new docusaurus version in the docusaurus-versions branch The main branch would then only be responsible for tracking the latest documentation, and our github workflows can continue to automate the release process from the docusaurus-versions branch without touching main.
Docusaurus versions will be tracked in the release branch `docusaurus-versions`
- name: Set up Python | ||
uses: actions/setup-python@v5 | ||
with: | ||
python-version: "3.10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why only 3.10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to pick one version to build the website with. Do you prefer a different version, like 3.12?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, might as well use a newer version, o/w we'd have to update it soon anyway.
python: "3.10" | ||
jobs: | ||
post_install: | ||
# Install latest botorch if not on a released version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we set ALLOW_LATEST_GPYTORCH_LINOP = true
here as well?
fetch-depth: 0 | ||
fetch-tags: true | ||
ref: ${{ github.sha }} | ||
- name: Check if major or minor version changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would that not be the case? This workflow is executed upon creating a GitHub release, so this should involve updating at least the minor version? Or is the idea that there may be some post-release update bugfix or the like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the patch versions. v.. The idea is to not release website for the patch versions (primarily for Ax, since we'll follow semver there after 1.0), since these are not intended to include any new features, but only minimal bug / documentation fixes.
Motivation
We are already using most of the correct tooling but it has become very out of date and is tied together with scripts that are difficult to maintain. Upgrade our tooling by updating to latest versions and dramatically simplifying our setup/deploy scripts.
Includes:
archive
branchThe commits for most of the major changes are grouped into themed PRs on my fork if that makes the review easier: https://github.com/CristianLara/botorch/pulls?q=is%3Apr+is%3Aclosed
Considerations
Test Plan