
release: v1.5.2 #184

Merged 4 commits on Mar 1, 2024

Conversation

@claytonparnell (Contributor) commented Feb 20, 2024

Issue #, if available:

Description of changes: Releasing 1.5.2.

NOTE: we are doing a minor upgrade of the notebook version, from 7.0.7 to 7.1.1.
Context: previously, notebook depended on jupyterlab >=4.0.7,<5, so we had jupyterlab 4.1.1 and notebook 7.0.7 in image v5.0.1.
Now the notebook maintainers have patched the dependency metadata of already-released versions (link), so notebook 7.0.7 requires jupyterlab <4.1.0a0. We are manually upgrading notebook to 7.1 to fix this.
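
To make the constraint change concrete, here is a minimal sketch using the packaging library. The specifier strings come from the description above; the script itself is only an illustration, not part of this release.

```python
# Sketch only: show why the upstream metadata patch broke the existing closure.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

installed_jupyterlab = Version("4.1.1")                   # shipped alongside notebook 7.0.7
original_requirement = SpecifierSet(">=4.0.7,<5")         # notebook 7.0.7's original jupyterlab constraint
patched_requirement = SpecifierSet(">=4.0.7,<4.1.0a0")    # after the upstream metadata patch

print(installed_jupyterlab in original_requirement)  # True  -> old closure was valid
print(installed_jupyterlab in patched_requirement)   # False -> closure unsatisfiable; upgrade notebook to 7.1
```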

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@nikhilumeshsargur (Contributor) left a comment

Approving the PR

@dylanraws

For packages that have known test failures, how do we ensure there is no regression?

If I understand correctly, the reason jupyterlab is being downgraded is because notebook is being upgraded. If correct, why don't we just not update notebook yet? A downgrade is more invasive to users than delaying an upgrade.

Why does the staleness report say that the latest upstream jupyterlab version is 4.0.12 if there are 4.1.* versions out?

@claytonparnell (Contributor, Author)

@dylanraws

> For packages that have known test failures, how do we ensure there is no regression?

We would look into the failures, and ensure they are the documented failures we know about. This is <1% of the test suite, but needs to be fixed.

> If I understand correctly, the reason jupyterlab is being downgraded is because notebook is being upgraded. If correct, why don't we just not update notebook yet? A downgrade is more invasive to users than delaying an upgrade.

I agree a downgrade should be avoided when possible; this is one of the tenets of the image. In this case, notebook actually isn't getting upgraded, it's staying the same. But between this patch and the previous one, notebook's owners changed the dependencies of the already-released version, making our current closure incompatible. So this fix is finding the closure that would have been found previously, if notebook had had the correct dependencies from the start.

> Why does the staleness report say that the latest upstream jupyterlab version is 4.0.12 if there are 4.1.* versions out?

Relevant is the key word here. For a patch release, the latest relevant upstream version refers to the latest patch for the given minor version, because during a patch release we can only upgrade a package to a newer patch. We only care about versions we could actually upgrade to, so in a minor release the latest relevant version would be the latest 4.1.*.

@dylanraws

@claytonparnell

> We would look into the failures, and ensure they are the documented failures we know about. This is <1% of the test suite, but needs to be fixed.

I don't think the percentage of the test suite that is passing is relevant. We aim to have zero regressions. A test failure is supposed to indicate a regression. If these failures are not indicating regressions, how do we know that, and how do we know that the functionality these tests are supposed to verify is not in fact regressing? Is there an alternative way to verify the functionality of these packages?

> So this fix is finding the closure that would have been found previously, if notebook had the correct dependencies from the start.

Understood. Just so we are on the same page, we are knowingly regressing on features. The JL 4.1 features are being removed in this patch release. This may impact users. I'm not sure if there is any process in place for notifying SageMaker customers about this ahead of time. If we expect this kind of problem to happen in the future, we should brainstorm alternative strategies for navigating this. For example, we could communicate with the notebook maintainers to at least better understand why v7 is not compatible with JL 4.1 and try to push for a compatible version to be released (if that communication has already taken place, I don't see it mentioned in this PR), and contribute to notebook to expedite that change.

> Relevant is the key word here - for a patch release

Thanks for clarifying. I notice that the staleness report also says the current version is 4.0.12, which is technically wrong. Just curious, did you have to do any manual hacks to get this working? If so, is that process documented? Asking about this in case there is an opportunity to avoid manual effort in the future for similar situations.

@dylanraws previously approved these changes Feb 22, 2024
@balajisankar15 (Contributor)

@dylanraws

In my initial PR for the staleness report generator, I had three different columns: the latest relevant patch, minor, and major versions available on conda-forge.

I got feedback that this might confuse folks, so we now dynamically display a single column (based on the type of release).

Since this is a patch release, it will only look for the latest relevant patch version in 4.0.x and ignore any version that is >=4.1.0. This is because, even if 4.1 exists, a patch release of the SageMaker distribution shouldn't bump the minor version.
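
A minimal sketch of that filtering rule, assuming a simple list of candidate versions; the version numbers below are illustrative, not the generator's actual inputs or code.

```python
# Illustrative only: pick the "latest relevant" upstream version for a patch
# release, i.e. the newest version that shares the current major.minor.
from packaging.version import Version

def latest_relevant_patch(current: str, upstream_versions: list[str]) -> str:
    cur = Version(current)
    same_minor = [
        Version(v) for v in upstream_versions
        if Version(v).release[:2] == cur.release[:2]  # keep same major.minor only
    ]
    return str(max(same_minor))

# jupyterlab during a patch release: 4.1.* is ignored, 4.0.12 is reported.
print(latest_relevant_patch("4.0.10", ["4.0.11", "4.0.12", "4.1.0", "4.1.1"]))
# -> 4.0.12
```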

@claytonparnell (Contributor, Author)

============================================================================================== short test summary info ===============================================================================================
FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[scipy.test.Dockerfile-required_packages4] - assert 1 == 0
FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[autogluon.test.Dockerfile-required_packages1] - assert 1 == 0
FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[pandas.test.Dockerfile-required_packages7] - assert 1 == 0
========================================================================== 3 failed, 18 passed, 2 skipped, 4 warnings in 6713.21s (1:51:53) ==========================================================================

scipy/pandas are known failures. I ran the autogluon test separately and it passed; it seems the GPU autogluon test can run into memory issues when running the full suite. We might have to drop to one pytest worker, or bump up the hardware.
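
As a hedged sketch of what that single-worker rerun could look like (the -k expression and worker count are assumptions, not the exact commands used here):

```python
# Illustrative only: rerun the autogluon GPU test with a single pytest-xdist
# worker to rule out memory pressure from parallel test execution.
import pytest

exit_code = pytest.main([
    "test/test_dockerfile_based_harness.py",
    "-k", "gpu and autogluon",  # select the autogluon GPU parametrization
    "-n", "1",                  # pytest-xdist: run on one worker
    "-v",
])
print("pytest exit code:", exit_code)
```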

@claytonparnell (Contributor, Author)

Staleness Report: 1.5.2(gpu)

| Package | Current Version in the Distribution image | Latest Relevant Version in Upstream |
| --- | --- | --- |
| numpy | 1.26.4 | 1.26.4 |
| jinja2 | 3.1.3 | 3.1.3 |
| pandas | 2.1.4 | 2.1.4 |
| altair | 5.2.0 | 5.2.0 |
| ${\color{red}boto3}$ | 1.28.64 | 1.28.85 |
| ipython | 8.21.0 | 8.21.0 |
| jupyter-lsp | 2.2.3 | 2.2.3 |
| jupyterlab | 4.1.2 | 4.1.2 |
| amazon-codewhisperer-jupyterlab-ext | 2.0.1 | 2.0.1 |
| jupyter-scheduler | 2.5.1 | 2.5.1 |
| amazon-sagemaker-jupyter-scheduler | 3.0.7 | 3.0.7 |
| scipy | 1.11.4 | 1.11.4 |
| scikit-learn | 1.4.1.post1 | 1.4.1.post1 |
| pip | 23.3.2 | 23.3.2 |
| torchvision | 0.15.2 | 0.15.2 |
| autogluon | 0.8.2 | 0.8.2 |
| ipywidgets | 8.1.2 | 8.1.2 |
| notebook | 7.1.1 | 7.1.1 |
| aws-glue-sessions | 1.0.4 | 1.0.4 |
| conda | 23.11.0 | 23.11.0 |
| fastapi | 0.103.2 | 0.103.2 |
| langchain | 0.1.9 | 0.1.9 |
| jupyter-ai | 2.9.1 | 2.9.1 |
| jupyter-dash | 0.4.2 | 0.4.2 |
| jupyter-server-proxy | 4.1.0 | 4.1.0 |
| jupyterlab-git | 0.50.0 | 0.50.0 |
| jupyterlab-lsp | 5.0.3 | 5.0.3 |
| keras | 2.12.0 | 2.12.0 |
| matplotlib | 3.8.3 | 3.8.3 |
| nodejs | 18.18.2 | 18.18.2 |
| py-xgboost-gpu | 1.7.6 | 1.7.6 |
| thrift_sasl | 0.4.3 | 0.4.3 |
| pyhive | 0.7.0 | 0.7.0 |
| python-gssapi | 1.8.3 | 1.8.3 |
| python-lsp-server | 1.10.0 | 1.10.0 |
| pytorch-gpu | 2.0.0 | 2.0.0 |
| sagemaker-headless-execution-driver | 0.0.12 | 0.0.12 |
| sagemaker-jupyterlab-emr-extension | 0.1.9 | 0.1.9 |
| sagemaker-jupyterlab-extension | 0.2.0 | 0.2.0 |
| sagemaker-kernel-wrapper | 0.0.2 | 0.0.2 |
| sagemaker-python-sdk | 2.198.1 | 2.198.1 |
| sagemaker-studio-analytics-extension | 0.0.21 | 0.0.21 |
| sasl | 0.3.1 | 0.3.1 |
| supervisor | 4.2.5 | 4.2.5 |
| tensorflow | 2.12.1 | 2.12.1 |
| uvicorn | 0.27.1 | 0.27.1 |

Staleness Report: 1.5.2(cpu)

| Package | Current Version in the Distribution image | Latest Relevant Version in Upstream |
| --- | --- | --- |
| numpy | 1.26.4 | 1.26.4 |
| jinja2 | 3.1.3 | 3.1.3 |
| pytorch | 2.0.0 | 2.0.0 |
| pandas | 2.1.4 | 2.1.4 |
| altair | 5.2.0 | 5.2.0 |
| ${\color{red}boto3}$ | 1.28.64 | 1.28.85 |
| ipython | 8.21.0 | 8.21.0 |
| jupyter-lsp | 2.2.3 | 2.2.3 |
| jupyterlab | 4.1.2 | 4.1.2 |
| amazon-codewhisperer-jupyterlab-ext | 2.0.1 | 2.0.1 |
| jupyter-scheduler | 2.5.1 | 2.5.1 |
| amazon-sagemaker-jupyter-scheduler | 3.0.7 | 3.0.7 |
| scipy | 1.11.4 | 1.11.4 |
| scikit-learn | 1.4.1.post1 | 1.4.1.post1 |
| pip | 23.3.2 | 23.3.2 |
| torchvision | 0.15.2 | 0.15.2 |
| autogluon | 0.8.2 | 0.8.2 |
| ipywidgets | 8.1.2 | 8.1.2 |
| notebook | 7.1.1 | 7.1.1 |
| aws-glue-sessions | 1.0.4 | 1.0.4 |
| conda | 23.11.0 | 23.11.0 |
| fastapi | 0.103.2 | 0.103.2 |
| langchain | 0.1.9 | 0.1.9 |
| jupyter-ai | 2.9.1 | 2.9.1 |
| jupyter-dash | 0.4.2 | 0.4.2 |
| jupyter-server-proxy | 4.1.0 | 4.1.0 |
| jupyterlab-git | 0.50.0 | 0.50.0 |
| jupyterlab-lsp | 5.0.3 | 5.0.3 |
| keras | 2.12.0 | 2.12.0 |
| matplotlib | 3.8.3 | 3.8.3 |
| nodejs | 18.18.2 | 18.18.2 |
| py-xgboost-cpu | 1.7.6 | 1.7.6 |
| thrift_sasl | 0.4.3 | 0.4.3 |
| pyhive | 0.7.0 | 0.7.0 |
| python-gssapi | 1.8.3 | 1.8.3 |
| python-lsp-server | 1.10.0 | 1.10.0 |
| sagemaker-headless-execution-driver | 0.0.12 | 0.0.12 |
| sagemaker-jupyterlab-emr-extension | 0.1.9 | 0.1.9 |
| sagemaker-jupyterlab-extension | 0.2.0 | 0.2.0 |
| sagemaker-kernel-wrapper | 0.0.2 | 0.0.2 |
| sagemaker-python-sdk | 2.198.1 | 2.198.1 |
| sagemaker-studio-analytics-extension | 0.0.21 | 0.0.21 |
| sasl | 0.3.1 | 0.3.1 |
| supervisor | 4.2.5 | 4.2.5 |
| tensorflow | 2.12.1 | 2.12.1 |
| uvicorn | 0.27.1 | 0.27.1 |

@claytonparnell (Contributor, Author)

Rerunning the full test suite, the autogluon test now succeeds:

FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[scipy.test.Dockerfile-required_packages4] - assert 1 == 0
FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[pandas.test.Dockerfile-required_packages7] - assert 1 == 0

@claytonparnell (Contributor, Author)

Saw a few additional test failures for pandas, but upon investigation they were because 2/29 wasn't recognized as a valid datetime 🐸 pandas-dev/pandas#57672

@claytonparnell merged commit 8fe28c6 into aws:main on Mar 1, 2024