release: v2.0.0 #457

Merged
merged 19 commits into from
Aug 29, 2024


claytonparnell
Contributor

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Ruinong Tian and others added 4 commits June 7, 2024 07:42
Details:
1. Make each major version have its own CUDA and base image version
2. Update the image's key dependencies accordingly
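
The first change gives each major version its own CUDA and base image version. A minimal sketch of such a per-major-version lookup, assuming a plain dictionary keyed by major version number; `MAJOR_VERSION_CONFIG`, `config_for`, and the placeholder values are hypothetical and are not the repository's actual configuration.

```python
# Hypothetical layout: one entry per major version. The placeholder strings stand in
# for the real CUDA and base image versions, which this PR description does not list.
MAJOR_VERSION_CONFIG: dict[int, dict[str, str]] = {
    1: {"cuda_version": "<cuda version for 1.x>", "base_image": "<base image for 1.x>"},
    2: {"cuda_version": "<cuda version for 2.x>", "base_image": "<base image for 2.x>"},
}


def config_for(image_version: str) -> dict[str, str]:
    """Look up settings for an image version such as '2.0.0' by its major number."""
    major = int(image_version.split(".", 1)[0])
    try:
        return MAJOR_VERSION_CONFIG[major]
    except KeyError:
        raise ValueError(f"no configuration for major version {major}") from None


print(config_for("2.0.0")["cuda_version"])
```

Keying on the major version means every 1.x release shares one CUDA/base pairing while 2.x can move to a newer one independently.
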
@TRNWWZ
Contributor

TRNWWZ commented Aug 15, 2024

How much has the image size increased compared to 1.10.0 or 1.9.1?

Review comment on src/config.py (outdated, resolved)
@claytonparnell
Contributor Author

claytonparnell commented Aug 16, 2024

Only 3 test failures remain:

FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[scipy.test.Dockerfile-required_packages4] - assert 1 == 0
FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[pandas.test.Dockerfile-required_packages7] - assert 1 == 0
FAILED test/test_dockerfile_based_harness.py::test_dockerfiles_for_gpu[sm-python-sdk.test.Dockerfile-required_packages8] - assert 1 == 0

@claytonparnell
Contributor Author

Python Package Size Report (GPU)

Target Image Version: 2.0.0 | Base Image Version: 1.9.0

Python Packages Total Size Summary

| Target Version Total Size | Base Version Total Size | Size Change (abs) | Size Change (%) |
|---|---|---|---|
| 3.77GB | 3.47GB | 301.84MB | 8.48 |
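
In the summary row, the absolute change is the target total minus the base total, and the percentage appears to be that difference divided by the base total. A small sketch of that calculation, assuming sizes in MB; `size_change` is a hypothetical helper, not part of the report tooling. The libxgboost row of the GPU delta table (+64.18MB, 64.64%) serves as a cross-check.

```python
def size_change(target_mb: float, base_mb: float) -> tuple[float, float]:
    """Return (absolute change in MB, change as a percentage of the base size)."""
    delta = target_mb - base_mb
    pct = delta / base_mb * 100
    return delta, pct


# Cross-check against the libxgboost row: +64.18MB reported as +64.64%,
# which implies a base size of roughly 64.18 / 0.6464 (about 99.29MB).
base = 64.18 / 0.6464
delta, pct = size_change(base + 64.18, base)
print(f"{delta:.2f}MB, {pct:.2f}%")
```

Because the percentage is relative to the base, small packages can show large percentages (setuptools: under 1MB absolute, 194.32%).
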

Top-20 Largest Python Packages

| Package | Version in the Target Image | Size |
|---|---|---|
| libtorch | 2.3.1 | 481.83MB |
| cudnn | 8.9.7.29 | 446.57MB |
| tensorflow-base | 2.17.0 | 368.87MB |
| libcublas | 12.5.3.2 | 240.05MB |
| libmagma | 2.7.2 | 229.91MB |
| libxgboost | 2.0.3 | 163.46MB |
| mkl | 2023.2.0 | 156.82MB |
| libcufft | 11.2.3.61 | 148.63MB |
| libcusparse | 12.5.1.3 | 119.08MB |
| nccl | 2.22.3.1 | 107.25MB |
| libnpp | 12.3.0.159 | 94.79MB |
| libcusolver | 11.6.3.83 | 79.26MB |
| gcc_impl_linux-64 | 12.4.0 | 59.06MB |
| llvm-openmp | 18.1.8 | 55.77MB |
| cuda-nvdisasm | 12.5.39 | 47.62MB |
| catboost | 1.2.5 | 45.07MB |
| sagemaker-code-editor | 1.3.1 | 41.07MB |
| pillow | 10.4.0 | 40.16MB |
| libcurand | 10.3.6.82 | 39.82MB |
| gds-tools | 1.10.0.4 | 37.78MB |

Python Package Size Delta

The total size of newly introduced Python packages is 1.62GB, which accounts for ${\color{red}42.94}$% of the total package size.

| Package | Version in the Target Image | Version in the Base Image | Size Change (abs) | Size Change (%) |
|---|---|---|---|---|
| libtorch | 2.3.1 | - | 481.83MB | - |
| libcublas | 12.5.3.2 | - | 240.05MB | - |
| libcufft | 11.2.3.61 | - | 148.63MB | - |
| libcusparse | 12.5.1.3 | - | 119.08MB | - |
| libnpp | 12.3.0.159 | - | 94.79MB | - |
| libcusolver | 11.6.3.83 | - | 79.26MB | - |
| libxgboost | 2.0.3 | 1.7.6 | 64.18MB | 64.64 |
| gcc_impl_linux-64 | 12.4.0 | - | 59.06MB | - |
| cuda-nvdisasm | 12.5.39 | - | 47.62MB | - |
| libcurand | 10.3.6.82 | - | 39.82MB | - |
| gds-tools | 1.10.0.4 | - | 37.78MB | - |
| libllvm16 | 16.0.6 | - | 33.72MB | - |
| ray-core | 2.31.0 | - | 29.93MB | - |
| libmagma | 2.7.2 | 2.7.1 | 28.69MB | 14.26 |
| cuda-nvcc-tools | 12.5.82 | - | 22.71MB | - |
| cuda-nvrtc | 12.5.82 | - | 18.10MB | - |
| libnvjitlink | 12.5.82 | - | 15.92MB | - |
| sysroot_linux-64 | 2.17 | - | 14.79MB | - |
| rav1e | 0.6.6 | - | 14.71MB | - |
| gxx_impl_linux-64 | 12.4.0 | - | 12.54MB | - |
| cuda-nvvm-tools | 12.5.82 | - | 11.22MB | - |
| libstdcxx-devel_linux-64 | 12.4.0 | - | 11.07MB | - |
| cuda-nvcc-dev_linux-64 | 12.5.82 | - | 10.98MB | - |
| cuda-sanitizer-api | 12.5.81 | - | 9.28MB | - |
| cuda-nvvm-impl | 12.5.82 | - | 8.63MB | - |
| cuda-cupti-dev | 12.5.82 | - | 7.65MB | - |
| virtualenv | 20.21.0 | - | 6.01MB | - |
| binutils_impl_linux-64 | 2.40 | - | 5.96MB | - |
| mlflow-skinny | 2.15.1 | - | 5.47MB | - |
| python | 3.11.9 | 3.10.14 | 5.12MB | 21.03 |
| p11-kit | 0.24.1 | - | 4.48MB | - |
| mlflow-ui | 2.15.1 | - | 4.36MB | - |
| libsanitizer | 12.4.0 | - | 3.76MB | - |
| pandas | 2.2.2 | 2.1.4 | 3.12MB | 26.34 |
| aom | 3.9.1 | - | 2.58MB | - |
| cuda-nvprof | 12.5.82 | - | 2.56MB | - |
| libgcc-devel_linux-64 | 12.4.0 | - | 2.44MB | - |
| libnvjpeg | 12.3.2.81 | - | 2.38MB | - |
| ray-default | 2.31.0 | - | 2.31MB | - |
| poppler-data | 0.4.12 | - | 2.24MB | - |
| svt-av1 | 2.1.2 | - | 2.24MB | - |
| imagecodecs | 2024.6.1 | - | 1.96MB | - |
| scipy | 1.12.0 | 1.11.4 | 1.92MB | 13.21 |
| sagemaker-code-editor | 1.3.1 | 1.1.0 | 1.91MB | 4.88 |
| cuda-cupti | 12.5.82 | - | 1.83MB | - |
| poppler | 24.08.0 | - | 1.82MB | - |
| gnutls | 3.8.7 | - | 1.79MB | - |
| py-spy | 0.3.14 | - | 1.57MB | - |
| libjxl | 0.10.3 | - | 1.51MB | - |
| nodejs | 20.12.2 | 18.20.2 | 1.45MB | 9.7 |
| libunistring | 0.9.10 | - | 1.37MB | - |
| cuda-cccl_linux-64 | 12.5.39 | - | 1.29MB | - |
| cccl | 2.4.0 | - | 1.28MB | - |
| libgrpc | 1.62.2 | 1.54.3 | 1.25MB | 21.88 |
| libparquet | 15.0.2 | - | 1.13MB | - |
| elfutils | 0.191 | - | 1.13MB | - |
| pillow | 10.4.0 | 10.3.0 | 1.07MB | 2.73 |
| libhwy | 1.1.0 | - | 1.06MB | - |
| matplotlib-base | 3.9.2 | 3.8.4 | 1.06MB | 16.17 |
| plotly | 5.23.0 | 5.22.0 | 1.04MB | 20.51 |
| rdma-core | 53.0 | 28.9 | 998.46KB | 27.37 |
| nettle | 3.9.1 | - | 987.93KB | - |
| setuptools | 72.1.0 | 70.1.1 | 943.04KB | 194.32 |
| kernel-headers_linux-64 | 3.10.0 | - | 922.21KB | - |
| libcufile | 1.10.1.7 | - | 896.67KB | - |
| tf-keras | 2.17.0 | - | 882.79KB | - |
| libarrow-gandiva | 15.0.2 | - | 879.88KB | - |
| libnvfatbin | 12.5.82 | - | 781.91KB | - |
| hyperopt | 0.2.7 | - | 771.25KB | - |
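
The "newly introduced" share is the combined size of packages with no version in the base image (the rows marked `-`), divided by the total package size of the target image; 1.62GB out of 3.77GB is consistent with the reported figure. A sketch of that computation over a toy three-package inventory, assuming sizes in MB; `Package` and `new_package_share` are hypothetical names, not the report's own code.

```python
from typing import NamedTuple, Optional


class Package(NamedTuple):
    name: str
    size_mb: float               # size of the package in the target image
    base_version: Optional[str]  # None when the package is absent from the base image


def new_package_share(packages: list[Package]) -> tuple[float, float]:
    """Return (MB of packages new in the target image, their % of the target total)."""
    total = sum(p.size_mb for p in packages)
    new = sum(p.size_mb for p in packages if p.base_version is None)
    return new, new / total * 100


# Toy inventory built from three rows above, not the full image.
inventory = [
    Package("libtorch", 481.83, None),
    Package("libxgboost", 163.46, "1.7.6"),
    Package("libcublas", 240.05, None),
]
new_mb, pct = new_package_share(inventory)
print(f"new: {new_mb:.2f}MB ({pct:.2f}% of the toy inventory)")
```
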

Python Package Size Report (CPU)

Target Image Version: 2.0.0 | Base Image Version: 1.9.0

Python Packages Total Size Summary

| Target Version Total Size | Base Version Total Size | Size Change (abs) | Size Change (%) |
|---|---|---|---|
| 1.19GB | 1.27GB | -87.18MB | -6.69 |

Top-20 Largest Python Packages

| Package | Version in the Target Image | Size |
|---|---|---|
| mkl | 2023.2.0 | 156.82MB |
| tensorflow-base | 2.17.0 | 146.40MB |
| llvm-openmp | 18.1.8 | 55.77MB |
| libtorch | 2.3.1 | 47.54MB |
| catboost | 1.2.5 | 45.07MB |
| sagemaker-code-editor | 1.3.1 | 41.07MB |
| pillow | 10.4.0 | 40.16MB |
| libllvm16 | 16.0.6 | 33.72MB |
| pytorch | 2.3.1 | 31.89MB |
| libllvm14 | 14.0.6 | 30.03MB |
| ray-core | 2.31.0 | 29.93MB |
| python | 3.11.9 | 29.45MB |
| pandoc | 3.3 | 19.92MB |
| amazon_sagemaker_sql_editor | 0.1.10 | 17.26MB |
| scipy | 1.12.0 | 16.47MB |
| nodejs | 20.12.2 | 16.37MB |
| pandas | 2.2.2 | 14.96MB |
| rav1e | 0.6.6 | 14.71MB |
| statsmodels | 0.14.2 | 11.78MB |
| icu | 75.1 | 11.57MB |

Python Package Size Delta

The total size of newly introduced Python packages is 153.74MB, which accounts for ${\color{red}12.65}$% of the total package size.

| Package | Version in the Target Image | Version in the Base Image | Size Change (abs) | Size Change (%) |
|---|---|---|---|---|
| libtorch | 2.3.1 | - | 47.54MB | - |
| libllvm16 | 16.0.6 | - | 33.72MB | - |
| ray-core | 2.31.0 | - | 29.93MB | - |
| tensorflow-base | 2.17.0 | 2.15.0 | 9.91MB | 7.26 |
| virtualenv | 20.21.0 | - | 6.01MB | - |
| mlflow-skinny | 2.15.1 | - | 5.47MB | - |
| python | 3.11.9 | 3.10.14 | 5.12MB | 21.03 |
| p11-kit | 0.24.1 | - | 4.48MB | - |
| mlflow-ui | 2.15.1 | - | 4.36MB | - |
| pandas | 2.2.2 | 2.1.4 | 3.12MB | 26.34 |
| libxgboost | 2.0.3 | 1.7.6 | 2.53MB | 104.02 |
| ray-default | 2.31.0 | - | 2.31MB | - |
| poppler-data | 0.4.12 | - | 2.24MB | - |
| scipy | 1.12.0 | 1.11.4 | 1.92MB | 13.21 |
| sagemaker-code-editor | 1.3.1 | 1.1.0 | 1.91MB | 4.88 |
| poppler | 24.08.0 | - | 1.82MB | - |
| gnutls | 3.8.7 | - | 1.79MB | - |
| py-spy | 0.3.14 | - | 1.57MB | - |
| nodejs | 20.12.2 | 18.20.2 | 1.45MB | 9.7 |
| libunistring | 0.9.10 | - | 1.37MB | - |
| elfutils | 0.191 | - | 1.13MB | - |
| pillow | 10.4.0 | 10.3.0 | 1.07MB | 2.73 |
| matplotlib-base | 3.9.2 | 3.8.4 | 1.06MB | 16.17 |
| plotly | 5.23.0 | 5.22.0 | 1.04MB | 20.51 |
| nettle | 3.9.1 | - | 987.93KB | - |
| setuptools | 72.1.0 | 70.1.1 | 943.04KB | 194.32 |
| tf-keras | 2.17.0 | - | 882.79KB | - |
| hyperopt | 0.2.7 | - | 771.25KB | - |
| libgoogle-cloud-storage | 2.28.0 | - | 751.27KB | - |
| sqlalchemy | 2.0.32 | 2.0.31 | 748.74KB | 27.45 |
| lightgbm | 4.3.0 | 3.3.5 | 706.65KB | 43.86 |
| libgrpc | 1.62.2 | 1.59.3 | 699.90KB | 10.86 |

@claytonparnell
Contributor Author

Staleness Report: 2.0.0(gpu)

| Package | Current Version in the Distribution image | Latest Relevant Version in Upstream |
|---|---|---|
| ${\color{red}numpy}$ | 1.26.4 | 2.1.0 |
| jinja2 | 3.1.4 | 3.1.4 |
| pytorch | 2.3.1 | 2.3.1 |
| altair | 5.4.0 | 5.4.0 |
| ${\color{red}boto3}$ | 1.34.131 | 1.35.0 |
| ipython | 8.26.0 | 8.26.0 |
| jupyter-lsp | 2.2.5 | 2.2.5 |
| ${\color{red}jupyterlab}$ | 4.1.8 | 4.2.4 |
| amazon-q-developer-jupyterlab-ext | 3.2.0 | 3.2.0 |
| ${\color{red}langchain}$ | 0.2.5 | 0.2.14 |
| ${\color{red}jupyter-ai}$ | 2.20.0 | 2.21.0 |
| amazon-sagemaker-jupyter-ai-q-developer | 1.0.7 | 1.0.7 |
| jupyter-scheduler | 2.7.1 | 2.7.1 |
| amazon-sagemaker-jupyter-scheduler | 3.1.3 | 3.1.3 |
| pandas | 2.2.2 | 2.2.2 |
| amazon-sagemaker-sql-magic | 0.1.3 | 0.1.3 |
| ${\color{red}jupyterlab-lsp}$ | 5.0.3 | 5.1.0 |
| amazon_sagemaker_sql_editor | 0.1.10 | 0.1.10 |
| ${\color{red}scipy}$ | 1.12.0 | 1.14.0 |
| matplotlib-base | 3.9.2 | 3.9.2 |
| ${\color{red}scikit-learn}$ | 1.4.2 | 1.5.1 |
| pip | 24.2 | 24.2 |
| torchvision | 0.18.1 | 0.18.1 |
| autogluon | 1.1.1 | 1.1.1 |
| ipywidgets | 8.1.3 | 8.1.3 |
| ${\color{red}notebook}$ | 7.1.3 | 7.2.1 |
| aws-glue-sessions | 1.0.6 | 1.0.6 |
| conda | 24.7.1 | 24.7.1 |
| ${\color{red}fastapi}$ | 0.110.3 | 0.112.1 |
| jupyter-activity-monitor-extension | 0.3.1 | 0.3.1 |
| jupyter-collaboration | 2.1.2 | 2.1.2 |
| jupyter-dash | 0.4.2 | 0.4.2 |
| jupyter-server-proxy | 4.3.0 | 4.3.0 |
| jupyterlab-git | 0.50.1 | 0.50.1 |
| keras | 3.5.0 | 3.5.0 |
| langchain-aws | 0.1.16 | 0.1.16 |
| mlflow | 2.15.1 | 2.15.1 |
| ${\color{red}py-xgboost-gpu}$ | 2.0.3 | 2.1.1 |
| pyhive | 0.7.0 | 0.7.0 |
| python-gssapi | 1.8.3 | 1.8.3 |
| python-lsp-server | 1.11.0 | 1.11.0 |
| sagemaker-code-editor | 1.3.1 | 1.3.1 |
| sagemaker-headless-execution-driver | 0.0.13 | 0.0.13 |
| sagemaker-jupyterlab-emr-extension | 0.3.2 | 0.3.2 |
| sagemaker-jupyterlab-extension | 0.3.2 | 0.3.2 |
| sagemaker-kernel-wrapper | 0.0.2 | 0.0.2 |
| sagemaker-mlflow | 0.1.0 | 0.1.0 |
| sagemaker-python-sdk | 2.227.0 | 2.227.0 |
| sagemaker-studio-analytics-extension | 0.1.2 | 0.1.2 |
| supervisor | 4.2.5 | 4.2.5 |
| tensorflow | 2.17.0 | 2.17.0 |
| tf-keras | 2.17.0 | 2.17.0 |
| uvicorn | 0.30.6 | 0.30.6 |

Staleness Report: 2.0.0(cpu)

| Package | Current Version in the Distribution image | Latest Relevant Version in Upstream |
|---|---|---|
| ${\color{red}numpy}$ | 1.26.4 | 2.1.0 |
| jinja2 | 3.1.4 | 3.1.4 |
| pytorch | 2.3.1 | 2.3.1 |
| altair | 5.4.0 | 5.4.0 |
| ${\color{red}boto3}$ | 1.34.131 | 1.35.0 |
| ipython | 8.26.0 | 8.26.0 |
| jupyter-lsp | 2.2.5 | 2.2.5 |
| ${\color{red}jupyterlab}$ | 4.1.8 | 4.2.4 |
| amazon-q-developer-jupyterlab-ext | 3.2.0 | 3.2.0 |
| ${\color{red}langchain}$ | 0.2.5 | 0.2.14 |
| ${\color{red}jupyter-ai}$ | 2.20.0 | 2.21.0 |
| amazon-sagemaker-jupyter-ai-q-developer | 1.0.7 | 1.0.7 |
| jupyter-scheduler | 2.7.1 | 2.7.1 |
| amazon-sagemaker-jupyter-scheduler | 3.1.3 | 3.1.3 |
| pandas | 2.2.2 | 2.2.2 |
| amazon-sagemaker-sql-magic | 0.1.3 | 0.1.3 |
| ${\color{red}jupyterlab-lsp}$ | 5.0.3 | 5.1.0 |
| amazon_sagemaker_sql_editor | 0.1.10 | 0.1.10 |
| ${\color{red}scipy}$ | 1.12.0 | 1.14.0 |
| matplotlib-base | 3.9.2 | 3.9.2 |
| ${\color{red}scikit-learn}$ | 1.4.2 | 1.5.1 |
| pip | 24.2 | 24.2 |
| torchvision | 0.18.1 | 0.18.1 |
| autogluon | 1.1.1 | 1.1.1 |
| ipywidgets | 8.1.3 | 8.1.3 |
| ${\color{red}notebook}$ | 7.1.3 | 7.2.1 |
| aws-glue-sessions | 1.0.6 | 1.0.6 |
| conda | 24.7.1 | 24.7.1 |
| ${\color{red}fastapi}$ | 0.110.3 | 0.112.1 |
| jupyter-activity-monitor-extension | 0.3.1 | 0.3.1 |
| jupyter-collaboration | 2.1.2 | 2.1.2 |
| jupyter-dash | 0.4.2 | 0.4.2 |
| jupyter-server-proxy | 4.3.0 | 4.3.0 |
| jupyterlab-git | 0.50.1 | 0.50.1 |
| keras | 3.5.0 | 3.5.0 |
| langchain-aws | 0.1.16 | 0.1.16 |
| mlflow | 2.15.1 | 2.15.1 |
| ${\color{red}py-xgboost-cpu}$ | 2.0.3 | 2.1.1 |
| pyhive | 0.7.0 | 0.7.0 |
| python-gssapi | 1.8.3 | 1.8.3 |
| python-lsp-server | 1.11.0 | 1.11.0 |
| sagemaker-code-editor | 1.3.1 | 1.3.1 |
| sagemaker-headless-execution-driver | 0.0.13 | 0.0.13 |
| sagemaker-jupyterlab-emr-extension | 0.3.2 | 0.3.2 |
| sagemaker-jupyterlab-extension | 0.3.2 | 0.3.2 |
| sagemaker-kernel-wrapper | 0.0.2 | 0.0.2 |
| sagemaker-mlflow | 0.1.0 | 0.1.0 |
| sagemaker-python-sdk | 2.227.0 | 2.227.0 |
| sagemaker-studio-analytics-extension | 0.1.2 | 0.1.2 |
| supervisor | 4.2.5 | 4.2.5 |
| tensorflow | 2.17.0 | 2.17.0 |
| tf-keras | 2.17.0 | 2.17.0 |
| uvicorn | 0.30.6 | 0.30.6 |
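
In both staleness reports, a package name is highlighted in red exactly when the version in the image is older than the latest relevant upstream version. A sketch of that rule, assuming purely numeric dotted versions (no pre-release, post-release, or local tags, which a full PEP 440 parser would handle); `is_stale` and `staleness_label` are hypothetical helpers, not the report generator's actual code.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a purely numeric dotted version such as '1.34.131' into integers."""
    return tuple(int(part) for part in version.split("."))


def is_stale(current: str, latest: str) -> bool:
    """True when the image's version is older than the latest upstream version."""
    cur, lat = parse_version(current), parse_version(latest)
    # Pad with zeros so that '24.2' and '24.2.0' compare as equal.
    width = max(len(cur), len(lat))
    cur += (0,) * (width - len(cur))
    lat += (0,) * (width - len(lat))
    return cur < lat


def staleness_label(name: str, current: str, latest: str) -> str:
    """Wrap stale package names in the red math markup used by the report."""
    if is_stale(current, latest):
        return "${\\color{red}" + name + "}$"
    return name


for name, current, latest in [
    ("numpy", "1.26.4", "2.1.0"),
    ("pip", "24.2", "24.2"),
    ("fastapi", "0.110.3", "0.112.1"),
]:
    print(staleness_label(name, current, latest), current, latest)
```

Comparing integer tuples (rather than strings) matters here: as strings, "0.110.3" would sort before "0.99.0".
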

@claytonparnell
Contributor Author

Staleness analysis

  • numpy - many packages (tensorflow, langchain, jupyter-ai, mlflow, etc.) have not yet migrated to numpy 2. The community needs time to migrate, so this will likely be picked up in the next major version.
  • boto3 - pinned to a specific patch version with each aiobotocore release
  • jupyterlab - jupyter-collaboration requires jupyterlab<4.2
  • langchain - autogluon.multimodal requires jsonschema<4.22, while newer langchain versions require jsonschema>=4.22
  • jupyter-ai - a new version was released between the image build and the staleness report
  • jupyterlab-lsp - amazon_sagemaker_sql_editor requires jupyterlab-lsp==5.0.*
  • scipy - autogluon requires scipy<1.13
  • scikit-learn - autogluon requires scikit-learn<1.4.3
  • notebook - held back by the jupyterlab version, which jupyter-collaboration restricts
  • fastapi - autogluon.multimodal requires jsonschema<4.22, while newer fastapi versions require jsonschema>=4.22
  • py-xgboost-{cpu/gpu} - autogluon.timeseries requires xgboost<2.1
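
Most entries above come down to a version constraint that the newest upstream release does not satisfy. A minimal checker for the single-clause constraints quoted in the analysis, assuming purely numeric versions and ignoring PEP 440 details such as pre-releases; `satisfies` is a hypothetical helper, and real resolvers (pip, conda) do considerably more.

```python
import operator

# Two-character operators come first so that "<=1.2" is not read as "<" + "=1.2".
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def _parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def _pad(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Zero-pad two version tuples to the same length so '1.13' == '1.13.0'."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: str, spec: str) -> bool:
    """Check a numeric version against one constraint such as '<1.13' or '==5.0.*'."""
    for symbol, compare in _OPS.items():
        if spec.startswith(symbol):
            bound = spec[len(symbol):]
            break
    else:
        raise ValueError(f"unsupported constraint: {spec!r}")
    if symbol == "==" and bound.endswith(".*"):
        # Prefix match: '==5.0.*' accepts any 5.0.x release.
        prefix = _parse(bound[:-2])
        return _parse(version)[: len(prefix)] == prefix
    current, limit = _pad(_parse(version), _parse(bound))
    return compare(current, limit)


# (package, constraint, version in the image, latest upstream) from the analysis above.
checks = [
    ("scipy", "<1.13", "1.12.0", "1.14.0"),
    ("jupyterlab-lsp", "==5.0.*", "5.0.3", "5.1.0"),
    ("xgboost", "<2.1", "2.0.3", "2.1.1"),
]
for package, spec, pinned, latest in checks:
    print(f"{package}{spec}: {pinned} -> {satisfies(pinned, spec)}, {latest} -> {satisfies(latest, spec)}")
```

In each case the pinned version passes and the latest upstream version fails, which is why those packages are reported as stale.
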

@claytonparnell claytonparnell merged commit 517eea2 into aws:main Aug 29, 2024
1 check passed