Skip to content

Commit

Permalink
Set up pre-commit hooks and workflows to check code style (#313)
Browse files Browse the repository at this point in the history
  • Loading branch information
aws-tianquaw authored Mar 28, 2024
2 parents 0ca170c + e575ba5 commit 58e9920
Show file tree
Hide file tree
Showing 44 changed files with 948 additions and 840 deletions.
2 changes: 2 additions & 0 deletions .git-blame-ignore-revs
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Migrate code style to Black
a74466c154d01201388f5232b167538275aa74c4
2 changes: 1 addition & 1 deletion .github/workflows/PR_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
This pull request was created by GitHub Actions/AWS CodeBuild! Before merging, please do the following:
- [ ] Review changelog/staleness report.
- [ ] Review build/test results by clicking *Build Logs* in CI Report (be patient, tests take ~4hr).
- [ ] Review ECR Scan results.
- [ ] Review ECR Scan results.
50 changes: 50 additions & 0 deletions .github/workflows/check_code_quality.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
name: check_code_quality

on:
push:
branches: [ main ]
paths:
- "src/**.py"
- "test/**.py"
- "template/**.py"

pull_request:
branches: [ main ]
paths:
- "src/**.py"
- "test/**.py"
- "template/**.py"

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

defaults:
run:
shell: bash -l {0}

jobs:
build:
name: Check Code Quality
runs-on: ubuntu-latest
if: github.repository == 'aws/sagemaker-distribution'
permissions:
pull-requests: write
contents: write
steps:
- uses: actions/checkout@v4
- uses: mamba-org/setup-micromamba@v1
with:
environment-file: ./environment.yml
environment-name: sagemaker-distribution
init-shell: bash
- name: Free up disk space
run: rm -rf /opt/hostedtoolcache
- name: Activate sagemaker-distribution
run: micromamba activate sagemaker-distribution
- name: Check style with black
run: black --line-length=120 --check src test template
- name: Check style with autoflake
run: autoflake --in-place --expand-star-imports --ignore-init-module-imports --remove-all-unused-imports -rc src test template
- name: Check style with isort
run: isort --profile black -c src test template
2 changes: 1 addition & 1 deletion .github/workflows/monthly-minor-release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,4 +41,4 @@ jobs:
uses: aws/sagemaker-distribution/.github/workflows/build-image.yml@main
with:
release-type: "minor"
base-version: ${{ matrix.version }}
base-version: ${{ matrix.version }}
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
__pycache__
.idea
.DS_Store
.DS_Store
26 changes: 26 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
default_language_version:
# force all unspecified python hooks to run python3
python: python3
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- id: detect-aws-credentials
- repo: https://github.com/humitos/mirrors-autoflake.git
rev: v1.3
hooks:
- id: autoflake
args: ['--in-place', '--expand-star-imports', '--ignore-init-module-imports', '--remove-all-unused-imports']
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
args: [--line-length=120]
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks: # imports sorting
- id: isort
name: isort (python)
args: ["--profile", "black"]
46 changes: 23 additions & 23 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,60 +41,60 @@ GitHub provides additional document on [forking a repository](https://help.githu


## For adding new Conda packages to SageMaker Distribution
SageMaker Distribution will add new Conda packages only during a minor/major version release.
SageMaker Distribution will add new Conda packages only during a minor/major version release.
New packages will not be added during a patch version release.

Follow these steps for sending out a pull request for adding new packages:
1. Identify the latest version of SageMaker Distribution.
2. Create the next minor/major version's build artifacts folder here: https://github.com/aws/sagemaker-distribution/tree/main/build_artifacts
3. Currently, SageMaker Distribution is using Conda forge channel as our source (for Conda
packages).
3. Currently, SageMaker Distribution is using Conda forge channel as our source (for Conda
packages).
Ensure that the new package which you are trying to add is present in Conda forge channel. https://conda-forge.org/feedstock-outputs/
4. Create {cpu/gpu}.additional_packages_env.in file in that folder containing the new packages.
4. Create {cpu/gpu}.additional_packages_env.in file in that folder containing the new packages.
Specify the new package based on the following examples:

i. conda-forge::new-package

ii. conda-forge::new-package[version='>=some-version-number,<some-version-number']
5. Run the following commands to verify whether the new package which you are trying to add is
5. Run the following commands to verify whether the new package which you are trying to add is
compatible with the existing packages in SageMaker Distribution
```
This project uses Conda to manage its dependencies. Run the following to setup your local environment:
conda env update --file environment.yml -n sagemaker-distribution
conda activate sagemaker-distribution
export BASE_PATCH_VERSION='current.latest.version'
# NEXT_VERSION refers to the version number corresponding to the folder you created as part
# NEXT_VERSION refers to the version number corresponding to the folder you created as part
of Step 2.
export NEXT_VERSION='specify.next.version'
# If NEXT_VERSION is a new minor version:
python src/main.py create-minor-version-artifacts --base-patch-version=$BASE_PATCH_VERSION --force
# Or for a new major version:
python src/main.py create-major-version-artifacts --base-patch-version=$BASE_PATCH_VERSION --force
# Build the image:
python src/main.py build \
--target-patch-version=$NEXT_VERSION --skip-tests
```
6. Ensure that the build command succeeds. If it fails, then it means that the package isn't
compatible with the existing packages in SageMaker Distribution. Create a Github Issue, so
6. Ensure that the build command succeeds. If it fails, then it means that the package isn't
compatible with the existing packages in SageMaker Distribution. Create a Github Issue, so
that we can look more into it.
7. Add the relevant tests in https://github.com/aws/sagemaker-distribution/blob/main/test/test_dockerfile_based_harness.py
7. Add the relevant tests in https://github.com/aws/sagemaker-distribution/blob/main/test/test_dockerfile_based_harness.py
and run the build command once again without `--skip-tests` flag.
```
# When writing or debugging tests, you can use standard pytest commands and arguments (https://docs.pytest.org/en/8.0.x/how-to/usage.html) to run specific tests and change test execution behavior. Some useful commands:
# When writing or debugging tests, you can use standard pytest commands and arguments (https://docs.pytest.org/en/8.0.x/how-to/usage.html) to run specific tests and change test execution behavior. Some useful commands:
# The sagemaker-distribution conda env set up earlier should be activated before running below commands
# Runs only tests for cpu image, verbose, shows reason for skipped tests
python -m pytest -n auto -m cpu -vv -rs --local-image-version $VERSION
Expand All @@ -104,9 +104,9 @@ Follow these steps for sending out a pull request for adding new packages:
8. Submit the PR containing the following files.
* {cpu/gpu}.additional_packages_env.in files
* All the test files and test_dockerfile_based_harness.py changes
Note: you don't have to include other files such as env.in/ env.out/ Dockerfile etc in your PR
Also Note: We might ask you to include the test results as part of the PR.
## Finding contributions to work on
Expand Down
16 changes: 16 additions & 0 deletions DEVELOPMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,22 @@ Run the following to invoke those tests:
pytest --local-image-id REPLACE_ME_WITH_IMAGE_ID
```

## Code Style

Install pre-commit to run code style checks before each commit:

```shell
pre-commit install
```

To run formatters for all existing files, use:

```shell
pre-commit run --all-files
```

pre-commit checks can be disabled for a particular commit with git commit -n.

You can also pass a `--use-gpu` flag if the test machine has Nvidia GPU(s) and necessary Nvidia drivers.

### Unit tests for the project's source code
Expand Down
10 changes: 5 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ in the relevant _RELEASE.md_ file in the [build_artifacts](build_artifacts) dire

### Versioning strategy

Amazon SageMaker Distribution supports semantic versioning as described on [semver.org](https://semver.org/). A major
Amazon SageMaker Distribution supports semantic versioning as described on [semver.org](https://semver.org/). A major
version upgrade of Amazon SageMaker Distribution allows major version upgrades of all its dependencies, and similarly
for minor and patch version upgrades. However, it is important to note that Amazon SageMaker Distribution’s ability to
follow semver guidelines is currently dependent on how its dependencies adhere to them.
Expand All @@ -47,7 +47,7 @@ will remain the same over time.

### Package Staleness Report

If you want to generate/view the staleness report for each of the individual packages in a given
If you want to generate/view the staleness report for each of the individual packages in a given
SageMaker distribution image version, then run the following command:

```
Expand Down Expand Up @@ -98,7 +98,7 @@ directory (such as Jupyter Lab notebooks) will persist.

### Amazon SageMaker Studio

> [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) is a web-based, integrated
> [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) is a web-based, integrated
> development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your
> machine learning models.
Expand All @@ -125,8 +125,8 @@ RUN micromamba install sagemaker-inference --freeze-installed --yes --channel co

## FIPS

As of sagemaker-distribution: v0.12+, v1.6+, and v2+, the images come with FIPS 140-2 validated openssl provider
available for use. You can enable the FIPS provider by running:
As of sagemaker-distribution: v0.12+, v1.6+, and v2+, the images come with FIPS 140-2 validated openssl provider
available for use. You can enable the FIPS provider by running:

`export OPENSSL_CONF=/opt/conda/ssl/openssl-fips.cnf`

Expand Down
2 changes: 1 addition & 1 deletion RELEASE.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ python src/main.py build \

Note:

- As you can see above, the `--target-ecr-repo` parameter can be supplied zero or multiple times. If not supplied, the
- As you can see above, the `--target-ecr-repo` parameter can be supplied zero or multiple times. If not supplied, the
tool will just build a local image. If supplied multiple times, it'll upload the images to all those ECR repositories.
- There is also a `--skip-tests` flag which, by default, is `false`. You can supply it if you'd like to skip tests
locally. However, we'll make sure the tests succeed before any image is release publicly.
4 changes: 4 additions & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,7 @@ dependencies:
- pytest-mock
- pytest-xdist
- semver
- autoflake
- black
- isort
- pre-commit
78 changes: 38 additions & 40 deletions src/changelog_generator.py
Original file line number Diff line number Diff line change
@@ -1,67 +1,65 @@
import os
from utils import (
get_dir_for_version,
get_semver,
get_match_specs,
)

from semver import Version

from utils import get_dir_for_version, get_match_specs, get_semver


def _derive_changeset(target_version_dir, source_version_dir, image_config) -> (dict[str,
list[str]], dict[str, str]):
env_in_file_name = image_config['build_args']['ENV_IN_FILENAME']
env_out_file_name = image_config['env_out_filename']
required_packages_from_target = get_match_specs(
target_version_dir + "/" + env_in_file_name
).keys()
def _derive_changeset(target_version_dir, source_version_dir, image_config) -> (dict[str, list[str]], dict[str, str]):
env_in_file_name = image_config["build_args"]["ENV_IN_FILENAME"]
env_out_file_name = image_config["env_out_filename"]
required_packages_from_target = get_match_specs(target_version_dir + "/" + env_in_file_name).keys()
target_match_spec_out = get_match_specs(target_version_dir + "/" + env_out_file_name)
source_match_spec_out = get_match_specs(source_version_dir + "/" + env_out_file_name)

# Note: required_packages_from_source is not currently used.
# In the future, If we remove any packages from env.in, at that time required_packages_from_source will be needed.
# We only care about the packages which are present in the target version env.in file
installed_packages_from_target = {k: str(v.get('version')).removeprefix('==')
for k, v in target_match_spec_out.items()
if k in required_packages_from_target}
installed_packages_from_target = {
k: str(v.get("version")).removeprefix("==")
for k, v in target_match_spec_out.items()
if k in required_packages_from_target
}
# Note: A required package in the target version might not be a required package in the source version
# But source version could still have this package pulled as a dependency of a dependency.
installed_packages_from_source = {k: str(v.get('version')).removeprefix('==') for
k, v in source_match_spec_out.items()
if k in required_packages_from_target}
upgrades = {k: [installed_packages_from_source[k], v] for k, v in installed_packages_from_target.items()
if k in installed_packages_from_source and installed_packages_from_source[k] != v}
new_packages = {k: v for k, v in installed_packages_from_target.items()
if k not in installed_packages_from_source}
installed_packages_from_source = {
k: str(v.get("version")).removeprefix("==")
for k, v in source_match_spec_out.items()
if k in required_packages_from_target
}
upgrades = {
k: [installed_packages_from_source[k], v]
for k, v in installed_packages_from_target.items()
if k in installed_packages_from_source and installed_packages_from_source[k] != v
}
new_packages = {k: v for k, v in installed_packages_from_target.items() if k not in installed_packages_from_source}
# TODO: Add support for removed packages.
return upgrades, new_packages


def generate_change_log(target_version: Version, image_config):
target_version_dir = get_dir_for_version(target_version)
source_version_txt_file_path = f'{target_version_dir}/source-version.txt'
source_version_txt_file_path = f"{target_version_dir}/source-version.txt"
if not os.path.exists(source_version_txt_file_path):
print('[WARN]: Generating CHANGELOG is skipped because \'source-version.txt\' isn\'t '
'found.')
print("[WARN]: Generating CHANGELOG is skipped because 'source-version.txt' isn't " "found.")
return
with open(source_version_txt_file_path, 'r') as f:
with open(source_version_txt_file_path, "r") as f:
source_patch_version = f.readline()
source_version = get_semver(source_patch_version)
source_version_dir = get_dir_for_version(source_version)
image_type = image_config['image_type']
upgrades, new_packages = _derive_changeset(target_version_dir, source_version_dir,
image_config)
with open(f'{target_version_dir}/CHANGELOG-{image_type}.md', 'w') as f:
f.write('# Change log: ' + str(target_version) + '(' + image_type + ')\n\n')
image_type = image_config["image_type"]
upgrades, new_packages = _derive_changeset(target_version_dir, source_version_dir, image_config)
with open(f"{target_version_dir}/CHANGELOG-{image_type}.md", "w") as f:
f.write("# Change log: " + str(target_version) + "(" + image_type + ")\n\n")
if len(upgrades) != 0:
f.write('## Upgrades: \n\n')
f.write('Package | Previous Version | Current Version\n')
f.write('---|---|---\n')
f.write("## Upgrades: \n\n")
f.write("Package | Previous Version | Current Version\n")
f.write("---|---|---\n")
for package in upgrades:
f.write(package + '|' + upgrades[package][0] + '|'
+ upgrades[package][1] + '\n')
f.write(package + "|" + upgrades[package][0] + "|" + upgrades[package][1] + "\n")
if len(new_packages) != 0:
f.write('\n## What\'s new: \n\n')
f.write('Package | Version \n')
f.write('---|---\n')
f.write("\n## What's new: \n\n")
f.write("Package | Version \n")
f.write("---|---\n")
for package in new_packages:
f.write(package + '|' + new_packages[package] + '\n')
f.write(package + "|" + new_packages[package] + "\n")
Loading

0 comments on commit 58e9920

Please sign in to comment.