Possible class or Enum for SageMaker Job #4935

sjcahill-fcc · 2024-11-20T17:47:56Z

Describe the feature you'd like

When working with SageMaker we are often defining sources and destinations for data and artifacts within our jobs.

For instance, a ProcessingInput for a processing job is defined like this:

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput

ProcessingInput(
    source='s3://path/to/my/input-data.csv',
    destination='/opt/ml/processing/input',
)
```

and an output would be defined like:

```python
ProcessingOutput(source='/opt/ml/processing/output/train', destination='s3://...')
```

The /opt/ml/... file paths determine where resources live inside the container, and our processing and training code has to reference them correctly.

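Training and tuning jobs have similar conventional locations (for example, /opt/ml/input/data/<channel> for input channels and /opt/ml/model for model artifacts), and there are environment variables, such as SM_MODEL_DIR and SM_CHANNEL_TRAIN, that specify the default locations where resources are expected inside the container.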
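To illustrate the training-side locations and environment variables mentioned above, here is a minimal sketch of a small helper module that prefers the environment variable when it is set and falls back to the conventional path otherwise (for example, when running the script locally). It assumes the variable names SM_MODEL_DIR, SM_OUTPUT_DATA_DIR and SM_CHANNEL_<NAME> as set by the SageMaker training toolkit in framework containers; the function names are hypothetical.

```python
import os

# Conventional defaults inside a SageMaker training container (assumed).
DEFAULT_MODEL_DIR = "/opt/ml/model"
DEFAULT_OUTPUT_DATA_DIR = "/opt/ml/output/data"
DEFAULT_CHANNEL_ROOT = "/opt/ml/input/data"


def model_dir() -> str:
    """Directory where the trained model artifacts should be written."""
    return os.environ.get("SM_MODEL_DIR", DEFAULT_MODEL_DIR)


def output_data_dir() -> str:
    """Directory for additional (non-model) output data."""
    return os.environ.get("SM_OUTPUT_DATA_DIR", DEFAULT_OUTPUT_DATA_DIR)


def channel_dir(channel: str) -> str:
    """Directory for an input channel such as "train" or "validation".

    Assumes the toolkit exposes each channel as SM_CHANNEL_<NAME>, upper-cased.
    """
    env_var = f"SM_CHANNEL_{channel.upper()}"
    return os.environ.get(env_var, f"{DEFAULT_CHANNEL_ROOT}/{channel}")
```

Reading the environment at call time (rather than at import time) keeps the helpers easy to override in tests and local runs.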

To keep our SageMaker projects consistent, we usually end up defining a basic class or an Enum in a config file. This helps avoid typos and lets users keep the same conventions across projects.

A class or Enum that defines the most commonly used locations would help users new to SageMaker and spare everyone from digging through the documentation (which can be a little scattered) to recall the conventional locations.

For example:

```python
class SageMakerProcessingChannels:
    PROCESSING_INPUT_CHANNEL = "/opt/ml/processing/input"
    PROCESSING_OUTPUT_CHANNEL = "/opt/ml/processing/output"
    PROCESSING_TRAIN_OUTPUT_CHANNEL = "/opt/ml/processing/output/train"
    PROCESSING_VALIDATION_OUTPUT_CHANNEL = "/opt/ml/processing/output/validation"
    PROCESSING_TEST_OUTPUT_CHANNEL = "/opt/ml/processing/output/test"
    PROCESSING_TEMP = "/opt/ml/processing/temp"
```
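Since the title also suggests an Enum, the same constants could be expressed as a str-mixin Enum. This is a minimal sketch with a hypothetical name, ProcessingChannel; mixing in str makes each member a real string, so it can be passed wherever a path string is expected.

```python
from enum import Enum


class ProcessingChannel(str, Enum):
    """Hypothetical Enum variant of the constants class above."""

    INPUT = "/opt/ml/processing/input"
    OUTPUT = "/opt/ml/processing/output"
    TRAIN_OUTPUT = "/opt/ml/processing/output/train"
    VALIDATION_OUTPUT = "/opt/ml/processing/output/validation"
    TEST_OUTPUT = "/opt/ml/processing/output/test"
    TEMP = "/opt/ml/processing/temp"
```

Compared with a plain class, the Enum adds iteration over all locations, lookup by value (ProcessingChannel("/opt/ml/processing/temp")), and protection against accidental reassignment. One caveat: str() renders a (str, Enum) member as ProcessingChannel.INPUT rather than the path, so use .value when building larger strings, or enum.StrEnum on Python 3.11+.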

How would this feature be used? Please describe.
This feature would help standardize some of these common locations and provide IDE code-completion support for common parameters when working in SageMaker.

Now our processing inputs and outputs would be:

```python
inputs = [
    ProcessingInput(
        source='s3://path/to/my/input-data.csv',
        destination=SageMakerProcessingChannels.PROCESSING_INPUT_CHANNEL,
    )
]
outputs = [
    ProcessingOutput(
        source=SageMakerProcessingChannels.PROCESSING_TRAIN_OUTPUT_CHANNEL,
        destination='s3://...',
    )
]
```
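The other half of the convention lives inside the container. A minimal sketch of a processing script that reads from the same constants, so the job definition and the script cannot drift apart; copy_input_to_train is a hypothetical placeholder for real processing logic, and its directory arguments default to the container paths but can be overridden for local testing.

```python
import shutil
from pathlib import Path


class SageMakerProcessingChannels:
    # Subset of the constants proposed above, shared with the job definition.
    PROCESSING_INPUT_CHANNEL = "/opt/ml/processing/input"
    PROCESSING_TRAIN_OUTPUT_CHANNEL = "/opt/ml/processing/output/train"


def copy_input_to_train(
    input_dir: str = SageMakerProcessingChannels.PROCESSING_INPUT_CHANNEL,
    output_dir: str = SageMakerProcessingChannels.PROCESSING_TRAIN_OUTPUT_CHANNEL,
) -> list[Path]:
    """Hypothetical processing step: copy each CSV in the input channel into
    the train output channel and return the written paths.

    A real script would transform the data here instead of copying it.
    """
    out = Path(output_dir)
    # Create the output directory if it does not exist yet.
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(input_dir).glob("*.csv")):
        dst = out / src.name
        shutil.copyfile(src, dst)
        written.append(dst)
    return written
```

Inside the container the script would simply call copy_input_to_train() with no arguments, and whatever lands in the train output directory is what the matching ProcessingOutput uploads.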

Describe alternatives you've considered
We currently use a config file that does this, together with a cookiecutter template that initializes our SageMaker data science projects, to promote uniformity across teams.
