Background

I am trying to do single-model batch transform in SageMaker to get predictions from a pre-trained model (I did not train the model on SageMaker). My end goal is to be able to run just a bit of Python code to start a batch transform job and grab the results from S3 when it's done.
import boto3

client = boto3.client("sagemaker")
client.create_transform_job(...)

# occasionally monitor the job
client.describe_transform_job(...)

# fetch results once job is finished
client = boto3.client("s3")
...
I can successfully get the results I need using Transformer.transform() in a SageMaker notebook instance (see the appendix below for code snippets), but in my project I do not want to depend on the SageMaker Python SDK. Instead, I'd rather use boto3 like in the pseudocode above.
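For concreteness, a filled-in version of that flow might look like the sketch below. All of the specifics here are placeholders (job name, model name, S3 URIs, content type, instance type), not values from my setup:

import time

import boto3

sm = boto3.client("sagemaker")

# assumes a SageMaker model named "my-pretrained-model" has already been created
sm.create_transform_job(
    TransformJobName="my-batch-transform-job",
    ModelName="my-pretrained-model",
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://bucket/path/to/input",
            }
        },
        "ContentType": "application/json",
    },
    TransformOutput={"S3OutputPath": "s3://bucket/path/to/output"},
    TransformResources={"InstanceType": "ml.p2.xlarge", "InstanceCount": 1},
)

# occasionally monitor the job
while True:
    status = sm.describe_transform_job(TransformJobName="my-batch-transform-job")[
        "TransformJobStatus"
    ]
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(30)

# fetch results once job is finished; batch transform writes one
# <input-file>.out object per input file under the output prefix
s3 = boto3.client("s3")
s3.download_file("bucket", "path/to/output/input-file.out", "predictions.out")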
The issue
I referenced this example notebook to try to extend a PyTorch inference container (see the appendix below for the Dockerfile I am using), but I can't get the same results that I can when I use the SageMaker Python SDK in a notebook instance. Instead I get this error:
Backend worker process died.
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 182, in <module>
worker.run_server()
File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 154, in run_server
self.handle_connection(cl_socket)
File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 116, in handle_connection
service, result, code = self.load_model(msg)
File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 89, in load_model
service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)
File "/opt/conda/lib/python3.6/site-packages/ts/model_loader.py", line 110, in load
initialize_fn(service.context)
File "/home/model-server/tmp/models/d00cc5c716dc4e4582250bd89915b99b/handler_service.py", line 51, in initialize
super().initialize(context)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
self._service.validate_and_initialize(model_dir=model_dir)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 158, in validate_and_initialize
self._model = self._model_fn(model_dir)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py", line 55, in default_model_fn
NotImplementedError:
Please provide a model_fn implementation.
See documentation for model_fn at https://github.com/aws/sagemaker-python-sdk
The problem seems to be that when the inference toolkit tries to import a customized inference.py script, it can't find it, presumably because /opt/ml/model/code is not found in sys.path (see sagemaker-inference-toolkit/src/sagemaker_inference/transformer.py, line 169 at cb9e793).
If I understand the code correctly, then in sagemaker-inference-toolkit/src/sagemaker_inference/default_handler_service.py, lines 59 to 64 at cb9e793 (which run before the snippet above), we are attempting to add the code_dir to the Python path, but this won't affect the current runtime. I wonder if it should be like this instead:
import sys

from sagemaker_inference.environment import code_dir

...

# add model_dir/code to python path
if code_dir not in sys.path:
    sys.path.append(code_dir)
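To make the distinction concrete, here is a standalone illustration (not toolkit code) of why the two approaches behave differently: PYTHONPATH is only read when an interpreter starts, so setting it affects Python processes spawned afterwards but not the process that sets it, whereas sys.path.append takes effect immediately.

import os
import sys

# Setting PYTHONPATH in the running interpreter does NOT change sys.path;
# it would only affect Python processes started after this point.
os.environ["PYTHONPATH"] = "/opt/ml/model/code"
print("/opt/ml/model/code" in sys.path)  # False

# Appending to sys.path changes import resolution immediately.
sys.path.append("/opt/ml/model/code")
print("/opt/ml/model/code" in sys.path)  # True
# import inference  # would now resolve /opt/ml/model/code/inference.py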
Appendix
Notebook cells containing code I was able to run successfully
Here's what I can get running in a SageMaker notebook instance (ml.p2.xlarge). The last cell takes about 5 minutes to run.
from sagemaker import get_execution_role
from sagemaker.pytorch.model import PyTorchModel

# fill out proper values here
path_to_model = "s3://bucket/path/to/model/model.tar.gz"
repo = "GITHUB_REPO_URL_HERE"
branch = "BRANCH_NAME_HERE"
token = "GITHUB_PAT_HERE"
path_to_code_location = "s3://bucket/path/to/code/location"
github_repo_source_dir = "relative/path/to/entry/point"
path_to_output = "s3://bucket/path/to/output"
path_to_input = "s3://bucket/path/to/input"

pytorch_model = PyTorchModel(
    # the latest supported version I could get working
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.4-gpu-py36",
    model_data=path_to_model,
    git_config={
        "repo": repo,
        "branch": branch,
        "token": token,
    },
    code_location=path_to_code_location,  # must provide this so that a default bucket isn't created
    source_dir=github_repo_source_dir,
    entry_point="inference.py",
    role=get_execution_role(),
    py_version="py3",
    framework_version="1.4",  # must provide this even though we are supplying `image_uri`
)
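The last cell itself is not reproduced above; it is along the lines of the sketch below, using the SDK's Transformer workflow via PyTorchModel.transformer(). The instance type and content type here are illustrative assumptions:

# create a transformer from the model and run the batch transform job
transformer = pytorch_model.transformer(
    instance_count=1,
    instance_type="ml.p2.xlarge",  # assumption: same instance type as the notebook
    output_path=path_to_output,
)

transformer.transform(
    data=path_to_input,
    content_type="application/json",  # assumption: adjust to the model's input format
)
transformer.wait()  # blocks until the job finishes (~5 minutes here)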
Dockerfile for extended container

# Tutorial for extending AWS SageMaker PyTorch containers:
# https://github.com/aws/amazon-sagemaker-examples/blob/master/advanced_functionality/pytorch_extending_our_containers/pytorch_extending_our_containers.ipynb
ARG REGION=us-west-2

# SageMaker PyTorch Image
FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-inference:1.8.1-gpu-py36-cu111-ubuntu18.04

ARG CODE_DIR=/opt/ml/model/code
ENV PATH="${CODE_DIR}:${PATH}"

# /opt/ml and all subdirectories are utilized by SageMaker; we use the /code subdirectory to store our user code.
COPY /inference ${CODE_DIR}

# Used by the SageMaker PyTorch container to determine our user code directory.
ENV SAGEMAKER_SUBMIT_DIRECTORY ${CODE_DIR}

# Used by the SageMaker PyTorch container to determine our program entry point.
# For more information: https://github.com/aws/sagemaker-pytorch-container
ENV SAGEMAKER_PROGRAM inference.py
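Once this extended image is pushed to ECR, the boto3-only flow from the background section would first register the pre-trained model against it before creating the transform job. A minimal sketch, with placeholder model name, image URI, and role ARN:

import boto3

sm = boto3.client("sagemaker")

# register the pre-trained model artifact with the extended container image
sm.create_model(
    ModelName="my-pretrained-model",
    PrimaryContainer={
        "Image": "ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/my-extended-pytorch-inference:latest",
        "ModelDataUrl": "s3://bucket/path/to/model/model.tar.gz",
        # mirrors the ENV vars baked into the Dockerfile above
        "Environment": {
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
        },
    },
    ExecutionRoleArn="arn:aws:iam::ACCOUNT_ID:role/MySageMakerRole",
)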