This notebook uses inf1, but I had to use inf2, so I had to update the packages and used the following versions in the HuggingFaceModel object:
huggingface_model = HuggingFaceModel(
    model_data=s3_model_uri,      # path to your model and script
    role=role,
    transformers_version="4.36",  # transformers version used
    pytorch_version="1.13.1",     # pytorch version used
    py_version="py310",           # python version used
    image_uri=ecr_image,
)
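Since the version arguments above need to line up with what the container actually ships, one way to check is to read the versions out of the image tag. The following is a minimal sketch, not part of the SageMaker SDK: `parse_neuronx_tag` and `TAG_PATTERN` are hypothetical names, and the tag layout is assumed from the single image URI quoted in this issue, so other images may not match it.

```python
import re

# Hypothetical helper (not a SageMaker API). The tag layout is an assumption
# based on the one neuronx image URI quoted in this issue.
TAG_PATTERN = re.compile(
    r"^(?P<pytorch>\d+(?:\.\d+)*)"
    r"-transformers(?P<transformers>\d+(?:\.\d+)*)"
    r"-neuronx-(?P<python>py\d+)"
    r"-sdk(?P<sdk>\d+(?:\.\d+)*)"
)


def parse_neuronx_tag(image_uri: str) -> dict:
    """Extract the version fields encoded in a neuronx image tag."""
    # Everything after the last ':' is the image tag.
    tag = image_uri.rsplit(":", 1)[-1]
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag layout: {tag!r}")
    return match.groupdict()


versions = parse_neuronx_tag(
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-inference-neuronx:"
    "1.13.1-transformers4.36.2-neuronx-py310-sdk2.16.1-ubuntu20.04"
)
print(versions)
```

The parsed values can then be compared by eye against `transformers_version`, `pytorch_version`, and `py_version` before deploying.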
Error
I get the following errors in the logs:
error 1:
Warning: Model was compiled with a newer version of torch-neuron than the current runtime (function operator())
For this, I tried running on older and newer versions, but still got this warning.
error 2:
WorkerLifeCycle - RuntimeError: The PyTorch Neuron Runtime could not be initialized. Neuron Driver issues are logged
error 3:
WorkerLifeCycle - 2024-May-08 14:02:23.699765 64:64 ERROR NRT:nrt_allocate_neuron_cores NeuronCore(s) not available - Requested:1 Available:0
I got the error above even though I am setting the os.environ variable for nrt_allocate_neuron_cores before importing torch_neuron in the inference.py file.
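For reference, the ordering described here (set the variable first, import the Neuron package afterwards) could look like the sketch below. The issue does not name the exact variable, so `NEURON_RT_NUM_CORES` is an assumption on my part, and `configure_neuron_cores` is a hypothetical helper rather than anything from the original inference.py.

```python
import os


def configure_neuron_cores(num_cores: int = 1) -> str:
    """Set the NeuronCore count before any Neuron package is imported."""
    # Assumption: NEURON_RT_NUM_CORES is the variable meant in the issue;
    # the original inference.py is not shown, so this is only a guess.
    os.environ["NEURON_RT_NUM_CORES"] = str(num_cores)
    return os.environ["NEURON_RT_NUM_CORES"]


configure_neuron_cores(1)

# Import the Neuron PyTorch package only after this point, so the runtime
# sees the variable when it initializes. The import is left out here so the
# sketch also runs on machines without the Neuron SDK installed.
```

This only shows the ordering; it does not by itself explain why the runtime reports zero available cores.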
Interestingly, however, the model does sometimes return predictions, for example:
[{'label': 'POSITIVE', 'score': 0.9998840093612671}]
Error (the endpoint sometimes returns predictions and sometimes fails with this):
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
madhurprash changed the title from "Testing Bert on Inferentia2[bug]" to "Testing Bert on Inferentia2 for text classification - updated example needed [bug]" on May 8, 2024
madhurprash changed the title from "Testing Bert on Inferentia2 for text classification - updated example needed [bug]" to "Testing Bert on Inferentia2 for text classification - Neuron container runtime errors [bug]" on May 9, 2024
Warning: Model was compiled with a newer version of torch-neuron than the current runtime (function operator())
This indicates that the neuron-cc version you used to compile the model differs from the version running in the container. We just added a feature in the latest version that lets you compile the model on startup to avoid this scenario.
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'distilbert/distilbert-base-uncased-finetuned-sst-2-english',
    'HF_TASK': 'text-classification',
    'HF_OPTIMUM_BATCH_SIZE': 1,  # Batch size used to compile the model
    'HF_OPTIMUM_SEQUENCE_LENGTH': 512,  # Sequence length used to compile the model
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.36.2',
    pytorch_version='2.1.2',
    py_version='py310',
    env=hub,
    role=role,
)

# Let SageMaker know that we compile on startup
huggingface_model._is_compiled_model = True

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.inf2.xlarge'  # ec2 instance type
)

predictor.predict({
    "inputs": "I like you. I love you",
})
Checklist
Concise Description:
I tried deploying Bert on inf2 using the container image http://763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.36.2-neuronx-py310-sdk2.16.1-ubuntu20.04 (listed at https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers).
I ran the container using this notebook: https://github.com/huggingface/notebooks/blob/main/sagemaker/18_inferentia_inference/sagemaker-notebook.ipynb
DLC image/dockerfile:
Current behavior:
Throws an error (explained above in detail)
Expected behavior:
Bert deploys successfully on an inf2 instance, and an updated example (using the latest container and optimum-neuron releases) is provided. A couple of other links that I tried:
https://www.philschmid.de/optimize-deploy-bert-inf2 - this example throws an error during inference (there seems to be a problem in the inference.py file).
https://github.com/aws-neuron/aws-neuron-sagemaker-samples/blob/master/inference/inf2-bert-on-sagemaker/inf2_bert_sagemaker.ipynb - this code works, with one caveat: the container needs to be updated, otherwise it throws an error at runtime. It is also written for a passphrase dataset. I need a similar example for a standard text classification task.
Additional context: