[Bug Report] You are forcing Jumpstart to use ml.p4d.24xlarge even when instance_type is specified #4666

math-sasso · 2024-05-31T12:26:16Z

Link to the notebook
In the code below I am clearly passing a different instance type for deploying my trained model.

finetuned_predictor = estimator.deploy(
    instance_type='ml.g5.48xlarge',  # NOTE: the instance type passed here is ignored; the endpoint is always deployed on ml.p4d.24xlarge, which by the way is the most expensive option
    tags=deployment_endpoint_tags,
    endpoint_name=desired_endpoint_name
)

Describe the bug

You are forcing JumpStart to use ml.p4d.24xlarge even when instance_type is specified. This happens in the deploy method of the JumpStartEstimator class, when `get_deploy_kwargs` is called. I had to make the change below (the last line) to work around it.

estimator_deploy_kwargs = get_deploy_kwargs(
            model_id=self.model_id,
            model_version=self.model_version,
            region=self.region,
            tolerate_vulnerable_model=self.tolerate_vulnerable_model,
            tolerate_deprecated_model=self.tolerate_deprecated_model,
            initial_instance_count=initial_instance_count,
            instance_type=instance_type,
            serializer=serializer,
            deserializer=deserializer,
            accelerator_type=accelerator_type,
            endpoint_name=endpoint_name,
            tags=format_tags(tags),
            kms_key=kms_key,
            wait=wait,
            data_capture_config=data_capture_config,
            async_inference_config=async_inference_config,
            serverless_inference_config=serverless_inference_config,
            volume_size=volume_size,
            model_data_download_timeout=model_data_download_timeout,
            container_startup_health_check_timeout=container_startup_health_check_timeout,
            inference_recommendation_id=inference_recommendation_id,
            explainer_config=explainer_config,
            image_uri=image_uri,
            role=role,
            predictor_cls=predictor_cls,
            env=env,
            model_name=model_name,
            vpc_config=vpc_config,
            sagemaker_session=sagemaker_session,
            enable_network_isolation=enable_network_isolation,
            model_kms_key=model_kms_key,
            image_config=image_config,
            source_dir=source_dir,
            code_location=code_location,
            entry_point=entry_point,
            container_log_level=container_log_level,
            dependencies=dependencies,
            git_config=git_config,
            use_compiled_model=use_compiled_model,
            training_instance_type=self.instance_type,
        )

        # NOTE: Workaround added by Matheus: force the instance type passed by the caller.
        estimator_deploy_kwargs.instance_type = instance_type

To reproduce
Just train or load a model that is available in JumpStart (in my case it was a Llama 3 70B one) and try to deploy it.
You probably have this check because it is a big model, but other machines are also able to host it. If the user passes a different instance_type, you should at least let them try; if an error occurs later, it is because that machine does not support the model.
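The precedence I am asking for can be sketched as below. This is only an illustration, not the SDK's code or the actual fix: `resolve_instance_type` is a hypothetical helper, and the default value is simply the instance type I observed being forced.

```python
from typing import Optional


def resolve_instance_type(requested: Optional[str], default: str) -> str:
    """Hypothetical helper: prefer the caller's instance type over the default.

    The default is only used when the caller did not pass anything.
    """
    return requested if requested is not None else default


# The caller's explicit choice wins over the model default.
print(resolve_instance_type("ml.g5.48xlarge", "ml.p4d.24xlarge"))

# The default is used only when no instance type is given.
print(resolve_instance_type(None, "ml.p4d.24xlarge"))
```

With this rule, an unsupported instance type would still fail later at deployment time, but the failure would be about the machine the user actually asked for.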

Logs

ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'ml.p4d.24xlarge for endpoint usage' is 2 Instances, with current utilization of 2 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.

But as shown in my code, I want to deploy on ml.g5.48xlarge.
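For anyone hitting the same thing, a quick way to confirm which instance type the request really used is to pull it out of the error message. This is a minimal sketch: the regular expression is my own guess based on the wording of the message above, and `instance_type_from_limit_error` is a hypothetical helper, not part of the SDK.

```python
import re
from typing import Optional

# Pattern assumed from the ResourceLimitExceeded message quoted in this issue.
_LIMIT_PATTERN = re.compile(
    r"service limit '(ml\.[a-z0-9]+\.[a-z0-9]+) for endpoint usage'"
)


def instance_type_from_limit_error(message: str) -> Optional[str]:
    """Return the instance type named in a quota error, or None if absent."""
    match = _LIMIT_PATTERN.search(message)
    return match.group(1) if match else None


message = (
    "An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint "
    "operation: The account-level service limit 'ml.p4d.24xlarge for endpoint "
    "usage' is 2 Instances"
)

requested = "ml.g5.48xlarge"
actual = instance_type_from_limit_error(message)
if actual is not None and actual != requested:
    print(f"Requested {requested}, but the endpoint was created with {actual}")
```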

I am not authorized to share the notebook here publicly; please reach out if you need it.


patrickmcarlos · 2024-06-10T22:44:41Z

Hi @math-sasso, thanks for the report! A PR has been raised and merged that fixes this issue ✅

aws/sagemaker-python-sdk#4724

math-sasso · 2024-06-10T22:51:06Z

@patrickmcarlos Amazing! Always happy to help

benfriebe mentioned this issue Jun 6, 2024

fix: fix instance_type assignment logic aws/sagemaker-python-sdk#4719

Closed

9 tasks

evakravi mentioned this issue Jun 10, 2024

fix: estimator.deploy not respecting instance type aws/sagemaker-python-sdk#4724

Merged

9 tasks
