Prompt Caching Preview Not Working With Boto3 #4376

Open
1 task
Armek opened this issue Dec 12, 2024 · 1 comment
Assignees: tim-finnigan
Labels: bedrock-runtime; bug (This issue is a confirmed bug.); p2 (This is a standard priority issue); response-requested (Waiting on additional information or feedback.); service-api (This issue is caused by the service API, not the SDK implementation.)


Armek commented Dec 12, 2024

Describe the bug

I'm attempting to run the boto3 example that uses prompt caching in this AWS Blog post: https://aws.amazon.com/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/

Specifically this block of code:

import json

import boto3

MODEL_ID = "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
AWS_REGION = "us-west-2"

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name=AWS_REGION,
)

DOCS = [
    "bedrock-or-sagemaker.pdf",
    "generative-ai-on-aws-how-to-choose.pdf",
    "machine-learning-on-aws-how-to-choose.pdf",
]

messages = []


def converse(new_message, docs=[], cache=False):

    if len(messages) == 0 or messages[-1]["role"] != "user":
        messages.append({"role": "user", "content": []})

    for doc in docs:
        print(f"Adding document: {doc}")
        name, format = doc.rsplit('.', maxsplit=1)
        with open(doc, "rb") as f:
            bytes = f.read()
        messages[-1]["content"].append({
            "document": {
                "name": name,
                "format": format,
                "source": {"bytes": bytes},
            }
        })

    messages[-1]["content"].append({"text": new_message})

    if cache:
        messages[-1]["content"].append({"cachePoint": {"type": "default"}})

    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=messages,
    )

    output_message = response["output"]["message"]
    response_text = output_message["content"][0]["text"]

    print("Response text:")
    print(response_text)

    print("Usage:")
    print(json.dumps(response["usage"], indent=2))

    messages.append(output_message)


converse("Compare AWS Trainium and AWS Inferentia in 20 words or less.", docs=DOCS, cache=True)
converse("Compare Amazon Textract and Amazon Transcribe in 20 words or less.")
converse("Compare Amazon Q Business and Amazon Q Developer in 20 words or less.")

My organization has the prompt caching preview enabled, and we are running the latest version of boto3 (1.35.79 at the time of this post). I get the following exception when running the above example:

botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in messages[0].content[1]: "cachePoint", must be one of: text, image, document, video, toolUse, toolResult, guardContent

It appears that boto3 doesn't support this parameter yet, but the AWS blog shows it being used above. Is the version of boto3 that supports this not yet published or am I potentially doing something wrong?
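For context, `ParamValidationError` is raised client-side: botocore checks the request against the service model bundled with the installed package before anything is sent. A minimal sketch for checking whether the installed model knows about `cachePoint` follows. It assumes the content-block structure is named `ContentBlock` in the bedrock-runtime model, which is worth verifying against your installed botocore; the helper names are hypothetical and not part of boto3.

```python
def content_block_members(client, shape_name="ContentBlock"):
    """List the keys botocore's bundled model accepts inside a content block.

    `shape_name` is an assumption about how the bedrock-runtime service
    model names the Converse content-block structure.
    """
    shape = client.meta.service_model.shape_for(shape_name)
    return sorted(shape.members)


def supports_cache_point(members):
    """True if the locally installed model would accept a cachePoint block."""
    return "cachePoint" in members


def check_installed_sdk(region_name="us-west-2"):
    """Inspect the installed SDK. Needs boto3, but sends no request."""
    import boto3  # imported lazily so the helpers above stay dependency-free

    client = boto3.client("bedrock-runtime", region_name=region_name)
    members = content_block_members(client)
    print(f"boto3 {boto3.__version__} content block members: {members}")
    print("cachePoint supported:", supports_cache_point(members))
```

Calling `check_installed_sdk()` on the affected machine should show whether the member list matches the one in the error message. If `cachePoint` is missing, the rejection comes from the local model rather than from the account's preview status.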

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

I expect to be able to call the Converse API with prompt caching working.

Current Behavior

I get the following error:

botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in messages[0].content[1]: "cachePoint", must be one of: text, image, document, video, toolUse, toolResult, guardContent
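As a stopgap only, the conversation can still be run without caching by dropping the `cachePoint` blocks when the installed SDK rejects them. The sketch below is hypothetical and not an official workaround: `strip_cache_points` and `converse_with_fallback` are illustrative names, and the fallback gives up the cost and latency benefits that prompt caching is meant to provide.

```python
def strip_cache_points(messages):
    """Return a copy of `messages` with every cachePoint content block removed.

    The original list and its message dicts are left untouched.
    """
    cleaned = []
    for message in messages:
        new_message = dict(message)
        new_message["content"] = [
            block for block in message.get("content", [])
            if "cachePoint" not in block
        ]
        cleaned.append(new_message)
    return cleaned


def converse_with_fallback(client, model_id, messages):
    """Send the request as written; retry without cachePoint if validation fails."""
    from botocore.exceptions import ParamValidationError

    try:
        return client.converse(modelId=model_id, messages=messages)
    except ParamValidationError as error:
        # Only handle the specific rejection described in this issue.
        if "cachePoint" not in str(error):
            raise
        print("Installed SDK rejects cachePoint; retrying without caching.")
        return client.converse(
            modelId=model_id,
            messages=strip_cache_points(messages),
        )
```

With the reproduction script, this would mean replacing the `bedrock_runtime.converse(...)` call with `converse_with_fallback(bedrock_runtime, MODEL_ID, messages)`.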

Reproduction Steps

Run the code as documented in the AWS Blog: https://aws.amazon.com/blogs/aws/reduce-costs-and-latency-with-amazon-bedrock-intelligent-prompt-routing-and-prompt-caching-preview/

Specifically, the same code block shown in full under "Describe the bug" above.

The required PDFs are linked in the blog post.

Possible Solution

No response

Additional Information/Context

The organization I'm authenticating with has been enabled for the prompt caching preview.

SDK version used

1.35.79

Environment details (OS name and version, etc.)

Windows 11 and Linux

@Armek Armek added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Dec 12, 2024
@tim-finnigan tim-finnigan self-assigned this Dec 12, 2024
@tim-finnigan tim-finnigan added investigating This issue is being investigated and/or work is in progress to resolve the issue. service-api This issue is caused by the service API, not the SDK implementation. p2 This is a standard priority issue bedrock-runtime labels Dec 12, 2024
tim-finnigan (Contributor) commented

Thanks for reaching out. This issue is with the Bedrock Runtime API rather than Boto3 directly. Per the blog post that you referenced:

Amazon Bedrock support for prompt caching is available in preview in US West (Oregon) for Anthropic’s Claude 3.5 Sonnet V2 and Claude 3.5 Haiku. Prompt caching is also available in US East (N. Virginia) for Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro. You can request access to the Amazon Bedrock prompt caching preview here.

You said that you already have prompt caching enabled. Can you confirm that it is enabled for us-west-2 and the account you are using?
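One way to gather the account and region for that confirmation is to ask STS which identity the script's credentials resolve to. The following is a rough sketch with hypothetical helper names. It reports only the account, ARN, and client region, and cannot tell whether the preview is enabled for that account.

```python
def describe_caller(sts_client, runtime_client):
    """Collect the account, ARN, and region the script is actually using."""
    identity = sts_client.get_caller_identity()
    return {
        "account": identity["Account"],
        "arn": identity["Arn"],
        "region": runtime_client.meta.region_name,
    }


def print_caller(region_name="us-west-2"):
    """Resolve credentials the same way the reproduction script does and report them."""
    import boto3  # requires valid AWS credentials for the STS call

    session = boto3.Session()
    info = describe_caller(
        session.client("sts", region_name=region_name),
        session.client("bedrock-runtime", region_name=region_name),
    )
    for key, value in info.items():
        print(f"{key}: {value}")
```

Running `print_caller()` in the same environment as the failing script gives the values to compare against the account and region that were enrolled in the preview.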

In the meantime, I'll also try to get more clarification from the Bedrock team regarding the expected behavior. I saw that cachePoint is also referenced in the Amazon Bedrock User Guide: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html. However, it is not mentioned in the Converse API documentation.

@tim-finnigan tim-finnigan added response-requested Waiting on additional information or feedback. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Dec 12, 2024