[SAM] Hangs for the entire lambda timeout, then responds #176

Closed
distinctdan opened this issue Aug 23, 2024 · 8 comments

@distinctdan

Problem

Hi, I'm using AWS SAM and I'm running locally with sam local start-api. However, when I add the dd lambda extension layer, it causes my handler to run for the full timeout before responding normally. From my local logs, I can see that my handler is returning a value almost immediately, so the delay is from the dd lambda wrapper. This makes it very difficult to work locally, which is important to me. My guess is maybe it's failing to authenticate with datadog and timing out? But, it's not emitting any auth errors. Is there any way to get more information about why it's hanging?

Sequence of events

  1. I start my local API using sam build, then sam local start-api.
  2. Send request using postman
  3. My lambda runs and returns a response almost immediately. There are a few logs from datadog:
Lambda function 'MyFunction' is already running                                                                                    
START RequestId: 21d0304a-779d-4b7a-b895-5b1142303bff Version: $LATEST
2024/08/23 16:27:05 {"status":"error","message":"datadog: Couldn't convert X-Ray trace context: Couldn't read trace id from X-Ray: invalid x-ray trace id; expected 3 components in id"}
2024/08/23 16:27:05 Datadog Tracer v1.65.1 INFO: DATADOG TRACER CONFIGURATION { <long configuration json> }
2024/08/23 16:27:05 MyFunctionHandler: returning response
  4. The lambda runs for the full 30 second timeout, then it logs the following and sends the response back to postman.
END RequestId: 7bb76fa1-2e9e-4ba5-9486-d8cbeac70bcc
REPORT RequestId: 7bb76fa1-2e9e-4ba5-9486-d8cbeac70bcc  Init Duration: 0.15 ms  Duration: 29984.04 ms   Billed Duration: 29985 ms       Memory Size: 128 MB       Max Memory Used: 128 MB

Relevant code

bootstrap.go:

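package main

import (
	"context"
	"log"

	ddlambda "github.com/DataDog/datadog-lambda-go"
	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

func main() {
	// Wrap the handler with the Datadog instrumentation, using the default config.
	lambda.Start(ddlambda.WrapFunction(handler, nil))
}

func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	// Do handler logic

	log.Printf("MyFunctionHandler: returning response")
	return events.APIGatewayProxyResponse{
		StatusCode: 200,
		Body:       "Success",
	}, nil
}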

template.yaml:

Resources:
  MyLambdaFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildMethod: go1.x
    Properties:
      CodeUri: myproject/
      Handler: bootstrap
      Runtime: provided.al2023
      Architectures:
        - x86_64
      Layers:
        - arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension:64
      Policies:
        - AWSSecretsManagerGetSecretValuePolicy:
            SecretArn: "arn:aws:secretsmanager:us-east-1:<my account>:secret:DD_API_KEY_MY_PROJECT"
      Environment:
        Variables:
          DD_SITE: 'datadoghq.com'
          DD_API_KEY_SECRET_ARN: 'arn:aws:secretsmanager:us-east-1:<my account>:secret:DD_API_KEY_MY_PROJECT'
  • Datadog Go Lambda package version: github.com/DataDog/datadog-lambda-go v1.19.0
  • Go version: go1.22.6 darwin/amd64
@astuyve

astuyve commented Aug 23, 2024

Hey @distinctdan - thanks for reaching out!

I suspect the reason is that SAM uses the AWS Lambda Runtime Interface Emulator.
Unfortunately, AWS does not emulate the telemetry API in the runtime interface emulator.

The telemetry API is how Lambda Extensions communicate with the AWS Lambda Runtime, and we require it to tell us important things like:

  • logs from your function
  • runtime metrics
  • and, most crucially, runtime events, like when your Lambda function is finished processing.

Today we can't support the emulator because of this limitation. Instead, we recommend using a conditional variable to skip the Lambda Extension and Lambda Instrumentation when emulating locally.
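A minimal sketch of the instrumentation half of that conditional approach, not an official Datadog snippet. It assumes that the SAM CLI sets the AWS_SAM_LOCAL environment variable to "true" for `sam local` invocations. The helper names isLocalEmulation and chooseHandler are hypothetical, and the wrap argument stands in for a call such as ddlambda.WrapFunction(handler, nil) so that the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"os"
)

// isLocalEmulation reports whether the process appears to be running under
// SAM's local emulator. Assumption: the SAM CLI sets AWS_SAM_LOCAL=true for
// `sam local` invocations. getenv is injected so the check is easy to test.
func isLocalEmulation(getenv func(string) string) bool {
	return getenv("AWS_SAM_LOCAL") == "true"
}

// chooseHandler (hypothetical helper) returns the plain handler when running
// locally and the wrapped handler otherwise. In a real function, wrap would
// call something like ddlambda.WrapFunction(handler, nil).
func chooseHandler(handler interface{}, wrap func(interface{}) interface{}, local bool) interface{} {
	if local {
		return handler
	}
	return wrap(handler)
}

func main() {
	// Stand-in handler and wrapper, used only to show which path is taken.
	handler := func() string { return "Success" }
	wrap := func(h interface{}) interface{} {
		inner := h.(func() string)
		return func() string { return "wrapped:" + inner() }
	}

	h := chooseHandler(handler, wrap, isLocalEmulation(os.Getenv)).(func() string)
	fmt.Println(h())
}
```

In the real bootstrap.go, the result of chooseHandler would be passed to lambda.Start. This only skips the Go wrapper; the extension layer itself would still need to be left out of the template used for local runs, since the extension is what waits for runtime events.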

This has been an issue for a long time, so we'd appreciate if you leave a comment asking for this feature from AWS and mention it to your TAM.

Thanks!
AJ

@distinctdan
Author

Thanks for the info! I've commented on their issue.

Since this is a known issue where we know DataDog isn't compatible with SAM local, I would request a documentation update on the DD side for the following:

  • Warning message that this doesn't work.
  • Workaround code to conditionally turn on DD only when not running locally.

@astuyve

astuyve commented Aug 23, 2024

Thanks @distinctdan - I've learned that we document the DD_LOCAL_TESTING env var here, but I'm not sure exactly the full breadth of support and I don't expect Datadog's instrumentation to work locally due to the disparities between Lambda and other local testing options.

It may prevent the extension from hanging in your case though. Either way, our recommendation is to disable Datadog in a local emulation environment.

Thanks!

@astuyve astuyve closed this as completed Aug 23, 2024
@distinctdan
Author

Thanks for the info, but unfortunately it looks like that flag doesn't change the behavior for me. At the very least, I would expect the DD lambda to emit some kind of log saying it can't connect instead of silently failing. I know it's not y'all's fault that this doesn't work, but I've spent several hours tracking this down, and some documentation or error logs on the datadog side would have made this a lot easier.

@astuyve

astuyve commented Aug 23, 2024

Hi @distinctdan, thanks for the note.

The connection from the extension to the telemetry API can be established; the issue is that no runtime events are passed (only logs). So the extension has no way to know that it will never receive those events.

I will ask our documentation team to more clearly describe our recommendation to not use Datadog's instrumentation locally at this time.

Thanks!

@distinctdan
Author

I see, yeah that's pretty unfortunate. Ok, thanks for the help!

@astuyve

astuyve commented Aug 23, 2024

Anytime!

@astuyve

astuyve commented Sep 11, 2024

Hi @distinctdan - if you try out our new next-generation Lambda extension locally, you can specify a short flush interval via DD_SERVERLESS_FLUSH_STRATEGY=periodically,1 which should no longer wait for any events. I don't expect the data to be correct or coherent in Datadog, but it should at least allow the function to run: DataDog/datadog-lambda-extension#377

I hope this helps!
