Serverless Distributed Tracing: Trace Extractor/Propagation for batched events #317
Comments
Hi @lucashfreitas - thank you for your detailed note. For EventBridge, as of today we only support Lambda as a direct target to automatically decode and pass trace context. I think we could explore expanding that to SQS/SNS as well as other services we traditionally support, so please feel free to reach out to your account manager to open a feature request. What I would suggest is doing what you've already done - which brings us to your second point. Today, Datadog APM doesn't support merging multiple upstream trace contexts. So you'll need to pick one of the SQS messages and use its context as your upstream trace context for the rest of your function execution. Please feel free to reach out with any additional questions. Thank you!
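For illustration, picking the first record's context with a custom extractor might look roughly like this (a sketch only; the `traceExtractor` option name, the location of the `_datadog` payload in the message attributes, and the returned field names are assumptions to verify against the trace propagation docs and your actual events):

```typescript
// Sketch only: assumes the producer put the Datadog headers in the SQS
// message attributes under "_datadog" (stringValue or base64 binaryValue),
// and that the extractor returns { traceID, parentID, sampleMode, source }
// as in the docs' custom-extractor example. Verify against your events.
import { datadog } from "datadog-lambda-js";

function decodeDatadogAttribute(record: any): Record<string, string> {
  const attr = record?.messageAttributes?._datadog;
  const raw =
    attr?.stringValue ??
    (attr?.binaryValue
      ? Buffer.from(attr.binaryValue, "base64").toString("utf-8")
      : "{}");
  return JSON.parse(raw);
}

// Use the first record's context as the upstream context for the invocation.
const traceExtractor = (event: any, _context: any) => {
  const headers = decodeDatadogAttribute(event.Records?.[0]);
  return {
    traceID: headers["x-datadog-trace-id"],
    parentID: headers["x-datadog-parent-id"],
    sampleMode: parseInt(headers["x-datadog-sampling-priority"] ?? "1", 10),
    source: "event",
  };
};

async function handler(event: any, context: any) {
  // ...process the batch...
}

export const wrapped = datadog(handler, { traceExtractor });
```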
Hey @astuyve, thanks for answering that quickly.
We are trying to do that, but some things are still not clear, e.g. how to define a custom extractor for multiple events inside a Lambda. Currently, the extractor function has a 1-to-1 relationship with the Lambda handler function as per the example, but how would we extract traces inside a for loop? E.g., a Lambda handler receives 10 events in its payload, so we would need to get the trace context 10 times and send 10 additional traces for the Lambda function execution. We are opening a ticket with Datadog to track this. Thank you
@astuyve does datadog-lambda-js have any updates for propagating traces using batched events?
Span links are the tracing primitive for handling workflows like this. A given span can only have one parent trace context, composed of a trace and span ID. https://docs.datadoghq.com/tracing/trace_collection/span_links/

If you invoke a Lambda, the natural parent is the Lambda invocation span. The span for processing each message can't both be a child of the causing span (the one that enqueued the message) and a child of the Lambda invocation span.

Taking EventBridge out of the equation for simplicity: without batches, you can draw a hierarchy from the producer's span down through the Lambda invocation to the message-processing span. But if you receive batches of n > 1, this no longer works. You could create a new span for each message and associate it with the trace context in the message, but then it's no longer associated with the Lambda trace.

My recommendation (as another user) would be:

1. Create the span for each message as a child of the Lambda invocation span.
2. Add a span link from each of those spans to the trace context carried in the message.
That way you get your linear execution timeline for the batch job (1) but can still have bidirectional references to and from the causing event that enqueued the message (2). https://datadoghq.dev/dd-trace-js/interfaces/export_.Span.html#addLink.addLink-1
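With dd-trace that could look roughly like this (a sketch; it assumes a dd-trace version where `span.addLink` is available, per the API doc above, and that the upstream context was propagated in the `_datadog` message attribute):

```typescript
// Sketch: one child span per SQS record under the Lambda invocation span,
// plus a span link to the trace context carried in the record.
import tracer from "dd-trace";

export async function processBatch(records: any[]): Promise<void> {
  for (const record of records) {
    // (1) keep the processing span under the active Lambda invocation span
    const span = tracer.startSpan("sqs.process_message", {
      childOf: tracer.scope().active() ?? undefined,
    });

    // (2) link back to the trace that enqueued this message
    const carrier = JSON.parse(
      record?.messageAttributes?._datadog?.stringValue ?? "{}"
    );
    const upstream = tracer.extract("text_map", carrier);
    if (upstream) {
      span.addLink(upstream);
    }

    try {
      // run the handler with this span active so nested spans attach to it
      await tracer.scope().activate(span, () => handleMessage(record));
    } finally {
      span.finish();
    }
  }
}

async function handleMessage(record: any): Promise<void> {
  // business logic for a single message
}
```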
Hi folks! Thanks @bendrucker for the additional details! Span Links work great when we can propagate the Trace and Span IDs. The good news is that we are also working on automatic span linking for situations where the context cannot be propagated. Our plan is to provide the UI and tracer API enhancements soon and work to enable it for various use cases, starting with S3 objects, but SQS and Dynamo are definitely on the radar.
I am working with a serverless event-driven architecture that uses Event Bridge, SQS, and Lambda:

1. A Lambda function (wrapped with the `datadog-cdk` construct) pushes a message to Event Bridge, which delivers it to an SQS queue.
2. Another Lambda function (wrapped with the `datadog-cdk` construct) consumes the messages from the SQS queue and sends them into the Event Bus again, and we move back to step 1.

Our goal is to enable end-to-end traces for this architecture.
1. We wrapped all Lambdas (publishers and consumers) with the `datadog-cdk` construct, but this produced multiple disconnected traces.

Following this documentation https://docs.datadoghq.com/serverless/distributed_tracing/serverless_trace_propagation/?tab=nodejs, I would expect trace propagation to happen automatically. But the traces are not being associated/propagated and I am seeing multiple disconnected traces. I am not sure if this happens because Event Bridge invokes the Lambda asynchronously, so maybe we really need to "manually" extract the `traceContext` and pass it through the `_datadog` field in the event bus, roughly as sketched below.
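A rough sketch of that manual approach (the bus name, source, and the `_datadog` field shape below are just placeholders for illustration, not an official contract):

```typescript
// Sketch: inject the active trace context into the event detail so the
// consumer can extract it on the other side.
import tracer from "dd-trace";
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const eventBridge = new EventBridgeClient({});

export async function publishWithTraceContext(detail: Record<string, unknown>): Promise<void> {
  // Serialize the active span's context into a plain carrier object.
  const carrier: Record<string, string> = {};
  const active = tracer.scope().active();
  if (active) {
    tracer.inject(active.context(), "text_map", carrier);
  }

  // Ship the carrier alongside the payload in the event detail.
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: "my-event-bus", // placeholder
          Source: "my.service",         // placeholder
          DetailType: "my.event",       // placeholder
          Detail: JSON.stringify({ ...detail, _datadog: carrier }),
        },
      ],
    })
  );
}
```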
2. We have implemented a manual trace extractor/propagation following the Datadog documentation:
We implemented manual trace propagation following the docs/tutorial https://docs.datadoghq.com/serverless/distributed_tracing/serverless_trace_propagation/?tab=nodejs and we managed to connect the traces, but we are now facing another issue handling/propagating traces for batched events in Lambda functions.
All the examples/docs for trace extraction, even the handler wrapper provided by this library, expect a single trace context to be returned per Lambda invocation. If we export an extractor from a file in the function and set `DD_TRACE_EXTRACTOR`, we still return a single object.

The issue is that our Lambda function actually handles a batch of events coming from an SQS queue (10+), and each of those events might have a different trace context, but we are not sure how to handle this using this library. Perhaps we should use the `dd-trace` library directly to create a trace and send it to Datadog for each event in the batch?

Can someone help, or confirm whether this is not possible to achieve using this library and we really need to use `dd-trace` to manually create and send the traces to Datadog?

Thanks