See aws-toolkit-common/telemetry for full details about defining telemetry metrics.
- You can define new metrics during development by adding items to `telemetry/vscodeTelemetry.json`.
- The `generateClients` build task generates new symbols in `shared/telemetry/telemetry`, which you can import via: `import { telemetry } from '/shared/telemetry/telemetry'`
- When your feature is released, the "development" metrics you defined in `vscodeTelemetry.json` should be upstreamed to aws-toolkit-common.
- Metrics are dropped (not posted to the service) if the extension is running in CI or other automation tasks.
- You can always test telemetry via `assertTelemetry()`, regardless of the current environment (see the test sketch at the end of this section).
- Use `run()` where possible. It automatically sets the `result` and `reason` fields. See below for details.
    - `run()` gets the `reason` value from the `Error.code` property of any exception that is thrown.
    - Your code can throw `ToolkitError` with a `code` field to communicate errors, validation issues, or cancellation. See below.
- The `reason` and `result` fields are standard metric fields shared by all Toolkits (VSCode, JetBrains, VisualStudio). They should be used instead of special-purpose metrics or fields.
    - `result` allows the Toolkits team to monitor all features for potential regressions.
    - `reason` gives insight into the cause of a `result=Failed` metric.
- `telemetry.record()` called during a `telemetry.foo.run(…)` context will automatically annotate the current `foo` metric.
    - For example, the CloudWatch Logs feature adds `hasTimeFilter` info to its metrics by calling `telemetry.record()`, as sketched below.
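A minimal sketch of that pattern, assuming a hypothetical `cloudwatchlogs_open` metric with a `hasTimeFilter` field (check the metric definitions in `vscodeTelemetry.json` and aws-toolkit-common for the real names):

```ts
import { telemetry } from '/shared/telemetry/telemetry'

function openLogStream(timeFilter?: { start: Date; end: Date }) {
    telemetry.cloudwatchlogs_open.run(() => {
        // Annotates the current cloudwatchlogs_open metric; the field is
        // included when run() emits the metric on completion.
        telemetry.record({ hasTimeFilter: timeFilter !== undefined })
        // ... fetch and display the log events ...
    })
}
```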
User actions or other features may have multiple stages/steps, called a "workflow" or just "flow". A telemetry "trace" captures a flow as a tree of "spans". For example, `setupThing()` has multiple steps until it is completed, ending with `lastSetupStep()`:
```ts
function setupThing() {
    setupStep1()
    setupStep2()
    // ...
    lastSetupStep()
}
```
If we want to send a metric event, let's call it `metric_setupThing`, then the code could look something like this:
```ts
function setupThing() {
    try {
        // ...
        lastSetupStep()
        telemetry.metric_setupThing.emit({ result: 'Succeeded' /* ... */ })
    } catch (e) {
        telemetry.metric_setupThing.emit({ result: 'Failed', reason: 'Not Really Sure Why' /* ... */ })
    }
}
```
Here we emitted a final metric based on the failure or success of the entire execution. Each metric is discrete and immediately gets sent to the telemetry service.
But usually code is not flat, and there are many nested calls. If something goes wrong during the execution, it is useful to have more specific information about the point of failure. For that we can use `run()` along with `telemetry.record()`.
`run()` accepts a callback; while the callback executes, any use of `telemetry.record()`, at any nesting level, updates the attributes of the "current metric". When `run()` returns, a single metric is emitted with the last updated attributes.
### Example
When an exception is thrown from a `run()` context, `run()` will automatically set the `reason` field based on the Error `code` field. You can explicitly set `code` when throwing a `ToolkitError`, for example:
```ts
throw new ToolkitError('No sso-session name found in ~/.aws/config', { code: 'NoSsoSessionName' })
```
Note: prefer reason codes with a format similar to existing codes (not sentences). You can find existing codes by searching the codebase:

```
git grep 'code: '
```
Here is `setupThing()` rewritten to use `run()`:
```ts
function setupThing() {
    // Start the run() for metric_setupThing.
    telemetry.metric_setupThing.run(span => {
        // Update the metric with initial attributes. No matter where control flow
        // exits this method after this line, this attribute will be set.
        span.record({ sessionId: '123456' })
        // ...
        setupStep2()
        // ...
        if (userInput.CancelSelected) {
            // Setting the `cancelled` attribute to true causes the `result` attribute
            // to be set to Cancelled.
            throw new ToolkitError('Thing has been cancelled', { cancelled: true })
        }
    })
    // At this point the final values from the record() calls are used to emit the final metric.
    // If no exception was thrown, the `result` attribute is automatically set to Succeeded.
}
```
```ts
function setupStep2() {
    try {
        // Do work
    } catch (e) {
        // Here we can update the metric with more specific information about the failure.
        // Also notice that we can use `telemetry.metric_setupThing` instead of `span`:
        // `metric_setupThing` was added to the "context" by the run() callback above,
        // so record() below updates the same metric that span.record() does.
        // Keep in mind that record() must be called inside the callback argument of
        // run() for the attributes of that specific metric to be updated.
        telemetry.metric_setupThing.record({
            workDone: 1, // e.g. how many steps completed before the failure
        })
        // If this exception propagates to run(), the `result` attribute is automatically
        // set to Failed and the `reason` attribute to the `code` set here.
        throw new ToolkitError(e as Error, { code: 'SomethingWentWrongInStep2' })
    }
}
```
Finally, if `setupStep2()` was the thing that failed, we would see a metric like:
```
{
    "metadata.metricName": "metric_setupThing",
    "sessionId": "123456",
    "result": "Failed",
    "reason": "SomethingWentWrongInStep2",
    ...
}
```
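In a test, you could then verify the emitted metric with `assertTelemetry()`. A minimal sketch, assuming Mocha-style tests, that `run()` rethrows the error after emitting, and a test-utility import path (adjust to your layout):

```ts
import assert from 'assert'
import { assertTelemetry } from '../testUtil'

it('emits metric_setupThing with failure details', function () {
    // run() emits the metric, then rethrows the error.
    assert.throws(() => setupThing())

    // Asserts that the emitted metric_setupThing event contains these fields.
    assertTelemetry('metric_setupThing', {
        result: 'Failed',
        reason: 'SomethingWentWrongInStep2',
        sessionId: '123456',
    })
})
```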