
# Telemetry

## Development

See `aws-toolkit-common/telemetry` for full details about defining telemetry metrics.

- You can define new metrics during development by adding items to `telemetry/vscodeTelemetry.json`.
    - The `generateClients` build task generates new symbols in `shared/telemetry/telemetry`, which you can import via:
      ```typescript
      import { telemetry } from '/shared/telemetry/telemetry'
      ```
    - When your feature is released, the "development" metrics you defined in `vscodeTelemetry.json` should be upstreamed to `aws-toolkit-common`.
- Metrics are dropped (not posted to the service) if the extension is running in CI or other automation tasks.
    - You can always test telemetry via `assertTelemetry()`, regardless of the current environment (see the sketch below).
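
For example, a test can verify what would have been emitted even in CI. A minimal sketch, assuming the hypothetical `metric_setupThing` metric from the example later in this document, and assuming `assertTelemetry()` is imported from the test utilities (the exact import path may differ):

```typescript
// Minimal sketch (hypothetical metric name; import path is an assumption).
import { assertTelemetry } from '../testUtil'

it('emits metric_setupThing', async function () {
    await setupThing()

    // Metrics are not posted to the service in CI, but assertTelemetry()
    // still checks what would have been emitted.
    assertTelemetry('metric_setupThing', { result: 'Succeeded' })
})
```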

## Guidelines

- Use `run()` where possible. It automatically sets the `result` and `reason` fields. See below for details.
    - `run()` gets the `reason` value from the `Error.code` property of any exception that is thrown.
    - Your code can throw `ToolkitError` with a `code` field to communicate errors, validation issues, or cancellation. See below.
- The `reason` and `result` fields are standard metric fields shared by all Toolkits (VSCode, JetBrains, VisualStudio). They should be used instead of special-purpose metrics or fields.
    - `result` allows the Toolkits team to monitor all features for potential regressions.
    - `reason` gives insight into the cause of a `result=Failed` metric.
- `telemetry.record()` called during a `telemetry.foo.run(…)` context will automatically annotate the current `foo` metric (see the sketch after this list).
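
As a minimal sketch of these guidelines (the full walkthrough is in the Example below), throwing a `ToolkitError` from a `run()` block is enough to populate both fields; `config` here is a hypothetical object:

```typescript
// Minimal sketch: run() fills in result/reason automatically.
telemetry.metric_setupThing.run(() => {
    if (!config.isValid) {
        // Emitted as result=Failed, reason=InvalidConfig
        throw new ToolkitError('Invalid config', { code: 'InvalidConfig' })
    }
    // Falling through emits result=Succeeded
})
```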

## Incrementally Building a Metric

User actions or other features may have multiple stages/steps, called a "workflow" or just "flow". A telemetry "trace" captures a flow as a tree of "spans".

For example, `setupThing()` has multiple steps until it is completed, ending with `lastSetupStep()`.

```typescript
function setupThing() {
    setupStep1()
    setupStep2()
    ...
    lastSetupStep()
}
```

If we want to send a metric event, let's call it `metric_setupThing`, then the code could look something like this:

```typescript
function setupThing() {
    try {
        ...
        lastSetupStep()
        telemetry.metric_setupThing.emit({ result: 'Succeeded', ... })
    } catch (e) {
        telemetry.metric_setupThing.emit({ result: 'Failed', reason: 'Not Really Sure Why', ... })
    }
}
```

Here we emit a final metric based on the success or failure of the entire execution. Each metric is discrete and is sent to the telemetry service immediately.


But code is usually not flat, and there are many nested calls. If something goes wrong during execution, it is useful to have more specific information about the point of failure. For that we can use `run()` together with `telemetry.record()`.

`run()` accepts a callback. While the callback executes, any call to `telemetry.record()`, at any nesting level, updates the attributes of the "current metric". At the end (that is, when `run()` returns), a single metric is emitted with the final attributes. See the Example below.

When an exception is thrown from a `run()` context, `run()` will automatically set the `reason` field based on the error's `code` field. You can explicitly set `code` when throwing a `ToolkitError`, for example:

```typescript
throw new ToolkitError('No sso-session name found in ~/.aws/config', { code: 'NoSsoSessionName' })
```

Note: prefer `reason` codes with a format similar to existing codes (not sentences). You can find existing codes by searching the codebase:

```
git grep 'code: '
```

### Example

```typescript
setupThing()

function setupThing() {
    // Start the run() for metric_setupThing
    telemetry.metric_setupThing.run(span => {
        // Update the metric with initial attributes. No matter where control flow
        // exits after this line, this attribute will always be set.
        span.record({ sessionId: '123456' })
        ...
        setupStep2()
        ...

        if (userInput.CancelSelected) {
            // Setting the `cancelled` attribute to true causes the `result` attribute to be set to Cancelled.
            throw new ToolkitError('Thing has been cancelled', { cancelled: true })
        }
    })
    // At this point the final values from the record() calls are used to emit the final metric.
    // If no exception was thrown, the `result` attribute is automatically set to Succeeded.
}

function setupStep2() {
    try {
        // Do work
    } catch (e) {
        // Here we can update the metric with more specific information about the failure.

        // Also notice we can use `telemetry.metric_setupThing` instead of `span`.
        // `metric_setupThing` was added to the "context" by the run() callback above,
        // so record() below updates the same metric that span.record() does.

        // Keep in mind that record() must run inside the callback argument of run()
        // for the attributes of that specific metric to be updated.
        telemetry.metric_setupThing.record({
            workDone: /* ... */
        })
        // If this exception is allowed to propagate to the run(), the `result` is
        // automatically set to Failed and the `reason` to the `code` set here.
        throw new ToolkitError(e as Error, { code: 'SomethingWentWrongInStep2' })
    }
}
```

Finally, if `setupStep2()` was the step that failed, we would see a metric like:

```
{
    "metadata.metricName": "metric_setupThing",
    "sessionId": "123456",
    "result": "Failed",
    "reason": "SomethingWentWrongInStep2",
    ...
}
```
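
Since a trace captures a flow as a tree of spans, `run()` calls can also be nested: each nested `run()` starts a child span that emits its own metric when its callback completes, in addition to the surrounding metric. A minimal sketch, assuming a hypothetical `metric_setupStep` metric with a hypothetical `stepName` attribute:

```typescript
telemetry.metric_setupThing.run(() => {
    // Child span: emits metric_setupStep when the inner run() returns,
    // in addition to the surrounding metric_setupThing.
    telemetry.metric_setupStep.run(span => {
        span.record({ stepName: 'step2' })
    })
})
```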