Logging

This file standardizes our logging. Uniform logs are easier to process after the fact (e.g. via LogQL in Grafana), and formally defined norms reduce the cognitive overhead of writing new code or changing existing code.

Components checklist

The following components have been updated to follow this document:

  • autoscaler-agent
  • autoscale-scheduler (scheduler plugin)
  • neonvm-controller
  • neonvm-runner

Common keys

  • pod: NamespacedName — the Pod related to some operation (or long-running handler)
  • virtualmachine: NamespacedName — the VirtualMachine related to some operation (or long-running handler). Note: Wherever possible, pod should be set alongside this field, or null if the VM's .status.podName field is not set.
  • node: string — name of the k8s node that a particular pod or VM is on. Only applicable for some components.
  • error: string — for error-level and above, the error that occurred.
    • This is typically handled by zap.Error(err), so no extra naming is necessary.

Logger name structuring

Zap loggers have names. For example, a zap.Logger with the name autoscaler-agent.watch will output lines like:

{"level":"info","logger":"autoscaler-agent.watch","msg":"hello from the logger!"}

Names are hierarchical — creating a new logger with logger.Named(name) appends .{name} to the logger field for lines generated by the new logger.

In general, all tight groupings of long-lived goroutines (sometimes just one) are given a leaf node in the hierarchy. For example: the autoscaler-agent's Runners spawn a handful of goroutines to handle various tasks. The goroutine responsible for updating which scheduler to communicate with interacts little with the rest of the Runner and has a lifecycle that is notably independent of it, so it gets its own logger name. But the part that fetches metrics is (relatively) closely tied to the part that updates the VM's resources, so they share a logger name (autoscaler-agent.runner.main).

Logger naming conventions

  • component.* — each component (e.g. "autoscaler-agent") has logger names prefixed with the name of the component
  • *.main — if the bulk of the logic for something is in one straightforward loop (like autoscaler-agent.runner.main)
  • *.klog — for klog output that's been redirected
    • We may eventually want to have some parsing here so that we can reconstruct key/value pairs, but that's fairly low priority.