This file aims to provide a sort of standardization for our logging. Uniformity allows easier processing after-the-fact (e.g. via LogQL in Grafana), so having formally defined norms can help to reduce the cognitive overhead of writing new code or making changes to existing code.
The following components have been updated to follow this document:
- autoscaler-agent
- autoscale-scheduler (scheduler plugin)
- neonvm-controller
- neonvm-runner
- pod: NamespacedName - the Pod related to some operation (or long-running handler)
- virtualmachine: NamespacedName - the VirtualMachine related to some operation (or long-running handler). Note: wherever possible, pod should be set alongside this field, or null if the VM's .status.podName field is not set.
- node: string - name of the k8s node that a particular pod or VM is on. Only applicable for some components.
- error: string - for error-level and above, the error that occurred.
  - This is typically handled by zap.Error(err), so no extra naming is necessary.
Zap loggers have names. For example, a zap.Logger with the name autoscaler-agent.watch will
output lines like:
{"level":"info","logger":"autoscaler-agent.watch","msg":"hello from the logger!"}
Names are hierarchical: creating a new logger with logger.Named(name) appends .{name} to the
logger field for lines generated by the new logger.
In general, each tight grouping of long-lived goroutines (sometimes just one) is given a leaf
node in the hierarchy. For example, the autoscaler-agent's Runners spawn a handful of goroutines
to handle various tasks. The goroutine responsible for updating which scheduler to communicate with
has limited communication with, and a lifecycle notably independent from, other parts of the
Runner, so it gets its own logger name. But the part that fetches metrics is (relatively) closely
tied to the part that updates the VM's resources, so they share a logger name (agent.runner.main).
- component.* - each component (e.g. "autoscaler-agent") has logger names prefixed with the name of the component
- *.main - if the bulk of the logic for something is in one straightforward loop (like autoscaler-agent.runner.main)
- *.klog - for klog output that's been redirected
  - We may eventually want to have some parsing here so that we can reconstruct key/value pairs, but that's fairly low priority.