System Instrumentation: How to use properly #599

Open
satwikkansal opened this issue Nov 14, 2024 · 7 comments
Labels
documentation (Improvements or additions to documentation), OTel Issue

Comments

@satwikkansal

Question

I want to use Logfire to push system metrics as well as process metrics, but the documentation feels incomplete on this.

Based on the documentation, I put together this code:

import logfire
from dotenv import load_dotenv

import time


load_dotenv()

# System-wide metrics (monitors entire system)
system_metrics = {
    # CPU metrics for whole system
    'system.cpu.simple_utilization': None,
    # System memory usage
    'system.memory.utilization': ['available', 'used'],
    # Disk I/O for all processes
    'system.disk.io': ['read', 'write'],
    # Network I/O for all processes
    'system.network.io': ['transmit', 'receive'],
    # System swap usage
    'system.swap.utilization': ['used']
}

# Process-specific metrics (only for your Python application)
process_metrics = {
    # CPU usage of this Python process
    'process.runtime.cpu.utilization': None,
    # Memory usage of this Python process
    'process.runtime.memory': ['rss', 'vms'],
    # Thread count of this Python process
    'process.runtime.thread_count': None,
    # File descriptors opened by this process
    'process.open_file_descriptor.count': None
}

logfire.configure()

while True:
    logfire.instrument_system_metrics(system_metrics, base=None)
    logfire.instrument_system_metrics(process_metrics, base=None)
    time.sleep(60)

My goals were:

  • Have separate invocations for system metrics and process metrics, because I want a separate process dedicated to monitoring system-wide metrics, and I also want to monitor the resource usage of my running Python applications.
  • Have complete control over what is pushed, hence the base=None argument.

Is this the right way to do it?

When I run this code, I get a couple of issues:

  1. First of all, a warning of sorts
Attempting to instrument while already instrumented
An instrument with name process.runtime.cpython.cpu.utilization, type ObservableGauge, unit 1 and description Runtime CPU utilization has been created already.

I believe this happens because of the while loop, but if I remove the loop, the script starts and exits immediately, and all I see on my dashboard is a single data point.

  2. And a bunch of errors
Callback failed for instrument system.swap.utilization.
Traceback (most recent call last):
  File "/Users/satwik/code/ongoing/cq/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
    for api_measurement in callback(callback_options):
  File "/Users/satwik/code/ongoing/cq/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 500, in _get_system_swap_utilization
    for metric in self._config["system.swap.utilization"]:
                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'system.swap.utilization'
Callback failed for instrument system.disk.io.
Traceback (most recent call last):
  File "/Users/satwik/code/ongoing/cq/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
    for api_measurement in callback(callback_options):
  File "/Users/satwik/code/ongoing/cq/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 517, in _get_system_disk_io
    for metric in self._config["system.disk.io"]:
                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'system.disk.io'
Callback failed for instrument system.network.io.
Traceback (most recent call last):
  File "/Users/satwik/code/freelance/cq/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
    for api_measurement in callback(callback_options):
  File "/Users/satwik/code/ongoing/cq/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 629, in _get_system_network_io
    for metric in self._config["system.network.dropped.packets"]:
                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'system.network.dropped.packets'
^CTraceback (most recent call last):
  File "/Users/satwik/code/ongoing/cq/system_metrics.py", line 40, in <module>
    time.sleep(60)
KeyboardInterrupt
Callback failed for instrument system.memory.utilization.
Traceback (most recent call last):
  File "/Users/satwik/code/ongoing/cq/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
    for api_measurement in callback(callback_options):
  File "/Users/satwik/code/ongoing/cq/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 472, in _get_system_memory_utilization
    for metric in self._config["system.memory.utilization"]:
                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'system.memory.utilization'

Am I missing any step or doing something incorrectly?

@satwikkansal added the Question (Further information is requested) label on Nov 14, 2024
@alexmojaki
Contributor

logfire.instrument_system_metrics must only be called once. It sets up a loop in a background thread which exports metrics every 60 seconds, and once at the end of the process. The only reason to use a loop is to keep the process alive if it's doing nothing else, e.g.:

logfire.instrument_system_metrics()

while True:
    time.sleep(60)

> I want to have a separate process altogether to monitor system-wide metrics

I don't know if you really need this as opposed to just also exporting system-wide metrics from your main application processes. But if you do, then the two calls to logfire.instrument_system_metrics will be in separate processes so there won't be a problem. If you have a process whose only job is to report system-wide metrics then it's not really useful to measure its own process metrics.

If you want to instrument both process and system metrics within a single process, then call instrument_system_metrics once with a single dict combining both.
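
As a minimal sketch of that last suggestion, the two dicts from the question can be merged and passed in a single call. The metric names and the base=None argument are copied from the question; merge_metric_configs and main are hypothetical helper names used for illustration, not part of Logfire.

```python
import time

# Metric names copied from the question above; adjust to the ones you need.
SYSTEM_METRICS = {
    "system.cpu.simple_utilization": None,
    "system.memory.utilization": ["available", "used"],
    "system.disk.io": ["read", "write"],
    "system.network.io": ["transmit", "receive"],
    "system.swap.utilization": ["used"],
}
PROCESS_METRICS = {
    "process.runtime.cpu.utilization": None,
    "process.runtime.memory": ["rss", "vms"],
    "process.runtime.thread_count": None,
    "process.open_file_descriptor.count": None,
}


def merge_metric_configs(*configs):
    """Hypothetical helper: merge metric dicts, rejecting duplicate names."""
    merged = {}
    for config in configs:
        duplicates = merged.keys() & config.keys()
        if duplicates:
            raise ValueError(f"metric configured twice: {sorted(duplicates)}")
        merged.update(config)
    return merged


def main():
    import logfire  # imported here so the helper above works without it

    logfire.configure()
    # One call for the whole process, covering both groups of metrics.
    logfire.instrument_system_metrics(
        merge_metric_configs(SYSTEM_METRICS, PROCESS_METRICS), base=None
    )
    # Only needed if the process has nothing else to do.
    while True:
        time.sleep(60)
```

Calling main() from the script's entry point starts the exporter and keeps the process alive. Rejecting duplicate metric names is a design choice of this sketch, so that a metric is never silently configured twice.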

@satwikkansal
Author

satwikkansal commented Nov 14, 2024

Thanks!

Any ideas about the errors below?

Traceback (most recent call last):
  File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
    for api_measurement in callback(callback_options):
  File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 629, in _get_system_network_io
    for metric in self._config["system.network.dropped.packets"]:
                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'system.network.dropped.packets'
Callback failed for instrument system.swap.utilization.
Traceback (most recent call last):
  File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
    for api_measurement in callback(callback_options):
  File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 500, in _get_system_swap_utilization
    for metric in self._config["system.swap.utilization"]:
                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'system.swap.utilization'
Callback failed for instrument system.disk.io.
Traceback (most recent call last):
  File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
    for api_measurement in callback(callback_options):
  File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 517, in _get_system_disk_io
    for metric in self._config["system.disk.io"]:
                  ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
KeyError: 'system.disk.io'
Callback failed for instrument system.network.io.

I still get them

@alexmojaki
Contributor

> Thanks!
>
> Any ideas about the errors below
>
> Traceback (most recent call last):
>   File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
>     for api_measurement in callback(callback_options):
>   File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 629, in _get_system_network_io
>     for metric in self._config["system.network.dropped.packets"]:
>                   ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> KeyError: 'system.network.dropped.packets'

Reported upstream as open-telemetry/opentelemetry-python-contrib#3005.

> Callback failed for instrument system.swap.utilization.
> Traceback (most recent call last):
>   File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
>     for api_measurement in callback(callback_options):
>   File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 500, in _get_system_swap_utilization
>     for metric in self._config["system.swap.utilization"]:
>                   ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
> KeyError: 'system.swap.utilization'
> Callback failed for instrument system.disk.io.
> Traceback (most recent call last):
>   File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 136, in callback
>     for api_measurement in callback(callback_options):
>   File "/Users/satwik/code/freelance/ongoing/cq/loee/venv/lib/python3.11/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 517, in _get_system_disk_io
>     for metric in self._config["system.disk.io"]:
>                   ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
> KeyError: 'system.disk.io'
> Callback failed for instrument system.network.io.

These are not the same kind of mismatch. I can't reproduce these errors if I only call instrument_system_metrics once. What code did you run?

@alexmojaki added the documentation (Improvements or additions to documentation) and OTel Issue labels on Nov 14, 2024
@alexmojaki
Contributor

Added a docs label for us to make it clearer that instrument_system_metrics should only be called once.
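
Until the docs cover this, application code can enforce the call-once rule itself. The following is a minimal sketch of such a guard; instrument_once is a hypothetical helper name, not a Logfire function, and it simply skips every call after the first.

```python
import threading

_lock = threading.Lock()
_instrumented = False


def instrument_once(instrument, *args, **kwargs):
    """Call `instrument` only the first time; return True if it ran."""
    global _instrumented
    with _lock:
        if _instrumented:
            # Already set up in this process: do nothing.
            return False
        instrument(*args, **kwargs)
        _instrumented = True
        return True
```

With Logfire installed, usage would look like instrument_once(logfire.instrument_system_metrics, config, base=None), where config is your metrics dict. The guard is per process, which matches the advice above: separate processes each make their own single call.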

@alexmojaki removed the Question (Further information is requested) label on Nov 14, 2024
@satwikkansal
Author

satwikkansal commented Nov 17, 2024

import logfire
from dotenv import load_dotenv

import time


load_dotenv()

# System-wide metrics (monitors entire system)
system_metrics = {
    # CPU metrics for whole system
    'system.cpu.simple_utilization': None,
    # System memory usage
    'system.memory.utilization': ['available', 'used'],
    # Disk I/O for all processes
    'system.disk.io': ['read', 'write'],
    # Network I/O for all processes
    'system.network.io': ['transmit', 'receive'],
    # System swap usage
    'system.swap.utilization': ['used']
}

# Process-specific metrics (only for your Python application)
process_metrics = {
    # CPU usage of this Python process
    'process.runtime.cpu.utilization': None,
    # Memory usage of this Python process
    'process.runtime.memory': ['rss', 'vms'],
    # Thread count of this Python process
    'process.runtime.thread_count': None,
    # File descriptors opened by this process
    'process.open_file_descriptor.count': None
}

logfire.configure()
logfire.instrument_system_metrics(system_metrics, base=None)
# logfire.instrument_system_metrics(process_metrics, base=None)

while True:
    # needed to keep the process alive
    time.sleep(60)

This is my code. You probably have to wait a couple of minutes for the errors to start showing up.

Operating system: macOS 14.1.1 on an M1 chip
Logfire version: tried on both 1.01 and 2.3.0

@alexmojaki
Contributor

That only gives me KeyError: 'system.network.dropped.packets'

@satwikkansal
Author

Yes, you're correct. I might have been instrumenting both system_metrics and process_metrics, thinking they were mutually exclusive. If I call instrument_system_metrics just once, the only error left is KeyError: 'system.network.dropped.packets'.
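
The tracebacks in this thread show the system.network.io callback reading the config key system.network.dropped.packets. One possible stopgap, inferred only from that traceback and not verified against any particular release, is to define that key as well until the upstream issue is fixed. The helper name below is hypothetical, and the extra key may also cause a dropped-packets metric to be exported.

```python
def add_network_io_workaround(config):
    """Return a copy of `config` that also defines the key the callback reads.

    Assumption: the system.network.io callback looks up
    'system.network.dropped.packets', as the traceback above suggests.
    """
    patched = dict(config)
    directions = patched.get("system.network.io")
    if directions is not None:
        # Mirror the directions so the lookup in the callback succeeds.
        patched.setdefault("system.network.dropped.packets", list(directions))
    return patched
```

The patched dict would then be passed to the single instrument_system_metrics call in place of the original one. Configs without system.network.io are returned unchanged, and an existing system.network.dropped.packets entry is never overwritten.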
