Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: create span for mysql2 execute #1221

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

xuan-cao-swi
Copy link
Contributor

Description

Adding a span for the mysql2 execute function is similar to the pg EXEC_PREPARED_ISH_METHODS.

When using prepared statements with ActiveRecord, the first execution of a query will create a prepare span, which can log the query execution. However, re-execution will not trigger the prepare function; it will trigger the execute function directly. Since we currently don’t create a span for the execute function, I’d like to propose adding a span for the execute function.

See prepared statement execution log from mysql server:

# first time call the query
2024-11-01T19:35:33.716687Z	   12 Prepare	SELECT `customers`.* FROM `customers` WHERE `customers`.`contactLastName` = ? LIMIT ?
2024-11-01T19:35:33.716952Z	   12 Execute	SELECT `customers`.* FROM `customers` WHERE `customers`.`contactLastName` = 'Schmitt' LIMIT 1

# second time call the query
2024-11-01T19:35:38.256820Z	   12 Execute	SELECT `customers`.* FROM `customers` WHERE `customers`.`contactLastName` = 'Schmitt' LIMIT 1

# third time call the query with different binding value
2024-11-01T19:35:46.300242Z	   12 Execute	SELECT `customers`.* FROM `customers` WHERE `customers`.`contactLastName` = 'Snow' LIMIT 1

Even though there is no valuable information except the binding value for the prepared statement, it is useful for monitoring the MySQL query execution duration.

Sample span data:

otel-collector     | InstrumentationScope OpenTelemetry::Instrumentation::Mysql2 0.27.2
otel-collector     | Span #0
otel-collector     |     Trace ID       : e93599e8a955311184c69c52769adc26
otel-collector     |     Parent ID      : 949f6bc157ecbb9a
otel-collector     |     ID             : ef2baa0ac9121262
otel-collector     |     Name           : execute
otel-collector     |     Kind           : Client
otel-collector     |     TraceState     : sw=0000000000000000-01
otel-collector     |     Start time     : 2024-11-01 19:35:46.300091427 +0000 UTC
otel-collector     |     End time       : 2024-11-01 19:35:46.301052177 +0000 UTC
otel-collector     |     Status code    : Unset
otel-collector     |     Status message : 
otel-collector     | Attributes:
otel-collector     |      -> args: Str(["Snow", 1])
otel-collector     |      -> kwargs: Str({})


def _otel_execute_attributes(args, kwargs)
if config[:db_statement] == :include
{'args' => args.to_s, 'kwargs' => kwargs.to_s}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that we should amend all attributes without using explicit mappings for semantic conventions especially if it opens up a risk to exposing sensitive data.

E.g. auth token or password would be exposed through telemetry collection, with a safeguards to remove them.

I do realize we allow this to happen today with our existing include statement but I am leaning more and more each day to deleting that option all together and only allowing sanitized, obfuscate, or omit in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does sanitized mean specifically removing the sensitive data (e.g. token, password, ssn, health_id, etc.), but not other binding parameters such as address?

I believe most password stored in table should be hashed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does sanitized mean specifically removing the sensitive data (e.g. token, password, ssn, health_id, etc.), but not other binding parameters such as address?

I think our current instrumentation uses the wrong vocabulary to describe what it does. Our current instrumentations use the word obfuscate, when I think we really mean is to sanitize, i.e. remove all user provided values.

I do not mean only targeting specific sensitive attributes when I use the word sanitize.

I now think of obfuscation as meaning that the entire query is hashed as opposed to just the parameters or values.

I brought up a request to change our vocabulary at a SIG meeting and the result #1194

Related to this topic, there was a recent OTEP submitted to around sensitive data redaction but I have not taken the time to digest it all just yet:

open-telemetry/oteps#255

@arielvalentin
Copy link
Collaborator

arielvalentin commented Nov 24, 2024

Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Dec 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Marks an issue/PR stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants