Semantic Conventions - Multi-Registry Proposal #348

lquerel · 2024-08-29T23:04:54Z

This PR proposes to support multiple semantic convention registries in OTEL and Weaver.

If you’re like me and prefer to read a rendered version of the markdown spec, it’s available here.

Note: This proposal could eventually be transformed into an OTEP if needed.

See GH issue #215

lmolkova · 2024-09-03T22:40:47Z

docs/specs/multi-registry/multi_registry.md

+  the needs of the owner. Registries should be accessible via a URL.
+- **Common Format**: A published registry should adhere to a common packaging and format, making it easy to
+  consume and integrate with other registries.
+- **Self-Contained**: A published registry should be self-contained and stored in a single file.


Agree with self-contained, but single file sounds weird as a design principle.

The idea was to make a published registry simple to consume, so in addition to being self-contained, having a single file also simplifies consumption. I’m going to remove the “single file” aspect from the design principles section for now, but I will continue to use the single file approach for the published registry in this proposal and see how things evolve.

lmolkova · 2024-09-03T22:47:10Z

docs/specs/multi-registry/multi_registry.md

+  These policies should be enforced by Weaver.
+- **Cross-Registry References**: References between different registries should be supported, facilitating
+  interoperability and integration across various registries.
+- **Conflict Avoidance through Scoping**: A scoping mechanism should be implemented to ensure that a signal


I'd like to understand what scoping means.

There should not be a case when two attributes with the same name are defined within one schema_url.

If scoping means that both exist but you can specify which one is imported in YAML, then this information is not available to the consumer of this telemetry.

Even worse if you can conditionally import one or another in different parts of conventions

To make things clear, the consumer of telemetry, i.e., anyone downstream from signal production (backend, dashboard, etc.), will not see this type of conflict and will see the attribute or signal with its name as defined today. I will add this principle to the design principles.

I am currently writing an entire section on conflict resolution. Conflicts can arise for several reasons. Examples:

- A registry that imports two registries, which in turn import the same registry but in different versions.
- A registry that imports two registries, which in turn import the same registry but apply different overrides.

Things should be better defined soon.
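As a rough illustration of the first case (a diamond import with mismatched versions), here is a minimal Python sketch. The registry names, the `imports` mapping, and `find_version_conflicts` are all hypothetical; this is not how Weaver resolves registries, and it simplifies by keying each registry by name only.

```python
def find_version_conflicts(
    imports: dict[str, dict[str, str]], root: str
) -> dict[str, set[str]]:
    """Walk the import graph from `root` and report every registry that is
    requested in more than one version.

    `imports` maps a registry name to {dependency name: pinned version}.
    """
    seen_versions: dict[str, set[str]] = {}
    visited: set[str] = set()
    stack = [root]
    while stack:
        registry = stack.pop()
        if registry in visited:
            continue
        visited.add(registry)
        for dependency, version in imports.get(registry, {}).items():
            seen_versions.setdefault(dependency, set()).add(version)
            stack.append(dependency)
    return {name: vs for name, vs in seen_versions.items() if len(vs) > 1}


# Hypothetical registries: "app" imports two registries that each import
# "otel", but pinned to different versions.
imports = {
    "app": {"acme-http": "1.0.0", "acme-db": "2.1.0"},
    "acme-http": {"otel": "1.26.0"},
    "acme-db": {"otel": "1.27.0"},
}

conflicts = find_version_conflicts(imports, "app")
print(conflicts)
```

Here `conflicts` contains a single entry for `otel` with both requested versions, which is the situation the conflict resolution section would need to address.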

I added the following item in the design principles list:

Transparency for Telemetry Consumers: The downstream consumers of telemetry data, such as backends and dashboards, should never be exposed to conflicts within or between registries. They should see attributes and signals as they are defined, without any scope or conflict resolution directives, ensuring a consistent and reliable data experience.

lmolkova · 2024-09-03T22:53:08Z

docs/specs/multi-registry/multi_registry.md

+- Short answer: No for backward compatibility.
+- Long answer: Experimental entities are not meant to be used in production. Experimental entities are subject to
+  change or removal without any notice in the next version of the registry. The type of an experimental entity can
+  be changed. Weaver will detect and report an error if an experimental entity is referenced across registries.


I don't understand this limitation or reasoning behind it.

It's common to use experimental semconv in production, it's also common to implement them in OTel or outside it in the instrumentations.

It makes sense to require something that depends on experimental stuff to be experimental, but I don't understand why we should limit the ability to import experimental attribute/group.

The approach was to support only the most essential/simple features at the beginning, with the option of adding such things later if the need arose. It seems that the need is already here :-)

Technically, I don’t think there’s a problem with supporting experimental definitions across registries. Even if we decide that this is supported by default, I believe we should, at a minimum, make it easier for users who don’t want to allow this. This could simply be implemented as an OTEL policy that prevents this type of dependency.

lmolkova · 2024-09-03T22:54:57Z

docs/specs/multi-registry/multi_registry.md

+  error if a reference is made to an entity that is not defined in the imported registry.
+
+### Open Questions:
+- Can we override any field of a group defined in an imported registry? No for `type` field, what about the other


I'd start with the same limitations as within the registry. You can't change type, stability, deprecation status. Can change anything else.
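A minimal Python sketch of that rule, assuming groups are plain dictionaries and that the locked fields are named `type`, `stability`, and `deprecated`. The function `rejected_overrides` is hypothetical and not part of Weaver.

```python
# Assumption: these field names stand in for "type, stability, deprecation status".
LOCKED_FIELDS = {"type", "stability", "deprecated"}


def rejected_overrides(imported: dict, override: dict) -> list[str]:
    """Return the locked fields that `override` tries to change."""
    return sorted(
        field
        for field in LOCKED_FIELDS
        if field in override and override[field] != imported.get(field)
    )


imported_group = {
    "id": "http.server.request.duration",
    "type": "metric",
    "stability": "stable",
    "brief": "Duration of HTTP server requests.",
}

# Changing `brief` is allowed; changing `stability` is not.
override = {"brief": "Request duration for the ACME server.", "stability": "experimental"}

print(rejected_overrides(imported_group, override))  # ['stability']
```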

lmolkova · 2024-09-03T22:56:52Z

docs/specs/multi-registry/multi_registry.md

+### Open Questions:
+- Can we override any field of a group defined in an imported registry? No for `type` field, what about the other
+  fields?
+- Is there a relationship to define between the instrumentation scope name and version and the registry?


instrumentation scope is a random string - name of the library/class/component that produced the telemetry, I can change which component/class emits my logs/events, but this should have no effect on the consumer of the telemetry.

I think we might change it at some point, but that's the current state of affairs.

lmolkova · 2024-09-04T00:43:25Z

docs/specs/multi-registry/multi_registry.md

+
+### Spans
+- No span name collisions
+- Spans cannot be removed


why not? what does it even mean to remove a span?

I assume I pick the spans that I want to support by referencing them and I should be free to drop any spans that I don't need.

Moreover, if I'm defining semconv for a specific system (e.g. Kafka library), not all spans declared in generic conventions apply to my library, some of them are defined for scenarios that I don't support and it'd be best if I don't ever mention corresponding spans in my conventions.

In the OTEL registry, once a span has been defined, can its definition be removed in a later version? I thought this wasn't allowed if the span had a stability status set to stable. If that's the case, I think we should apply the same rule for all registries, whether OTEL or not.

Let's say I work on a library that supports multiple protocols (HTTP and something else). At some point my library removes support for one of these protocols. I should be allowed to stop referencing corresponding protocol spans - I no longer report them.

For metrics we at least support informal requirement levels and we could say that if you report e.g. http metrics, you must at least report required ones. Even this is tough to say - how do you know that they are going to report HTTP metrics if they don't reference them?

For spans it's even more complicated since span identity is not defined and spans don't have requirement levels.
We don't define anything about spans in yaml beyond what's defined for the attribute_groups - we need to change this first.

I think there are two separate discussions here:

1. What's allowed when I reference conventions from registry A in registry B (or within one registry).
2. What changes are allowed in registry B from version N to N+1.

I'm mostly talking about p1 (you can reference or not reference any group), and I think you're talking about p2.

Maybe it'd be more clear if policies were separated into two groups: within-one-version and backward-compatibility?

lmolkova · 2024-09-04T00:45:51Z

docs/specs/multi-registry/multi_registry.md

+- Event names must match the following regular expression: `^[a-z][a-z0-9]*([._][a-z0-9]+)*$`
+
+### Spans
+- No span name collisions


I think we need to find a better identity parameter for spans than name. Span name is an existing thing: span names are usually something like "GET /users/{user-id}" or "SELECT public.my-shop-db". They are a human-friendly summary of what the span represents and are part of the exported telemetry.

I agree. However, for symmetry with other signal types, we should ideally keep name to identify things, as we do with attributes and metrics, and use a separate identifier for the human-friendly summary of what this span represents.

Span name is used everywhere - in the spec - https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span-creation, in all semantic conventions. I don't think we can change how span name property is called - it'll be a huge source of confusion.

lmolkova · 2024-09-04T00:46:22Z

docs/specs/multi-registry/multi_registry.md

+- Spans cannot be removed
+- Spans cannot "degrade" in stability (e.g., stable -> experimental)
+- The set of required/recommended attributes must remain the same
+- Span names must match the following regular expression: `^[a-z][a-z0-9]*([._][a-z0-9]+)*$`


see my comment above - span name is pre-existing thing with different rules (unicode string)
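To illustrate the mismatch, here is a small Python sketch that applies the proposed regular expression to the example span names quoted earlier in this thread. The helper name `matches_proposed_pattern` is hypothetical; only the pattern itself comes from the spec draft.

```python
import re

# Pattern copied from the draft policy for event and span names.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*([._][a-z0-9]+)*$")


def matches_proposed_pattern(name: str) -> bool:
    """True if `name` satisfies the proposed naming policy."""
    return NAME_PATTERN.fullmatch(name) is not None


for name in [
    "http.server.request.duration",  # identifier-style name
    "GET /users/{user-id}",          # typical HTTP span name
    "SELECT public.my-shop-db",      # typical database span name
]:
    print(f"{name!r}: {matches_proposed_pattern(name)}")
```

Only the first name matches. The two span names that are typical of exported telemetry fail on the uppercase letters, spaces, and punctuation.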

lmolkova · 2024-09-04T00:55:20Z

docs/specs/multi-registry/multi_registry.md

+## Resolved Semantic Convention Registry Format
+
+> Note: A resolved registry is self-contained and does not include any complex constructs
+> like `imports`, `ref`, `extends`, etc. Their **structure is less subject to change**, making them good


if I'm defining a span or a metric, I have to reference some attribute and maybe extend some attribute group, so this may need clarification.

lmolkova · 2024-09-04T00:56:36Z

docs/specs/multi-registry/multi_registry.md

+  - Accessible via a URL.
+  - Self-contained, i.e. a single file.
+  - No `ref`, no `extends`, no `imports`, no alias, no other complex constructs.
+  - Yaml or JSON format so resolved registries can be easily consumed by any tool.


nitpicking: I think single file and this point are mutually exclusive - I get some warnings from VSCode on resolved attribute registry alone due to its size.

lmolkova · 2024-09-04T00:58:54Z

docs/specs/multi-registry/multi_registry.md

+Open Questions:
+
+- Do we keep track of the imported registries in the resolved registry? If yes, how? Lineage?
+- Can we leverage the attribute deduplication mechanism to simplify the merging of imported registries? ToDo -> Explain


can we start by disallowing attribute/span/event/metric duplication and only add it along with scoping only if there is actual demand?

The next version of this document includes a complete revision of the resolution process and eliminates the need for scoping in the most common cases. I think the best approach at this stage is to postpone the discussion on scoping.

lmolkova · 2024-09-04T01:00:17Z

docs/specs/multi-registry/multi_registry.md

+
+- What about introducing a new type of semconv file that will let end-users define global overrides and global redact
+  directives? For example, the requirement level of attributes such as `client.address` or `server.address` will be
+  better defined by the end-user than by the library author, or a vendor. A similar approach could be used for redact


future concern: we should be crystal clear that user modifying requirement level in their registry has no effect on instrumentation libraries or produced telemetry.

Yes, in a general context. I will add this point to the document. However, I also believe that Weaver can enable the development of a new type of type-safe Client SDK, where a change in the requirement_level would have a direct impact on what is reported by the applications using this SDK. Personally, I am a strong believer in this approach.

I love the idea, but it needs all instrumentation libraries to change.

I.e. if my library reports http.request.header.foo as opt-in (or doesn't report it at all), it's not in the OTel SDK's power to make my instrumentation report it.

I don’t think we need to change anything in the current instrumentation libraries. In my view, we can follow a two-step approach. The first step involves proposing a library that offers a type-safe API on top of the existing SDK clients. Those who prefer the current approach can continue using the current SDK client as they always have. Those who want the pure type-safe API approach will only interact with this additional layer. It is even conceivable that a user might want to adopt a mixed approach, using one or the other APIs depending on the need.

In step 2, the idea is to no longer base the type-safe API on the SDK clients, but to provide a fully optimized version by taking advantage of the fact that a type-safe API doesn’t need all the overhead of a generic approach. Hashmaps, abstraction layers, etc., can either be optimized or completely removed.

The advantage of this two-step approach is that we can test the type-safe API approach with the community at a lower cost. I already have a proof of concept for step 1 in Rust.

lmolkova · 2024-09-04T01:07:32Z

docs/specs/multi-registry/registries/acme-http-server-lib/metric.yaml

+    name: otel
+
+groups:
+  - ref: otel:http.server.request.duration


I think that reusing the same name across different signals is allowed and is a feature rather than a bug (e.g. dns.lookup.duration is a metric name, but if DNS lookup is reported as an event, it'd make a perfect event name, or if it's reported as attribute on something, it could be an attribute name).

So we need ref: to either reference a group id (and then give them some meaning) or reference a specific signal (ref_metric).

I think this is not something currently allowed or even supported. You’re mentioning in this comment something you’d like us to explore. Is that correct?

I'm talking about current state.

I can easily define an attribute with the same name as metric, there are no checks and nothing is going to stop me. Nothing would break if I do. And I consider it a feature, not a bug.

My bad. It's now fixed. See 5d68d7d
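For readers following along, uniqueness scoped by group type can be sketched as below. This is an illustrative Python snippet, not Weaver code; the `duplicate_ids` helper and the dictionary shape of a group are assumptions.

```python
from collections import Counter


def duplicate_ids(groups: list[dict]) -> list[tuple[str, str]]:
    """Return (type, id) pairs that occur more than once.

    The same id under two different group types is not a collision.
    """
    counts = Counter((group["type"], group["id"]) for group in groups)
    return sorted(key for key, count in counts.items() if count > 1)


groups = [
    {"type": "metric", "id": "dns.lookup.duration"},
    {"type": "event", "id": "dns.lookup.duration"},  # same id, different type: allowed
    {"type": "metric", "id": "http.server.request.duration"},
    {"type": "metric", "id": "http.server.request.duration"},  # true duplicate
]

print(duplicate_ids(groups))  # [('metric', 'http.server.request.duration')]
```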

lmolkova

Thanks for the detailed spec!

I really like the direction it goes into, here are some suggestions:

- We need to define what it means to reference a group (metric/span/event) within semantic conventions. There are a lot of small questions I left on this, but I think this discussion needs to happen in semconv before anything is done in Weaver to support it. We should start by supporting it within the same registry.
- The current scope of this proposal is huge, and while I agree it's important to design and prototype what the eventual thing would look like, I'd suggest starting with attributes and the attribute registry for the initial implementation, once we have a basic idea of how it would work for groups.

jsuereth · 2024-09-10T18:48:37Z

docs/specs/multi-registry/multi_registry.md

+> [!NOTE]
+> In this document, “semantic convention registry” refers to a collection of semantic convention entities (attributes,
+> groups, signals, etc.) that define the semantics of the data model used in OpenTelemetry. The terms “registry” and
+> “semantic convention registry” are used interchangeably. The term “entity” refers to any semantic convention


Nit: "Entity" may have overlap in name in OpenTelemetry. I can't think of a better term just yet, but we may need one.

I understand. Should I use telemetry object instead of entity?

jsuereth · 2024-09-10T18:53:39Z

docs/specs/multi-registry/multi_registry.md

+```yaml
+name: <registry_name>
+description: <registry_description>
+version: <registry_version>


One concern I have here is the same I have, e.g. with cargo crate versions.

What does CI/CD look like? My ideal is releasing a version is as close to git tag <version> as possible.

This is a bit of a nit, but also a foundational question - What does release and maintenance of these look like? Should version be external, but everything else be in this file?

Using git tag <version> and not defining the version number in the YAML file is fine with me, as long as we can maintain the self-contained aspect of the artifact that will be downloadable by the users of the corresponding registry. This artifact could be hosted on a GitHub repo, a simple web server, or a CDN. In the end, Weaver should be able to retrieve the version either from the content, the file name, headers, etc. So, I feel like we’ll need a bit more than just a git tag <version>. Any suggestions to achieve something like this are welcome.

Yes - I'm fine if git tag <version> kicks off a workflow that fills out this version, just want to make sure we keep it flexible so we can do that. I agree there needs to be a final file that has the version filled out. I just want to make sure that's a "produced" artifact, not an "input source" requirement.

jsuereth · 2024-09-10T18:55:06Z

docs/specs/multi-registry/multi_registry.md

+is used in these processes.
+
+Open Questions:
+- Should we follow SemVer 2 for registry versions? It seems advisable, as Weaver can detect breaking changes. However,


This is something we likely need to discuss on the specification itself. cc @tigrannajaryan.

I do think we should just adopt semconv as it alleviates a lot of problems with a path forward for handling them, all while keeping the versioning "simple".

I don't think that we should force semconv on all repositories. I think it's a good thing to follow for the otel repositories but might not be appropriate everywhere.

I think automatically upgrading to newer versions might be a bad idea in general. If I pinned to a specific version getting a newer version will eventually lead to unexpected results.

Sorry, I meant semver

@jsuereth I agree. We should use SemVer for OTel but not strictly adhere to all semantic versioning rules. We could simply define an order relation between versions (e.g., 1.2.0 > 1.1.15 > 0.9.32), which would be sufficient to determine if one registry is newer than another.

@MadVikingGod Regarding the automatic upgrade, to clarify, I’m not suggesting automatically upgrading registry versions. Instead, I’m proposing a command that allows the user to explicitly choose whether to automatically upgrade registry versions, so it’s done in a controlled manner.
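The order relation mentioned above can be sketched in a few lines of Python, assuming versions are always three dot-separated integers. Pre-release and build suffixes from full SemVer are deliberately not handled, and `version_key` is a hypothetical helper.

```python
def version_key(version: str) -> tuple[int, int, int]:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


versions = ["1.1.15", "0.9.32", "1.2.0"]
newest_first = sorted(versions, key=version_key, reverse=True)

print(newest_first)  # ['1.2.0', '1.1.15', '0.9.32']
```

Comparing integer tuples rather than raw strings matters: as strings, "1.1.15" would sort before "1.1.2".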

jsuereth · 2024-09-10T18:56:49Z

docs/specs/multi-registry/multi_registry.md

+`attributes` section of a group can reference an imported attribute.
+
+In the current semantic conventions specification, referencing a group is currently unsupported. Uniqueness within
+groups is scoped by the type of group. It is entirely possible to have an event and a metric identified by the same ID.


Maybe we should fix this first - This is something we could do BEFORE allowing multiple registries and I think would make our lives better overall.

cc @lmolkova for thoughts on that.

@jsuereth I think it's a feature, not a bug - #348 (comment)

E.g. http.request.body.size can in theory be an attribute and a metric name. Or messaging.message.time_in_queue.

I don't believe we have real examples of this but there were some discussions in the past where it would make sense to use the same attribute name on an event/span as some existing metric name.

It could be a good feature for spans/events->metrics pipeline (take specific attribute and convert it to metric) and I'd like to check if we can keep this door open.

jmacd · 2024-09-12T22:18:36Z

I don't feel qualified to approve this, but the motivation and vision look good to me. Thanks @lquerel!

lquerel added 7 commits August 29, 2024 11:07

feat(multi-registry): Create first draft of the spec.

118e5f3

feat(multi-registry): Update draft of multi-registry spec.

dba1a2c

feat(multi-registry): Update draft of multi-registry spec.

6d08430

feat(multi-registry): Update draft of multi-registry spec.

f075e43

feat(multi-registry): Update draft of multi-registry spec.

7aa9129

feat(multi-registry): Update draft of multi-registry spec.

e2bd4e0

feat(multi-registry): Update draft of multi-registry spec.

97ed998

lquerel requested review from jsuereth, lmolkova and MadVikingGod August 30, 2024 00:23

lquerel self-assigned this Aug 30, 2024

lquerel added the documentation and enhancement labels Aug 30, 2024

lquerel mentioned this pull request Aug 30, 2024

Support multiple semantic convention registries #215

Open

lquerel added 11 commits August 30, 2024 11:23

feat(multi-registry): Update draft of multi-registry spec.

2050cc7

feat(multi-registry): Update draft of multi-registry spec.

17f5b68

feat(multi-registry): Update draft of multi-registry spec.

5cef1e2

feat(multi-registry): Update draft of multi-registry spec.

e20011d

feat(multi-registry): Update draft of multi-registry spec.

e963180

feat(multi-registry): Update draft of multi-registry spec.

02e58eb

Merge branch 'main' into multi-registry-spec

2f04a35

Update multi_registry.md

0a698de

feat(multi-registry): Update draft of multi-registry spec.

fe45217

Merge remote-tracking branch 'origin/multi-registry-spec' into multi-registry-spec

d510fe3

feat(multi-registry): Update draft of multi-registry spec.

5562cef

lmolkova reviewed Sep 3, 2024

lmolkova reviewed Sep 4, 2024

lquerel added 4 commits September 6, 2024 20:10

feat(multi-registry): Update draft of multi-registry spec.

706f7dc

Merge branch 'main' into multi-registry-spec

b518c17

feat(multi-registry): Update draft of multi-registry spec.

422022f

Merge remote-tracking branch 'origin/multi-registry-spec' into multi-registry-spec

d5ab29d

lquerel changed the title ~~[WIP] Multi-Registry - Draft Proposal~~ Semantic Conventions - Multi-Registry Proposal Sep 7, 2024

lquerel marked this pull request as ready for review September 7, 2024 04:31

lquerel requested a review from a team September 7, 2024 04:31

lquerel added 5 commits September 8, 2024 09:27

Merge branch 'main' into multi-registry-spec

165f484

Merge branch 'main' into multi-registry-spec

21b5a5b

feat(multi-registry): uniqueness within groups is scoped by the type of group

5d68d7d

Merge remote-tracking branch 'origin/multi-registry-spec' into multi-registry-spec

acb6b7b

feat(multi-registry): dependency version can be a version number or latest

3d625ad

jsuereth reviewed Sep 10, 2024

chore(spec): Explain how Weaver will help resolving conflicts

58d4f9d

Merge branch 'main' into multi-registry-spec

f78e5a2

lquerel requested a review from a team as a code owner October 1, 2024 15:20
