What are the Best Practices for Providing Instrumentation for Spring AI. #12878

Cirilla-zmh · 2024-12-11T11:20:56Z

Background

Hi, we are currently working on providing automatic instrumentation for applications built with the Spring AI framework. Our goal is to let users obtain observability data (mainly traces) from their Spring AI applications simply by installing opentelemetry-java-instrumentation, without modifying their code.

This sounds like a requirement for dedicated plugin support. However, we found that Spring AI already provides a rich set of observability features, and its trace attributes follow the OTel semantic conventions as closely as possible. We therefore believe that simply exporting this observability data through opentelemetry-java-instrumentation is a more elegant solution. We made some modifications to the demo application and successfully exported this data to Jaeger. Here is the result:

Making some adaptations in the demo application does achieve this result, but we think there may be better ways to do it. We have also come across a memory leak. Below are the issues we are particularly concerned about.

List of Issues

Initialization of the OpenTelemetry SDK

As in other Spring applications, Spring AI's underlying tracing capability is based on the Micrometer framework, which requires adding these dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>

However, spring-boot-actuator-autoconfigure does not detect the presence of opentelemetry-java-instrumentation. Instead, it checks whether a bean of type OpenTelemetry exists in the application context and, if not, creates a new one. As a result, Micrometer does not use the OpenTelemetrySdk provided by the Java agent.

@Bean
@ConditionalOnMissingBean(OpenTelemetry.class)
OpenTelemetrySdk openTelemetry(ObjectProvider<SdkTracerProvider> tracerProvider,
	ObjectProvider<ContextPropagators> propagators, ObjectProvider<SdkLoggerProvider> loggerProvider,
	ObjectProvider<SdkMeterProvider> meterProvider) {
    OpenTelemetrySdkBuilder builder = OpenTelemetrySdk.builder();
    tracerProvider.ifAvailable(builder::setTracerProvider);
    propagators.ifAvailable(builder::setPropagators);
    loggerProvider.ifAvailable(builder::setLoggerProvider);
    meterProvider.ifAvailable(builder::setMeterProvider);
    return builder.build();
}

One way to solve this issue is to explicitly declare a bean of type OpenTelemetry in the application's configuration class:

@Bean
public OpenTelemetry getOpenTelemetry() {
    return GlobalOpenTelemetry.get();
}
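The workaround above relies on Spring Boot processing auto-configuration only after the application's own bean definitions, so a user-declared OpenTelemetry bean turns the @ConditionalOnMissingBean(OpenTelemetry.class) factory shown earlier into a no-op. The following JDK-only toy model sketches that precedence rule. It is not Spring's implementation: ToyContext, Telemetry, AgentTelemetry and AutoConfiguredSdk are hypothetical stand-ins for the application context, OpenTelemetry, the agent's GlobalOpenTelemetry instance and the SDK that Spring Boot would otherwise build.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ConditionalBeanDemo {

    // Hypothetical stand-in for io.opentelemetry.api.OpenTelemetry.
    public interface Telemetry {
        String origin();
    }

    // Hypothetical stand-in for the instance returned by GlobalOpenTelemetry.get() under the agent.
    record AgentTelemetry() implements Telemetry {
        public String origin() { return "agent"; }
    }

    // Hypothetical stand-in for the OpenTelemetrySdk built by the auto-configuration.
    record AutoConfiguredSdk() implements Telemetry {
        public String origin() { return "auto-configured"; }
    }

    // Toy application context: one bean per type, no real Spring semantics beyond that.
    static final class ToyContext {
        private final Map<Class<?>, Object> beans = new LinkedHashMap<>();

        <T> void register(Class<T> type, T bean) {
            beans.put(type, bean);
        }

        // Models the idea of @ConditionalOnMissingBean: the factory runs only if no bean exists yet.
        <T> void registerIfMissing(Class<T> type, Supplier<? extends T> factory) {
            beans.computeIfAbsent(type, t -> factory.get());
        }

        <T> T get(Class<T> type) {
            return type.cast(beans.get(type));
        }
    }

    public static Telemetry resolve(boolean userDeclaresBean) {
        ToyContext ctx = new ToyContext();
        if (userDeclaresBean) {
            // Plays the role of the user's @Bean returning GlobalOpenTelemetry.get().
            ctx.register(Telemetry.class, new AgentTelemetry());
        }
        // The auto-configuration phase runs after the user's configuration.
        ctx.registerIfMissing(Telemetry.class, AutoConfiguredSdk::new);
        return ctx.get(Telemetry.class);
    }

    public static void main(String[] args) {
        System.out.println(resolve(false).origin()); // auto-configured
        System.out.println(resolve(true).origin());  // agent
    }
}
```

In this model, as with the real workaround, the user bean wins unconditionally. Without the agent attached, GlobalOpenTelemetry.get() typically returns a no-op instance, so declaring the bean this way is only useful when the agent is present.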

Of course, we could add auto-configuration to Spring AI (or to one of its distributions, such as Spring AI Alibaba) to handle this for the user. However, we believe it would be better for opentelemetry-java-instrumentation to manage this behavior: a framework should implement its observability logic purely against the OpenTelemetry API and should not need to be aware of a Java agent.

Potential Memory Leak

For most applications that depend on spring-actuator, Micrometer generates many metrics by default, and these often suffer from high cardinality (for example, RestTemplate records the uri as a tag by default). opentelemetry-java-instrumentation includes an auto-instrumentation for spring-actuator that adds a registry to Micrometer. This registry appears to bypass Micrometer's high-cardinality controls, leading to dimension explosion and memory leaks.

This might not seem like a problem, since the risk of high-cardinality tags should be borne by the user. However, without opentelemetry-java-instrumentation, Micrometer's memory consumption stays normal (bounded by the default configuration maximumAllowableTags=100), whereas with opentelemetry-java-instrumentation, memory leaks occur. This may mislead users into thinking that opentelemetry-java-instrumentation is causing the leak. We are still investigating the details and would like to know whether the community has encountered similar problems. (I was unable to find a similar issue in this project.)
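To make the cardinality argument concrete, here is a simplified, JDK-only model of a per-tag cap in the spirit of Micrometer's MeterFilter.maximumAllowableTags(...) combined with a deny fallback. ToyRegistry and its methods are hypothetical and do not reflect Micrometer's internals; the sketch only assumes that each distinct value of a uri tag creates a new meter that stays in memory. One registry enforces the limit of 100 mentioned above, and the other, like a registry that bypasses the filter, does not.

```java
import java.util.HashMap;
import java.util.Map;

public class CardinalityDemo {

    // Hypothetical toy registry: keeps one counter per distinct uri tag value.
    public static final class ToyRegistry {
        private final Map<String, Long> meters = new HashMap<>();
        private final int maxTagValues; // a value <= 0 means "no cap"

        public ToyRegistry(int maxTagValues) {
            this.maxTagValues = maxTagValues;
        }

        // Records one request; returns false if a new tag value was denied by the cap.
        public boolean record(String uriTag) {
            boolean capped = maxTagValues > 0;
            if (capped && !meters.containsKey(uriTag) && meters.size() >= maxTagValues) {
                return false; // beyond the limit: deny, so nothing new is retained
            }
            meters.merge(uriTag, 1L, Long::sum);
            return true;
        }

        public int meterCount() {
            return meters.size();
        }
    }

    public static void main(String[] args) {
        ToyRegistry filtered = new ToyRegistry(100); // honors the cap
        ToyRegistry unfiltered = new ToyRegistry(0); // bypasses the cap
        // Non-templated URIs such as /users/0, /users/1, ... each become a new tag value.
        for (int i = 0; i < 10_000; i++) {
            String uri = "/users/" + i;
            filtered.record(uri);
            unfiltered.record(uri);
        }
        System.out.println(filtered.meterCount());   // 100
        System.out.println(unfiltered.meterCount()); // 10000
    }
}
```

Feeding 10,000 non-templated URIs leaves the capped registry at 100 meters and the uncapped one at 10,000, which mirrors the unbounded growth described above when a registry is not subject to the filter.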

About the Support Plan of Spring AI

So far, I have seen no discussion of the Spring AI framework in the OpenTelemetry Java projects (opentelemetry-java-instrumentation, opentelemetry-java and opentelemetry-java-contrib). In the long term, does the community plan to support this framework? Will a new instrumentation be provided, or will support rely on the library instrumentation built into Spring, as we have done, even though it is based on micrometer-tracing?

Demo

To give you a little more context, I have created a repository that contains a simple Spring AI application demo.

Note that before running the demo, you need to obtain an API key from an LLM provider, such as OpenAI or DashScope.

Additional Context

Spring AI and its Observability: https://docs.spring.io/spring-ai/reference/observability/index.html


trask · 2024-12-11T16:39:41Z

cc @asaikali

Cirilla-zmh · 2024-12-12T02:14:00Z

> cc @asaikali

Thanks a lot! ;) @trask

@asaikali
It looks like we are having some trouble exporting tracing data created by spring-actuator (based on Micrometer) through the OTel Java Agent. We still think this approach is valuable because it is user-friendly. Could you share some best practices or suggestions for making spring-actuator and the OTel Java Agent work well together?

ThomasVitale · 2024-12-14T10:02:51Z

Thanks for reporting this issue. There might be some confusion or conflict among the different strategies.

There are three main ways to handle observability in a Spring application, and all of them can export OpenTelemetry data. I would recommend choosing only one of these strategies to avoid conflicts, incompatibilities, or errors.

1. Using the instrumentation available in all Spring libraries and dependencies, based on the Micrometer Observation API. Spring AI is instrumented using this strategy, and you can use the Micrometer support for OpenTelemetry to export metrics and traces. This option also relies on Spring Boot Actuator. If you have issues with this option, I would recommend submitting an issue on the Spring GitHub projects. This option does not use the OpenTelemetry Java Instrumentation library.
2. Using the OpenTelemetry Spring Boot Starter, which is based on the OpenTelemetry Java Instrumentation.
3. Using the OpenTelemetry Java Agent, which is also based on the OpenTelemetry Java Instrumentation.

If you're using the Agent, then options 1 or 2 should not be added to avoid unpredictable results. But I would actually recommend going with either option 1 or 2.

About Spring AI, you can find a full example here, which exports OpenTelemetry logs, metrics and traces: https://github.com/ThomasVitale/llm-apps-java-spring-ai/tree/main/observability/models/observability-models-openai

Cirilla-zmh · 2024-12-16T03:10:16Z

> If you're using the Agent, then options 1 or 2 should not be added to avoid unpredictable results. But I would actually recommend going with either option 1 or 2.

Sure, this recommendation makes sense. However, our applications are already instrumented with opentelemetry-java-instrumentation, which means we would need to maintain two different approaches for exporting observability data and figure out how to integrate them. Otherwise, we would have to migrate our observability instrumentation, along with all of our custom developments, to the OpenTelemetry Spring Boot Starter. Either path implies a significant cost.

As you mentioned, an application can go with either option 1 or option 2. I believe we could also find a way to make options 1 and 3 work well together. This might require some adjustments in opentelemetry-java-instrumentation, spring-actuator, or both. Nonetheless, I think it would be an excellent and user-friendly solution.

> About Spring AI, you can find a full example here, which exports OpenTelemetry logs, metrics and traces: https://github.com/ThomasVitale/llm-apps-java-spring-ai/tree/main/observability/models/observability-models-openai

Thanks for your great work! ;)

Cirilla-zmh changed the title ~~The Best Practices for Providing Instrumentation for Spring AI.~~ What is the Best Practices for Providing Instrumentation for Spring AI. Dec 11, 2024

Cirilla-zmh changed the title ~~What is the Best Practices for Providing Instrumentation for Spring AI.~~ What are the Best Practices for Providing Instrumentation for Spring AI. Dec 11, 2024

Cirilla-zmh mentioned this issue on Dec 13, 2024 in spring-projects/spring-ai#1924, "[Observability] Add observability support for function calling." (Open)

stevesea mentioned this issue on Dec 19, 2024 in #12934, "clarify recommendations for instrumenting a Spring Boot application" (Open)
