Refactor Probe API #1307

damemi · 2024-11-22T22:08:30Z

To support custom (modular) Probes, we want a Probe API that is:

Generic - It should be rich enough to support our current use cases, but not too focused to make new use cases harder to add. Therefore...
Extensible - We don't know every use case we want to support yet, so our current use cases should not be tightly bound into the Probe API so as to allow future extensions. Therefore...
Flexible - New use cases and changes to old use cases shouldn't cause widespread damage. High-composition functionality should be contained.

All in all, this came out less heavy than I expected. But there is still a lot of nuance here, especially around making sure the interface design truly is flexible enough for us to expand in the future. As always, I'm open to any feedback on this.

Summary

In this design, I am proposing modifying our Probe API to rely on a layered composition of interfaces to define Probes. Today, this gives the appearance of simple inheritance. However, as more diverse functionality support is added in the future, I believe this will show more truly as a composition of base features.

To enable this composition, I am proposing adding extension points into our existing framework, gated by interface checks. Again, this is not very compelling with our single use case today but sets us up for adding new extension points in the future.

Current Design Problems

Our current, single Probe interface is designed with the assumption that all Probes will be Loaded, Run, and produce a Go Library Manifest.

Note that this does not include any indication that our current Probes will produce any telemetry. It also means that any new Probes that aren’t based on a Go program, or a specific target, or that perhaps don’t need to be triggered to Run continuously must implement superfluous functions.

Instead, we have achieved polymorphism through structs: Base, SpanProducer, and TraceProducer. While this has worked well for our different needs specific to Go library trace-producing Probes, these structs are still bound to a single interface definition which limits our ability to add new functionality and Probe types as time goes on.

Proposal

Interfaces

To start, I broke down the functionality of our current Probes into layers. Each Probe we have today:

Loads and Attaches itself to a trigger point
Runs and Closes
Produces Traces
Binds itself to a single Go library

From these points I am proposing the initial interfaces:

BaseProbe - Must provide ID, Logging, Loading/Attaching, and Config functions. This are the base requirements for all Probes.
RunnableProbe - Implements Run and Close functions.
TracingProbe - Accepts Tracing config, such as Sampling.
GoLibraryTelemetryProbe - Provides a Go Manifest and Target Executable config. This is what our current Probes are, but note that this only embeds a RunnableProbe and not a TracingProbe to allow us to implement future telemetry interfaces (such as MetricsProbe).

My goal with this is to provide different “branching-off” points for new types of Probes in the future. For example, we may have a RunnableProbe that relies on kprobes or network events.

The list of interfaces we define will be baked into the framework and outline the functionality that is available in the framework. End-user developers will not be able to define their own interfaces, as that wouldn’t make sense (because they can’t customize the framework itself). However, users will be able to create their own implementations.

Default implementations

For each of these new Probe interfaces, I’m providing a default implementation:

BasicProbe - Simple (partial) implementation of BaseProbe, but doesn’t have Load or Attach functionality.
TargetExecutableProbe - Implementation of GoLibraryTelemetryProbe that provides Load, Attach, Target Config, and Manifest functions.
TargetEventProducingProbe - Generic struct that provides a read() helper for our 2 types of telemetry Probes.
TargetSpanProducingProbe and TargetTraceProducingProbe - Both RunnableProbes that are the equivalent of our current Probe offerings.

Refactors

The main refactor in this was the addition of several interface checks in the Manager: for example, we only call Run() on a RunnableProbe, and we only check for the Go Manifest() on a GoLibraryTelemetryProbe.

As part of accepting the new interfaces, some helpers were consolidated (for example, now FilterUnusedProbes is FilterUnusedProbesForTarget, with target info passed to the relevant Probes in order to simplify functions like Load()).

On that note, Load() has been simplified for the purpose of abstracting BasicProbe. The target executable information and Sampling config are now provided by the manager, if appropriate for the type of Probe.

New features

As suggested in #1105 (comment), Load() has been split into Load() and Attach(). This function also no longer relies on the Sampling config, allowing it to be used for non-tracing Probes in the future. (Instead, Sampling config should be provided by calling probe.TracingConfig().SamplingConfig on a TracingProbe).

All Probes are also now expected to implement a custom ApplyConfig() function, which accepts a generic interface{} as an argument. The Probe can then attempt to convert the interface{} to its own unique config format for Probe-specific settings (see an example in the net/http probe).

MrAlias

I'm generally not in favor of this design.

It has added interface complexity for the theoretical possibility of future additions.
- Deeply nesting these interface types will make this code harder to read and maintain. Instead we should strive for a flatter structure and use composition to achieve the desired functionality. Ideally our interfaces will be small and focused. For example, we could have a Probe interface:
```
type Probe interface {
    // Base probe functionality...
}
```
  And then have a RunCloser interface:
```
type RunCloser interface {
    Run()
    Close()
}
```
  Though we need to motivate why a Probe would never need to have a Run/Close method.
- Use-cases of probes we plan to implement would help motivate this change. It would also help us define our interfaces. I think it is ambitious to want and provide a general probe running platform, but at some point we will just be providing an alternate API for the Cilium project. We need to make sure our efforts are targeted at providing auto-instrumentation.
This does not address the multi-process issue from Auto Instrumentation - Monitor multiple processes #197. If we plan to release this design to end-users we need to ensure this will be supported. Otherwise, we will ship an immediately outdated design as we try to progress.
The load and targeting mechanisms are set by assignment to returned values.
File names of the probe package a based on Go declarations not the functionality and types they contain. This will make it harder to find code and is not idiomatic with the rest of the project.
The utils package is separate from the probes, specifically InitializeEBPFCollection. All probes will need to maintain similar BPF filesystem utilities, and much of this will be duplicative or conflicting if we cannot somehow incorporate it into the probe factories or types.

MrAlias · 2024-11-23T00:05:12Z

internal/pkg/instrumentation/probe/interfaces.go

+	ID() ID
+
+	// GetLogger returns an *slog.Logger for this Probe.
+	GetLogger() *slog.Logger


Isn't this the opposite of what we want in that we want the probes to use the instrumentation logger? When would we need to have the logger from the probe returned?

Yeah good point, this is unnecessary

MrAlias · 2024-11-23T00:06:53Z

internal/pkg/instrumentation/manager.go

+			}
+
+			// If Probe is used, pass target details to Probe
+			p.TargetConfig().TargetDetails = target


This is setting a value on a returned type from TargetConfig. This is not idiomatic of Go, parameters should be passed to function calls not set on returned values.

fair enough, my point here was just to pass TargetDetails somehow in the config for the probe rather than as a one-time argument, since the target is available at startup and we re-read it a couple times. But, this does codify the 1:1 probe:target ratio so maybe not a good idea

MrAlias · 2024-11-23T00:08:17Z

internal/pkg/instrumentation/probe/interfaces.go

+
+	// Load loads the eBPF programs and maps required by the Probe into memory.
+	// The specific types of programs and maps are implemented by the Probe.
+	Load() error


How do you specify what to load?

How will this handle multiple load targets when we support #197

I wanted to see if I could get rid of all the arguments to Load() and Attach() to make them as generic as possible. To do that, I moved loading options into the Probe's object. But, I see now that doing so actually binds us to 1 process per 1 probe, which isn't what we want.

MrAlias · 2024-11-23T00:09:55Z

internal/pkg/instrumentation/probe/interfaces.go

+// BaseProbe is the most basic type of Probe. It must support configuration,
+// loading and attaching eBPF programs, running, and closing.
+// All Probes must implement this Base functionality.
+type BaseProbe interface {


When do we expect this interface to be satisfied but not the others?

This seems like an over generalization for a project designed to produce auto-instrumentation. Shouldn't every probe produce telemetry? When they don't why would they use this project?

Every Probe should enable telemetry, but not every Probe will actually produce events. We've already talked about having Probes that don't report telemetry starting in #1105 (comment) (ie, ReportingProbe and JobProbe / SyncProbe and AsyncProbe) when we were looking at setting the boolean for the auto sdk.

We are also now looking at a set of 3 Probes for @grcevski's new approach to context propagation: one to track socket connections, one to allocate header space, and one to insert the new header. In this case, only the 3rd probe actually handles a telemetry event (as I understand it).

So, I don't think we should assume that each Probe object is necessarily going to be emitting telemetry, or we design a way to couple enabling eBPF programs to a telemetry-producing Probe. There's pros and cons to that as well

MrAlias · 2024-11-23T00:10:29Z

internal/pkg/instrumentation/probe/interfaces.go

+	BaseProbe
+
+	// Run starts the Probe.
+	Run()


How does this produce telemetry? There is no destination for that data being communicated to the probe.

I moved the handler function to a field on the Probe object. Unlike the target details, I think this does make sense unless we want to change handler functions for a Probe during runtime.

MrAlias · 2024-11-23T00:11:59Z

internal/pkg/instrumentation/manager.go

+		default:
+			return nil, fmt.Errorf("unknown probe type")


It seems like this only supports GoLibraryTelemetryProbe. Why not just have that probe defined and add new ones when we support those?

We could totally do that. But, in this proposal I tried to point out other areas where for example a Probe only needs to be a RunnableProbe or where we'd specifically want to pass tracing config to a TracingProbe (ie, not a Metrics or Logs probe)

I get that this switch statement looks dumb, and I should have been more explicit in what I was trying to show with it

RonFed · 2024-11-23T11:46:02Z

internal/pkg/instrumentation/probe/interfaces.go

+	BaseProbe
+
+	// Run starts the Probe.
+	Run()


Do we want to pass context here?
Do we want to commit to the current approach that Run is blocking until canceled?

I think a context makes sense here

RonFed · 2024-11-23T11:51:02Z

internal/pkg/instrumentation/probe/interfaces.go

+
+	// Attach attaches the eBPF programs to trigger points in the process.
+	// The specific attachment points are implemented by the Probe.
+	Attach() error


I think we need to support a use case of Loading the eBPF program once and attaching it to multiple locations in runtime.
For that reason, I think the Attach function should probably have an argument specifying where to attach to.

Good point, do you think Load needs any run time arguments too?

RonFed · 2024-11-23T11:53:28Z

internal/pkg/instrumentation/probe/interfaces.go

+
+	// Manifest returns the Probe's instrumentation Manifest. This includes all
+	// the information about any packages the Probe instruments.
+	Manifest() Manifest


I was thinking about adding version support to the Manfiest (#1105 (comment))
Do you think that it will fit here or should the Manfiest be a part of the more basic interfaces?

Right now Manifest is pretty Go-specific. I think version support could be added to the Manifest interface separate of the Probe refactor, but if we want Manifest to be part of a more basic interface then we will commit to the idea that each Probe is tied to an application library instrumentation (see #1307 (comment))

RonFed · 2024-11-23T11:55:45Z

internal/pkg/instrumentation/probe/types.go

+}
+
+// TargetExecutableProbe is a provided implementation of GoLibraryTelemetryProbe.
+type TargetExecutableProbe[BPFObj any] struct {


I wonder if we need the generic type here - as it seems we don't make use of it?

RonFed · 2024-11-23T12:01:05Z

I agree with most of @MrAlias points, except the multi-process one which I addressed in #1219 (comment)

damemi · 2024-11-25T15:59:31Z

@MrAlias @RonFed this is great feedback, thanks guys. My goal was to just get something concrete down to see how it actually looked, and it's helpful to hear what works and what doesn't from that.

Instead we should strive for a flatter structure and use composition to achieve the desired functionality. Ideally our interfaces will be small and focused

@MrAlias that's actually exactly what I was going for, but I missed the mark by embedding the interfaces. I think a composition design gives us the most flexibility in the framework. But, like you said in #1307 (comment) we could go the whole other direction and only support one very specific Probe right now, adding others as we find them.

Use-cases of probes we plan to implement would help motivate this change. It would also help us define our interfaces. I think it is ambitious to want and provide a general probe running platform, but at some point we will just be providing an alternate API for the Cilium project. We need to make sure our efforts are targeted at providing auto-instrumentation.

I had the exact same thought about this just being a wrapper on the Cilium API, lol. Obviously we don't want that so keeping the goals in mind is important.

I did actually have specific use cases that we have discussed in mind while I was doing this, so maybe that helps explain some of the weirder code:

Different signal types (logs, metrics, profiles)
- Implication: We can't assume each Probe produces traces (so hardcoding ptrace.ScopeSpans into the Run() function wouldn't work for all Probes)
Different program types (network signals, XDP, socket, host, kprobe, etc)
- Implication: Don't assume all Probes are using Uprobes to generate events based on a single library (this is why I separated TargetExecutableProbe and GoLibraryProbe, and made Uprobes an implementation detail of the structs)
Different language probes (Rust, C++, etc)
- Implication: Don’t assume Go library fundamentals (like mod/package) in Probes
Non-telemetry producing probes: ReportingProbe/JobProbe from Custom Probes #1105 (comment), or Probes that enable telemetry. This toes the line of over-abstraction, but it is something we have already discussed (and plan to do for network-based context propagation, see Refactor Probe API #1307 (comment))
- Implication: Separate RunnableProbe, as non-reporting Probes shouldn’t need a Run() function with a handler. Though, they will probably need Load() and Attach(). But, this also bakes in the assumption that a Run() function is specifically meant to handle telemetry.

This does not address the multi-process issue

That’s a good point, and I think I actually made it worse by moving Target and Executable to fields on the Probe object (thus tying 1 probe to 1 process).

For multi-process support, I agree with @RonFed that it should be handled at a higher level in the framework (Manager or Instrumentor) rather than a single Probe object knowing that it’s tied to multiple processes. Maybe that’s what you’re saying too, or I’m misunderstanding. But either way, what I did here by coupling target executable to a Probe is definitely not what we want.

The load and targeting mechanisms are set by assignment to returned values.

File names of the probe package a based on Go declarations not the functionality and types they contain. This will make it harder to find code and is not idiomatic with the rest of the project.

The utils package is separate from the probes, specifically InitializeEBPFCollection. All probes will need to maintain similar BPF filesystem utilities, and much of this will be duplicative or conflicting if we cannot somehow incorporate it into the probe factories or types.

This is helpful feedback for the implementation, but as for the fundamental design approach I’m proposing does this relate to the changes? Just want to make sure I understand where all the feedback falls.

damemi · 2024-11-27T00:52:15Z

Notes from the SIG meeting today:

@grcevski's proposal in bpf_probe_write_user helper function is locked down since linux kernel 5.14-rc6 #290 (comment) is probably the best example we have of a different type of Probe (specifically, non-telemetry producing probes)
That proposal also benefits from Probes being re-usable singletons as dependencies across other Probes (for example, to set up ebpf maps).
This relates to multi-process support as we want a Probe to support multiple processes, but also to support other Probes. Multi process in general was something I really overlooked in this PR, and needs much more consideration. We discussed Beyla's approach to a target process pipeline channel to pass to Probes at runtime.
Tagging off of that, we briefly discussed that the Manager's role is actually pretty minimal today (basically just loading and unloading Probes)
The Probe name is overloaded and doesn't describe what they do very well. We proposed Instrument (for telemetry-producing/instrumenting probes) and Daemon (for supporting/running/non-telemetry probes).
- Side note- both of these are running probes, and we haven't identified a case for a non-running probe (didn't discuss that in the meeting but it was asked in this thread)

Let me know if I missed anything. I'll try to incorporate some of the feedback from this thread and the meeting into this PR where I can, if anything just to see how it looks. By all means anyone else feel free to open an alternative or more comments for changes you'd like to see to this PR. Thanks!

damemi · 2024-12-03T18:57:40Z

SIG meeting notes:

Multi-process support needs to be defined as to what the interface provides, without constraining to one way or another (ie, Probes aren't necessarily required to support multi-process).
- Maybe multi-process doesn't need to be a deciding feature at this point, could be hooked up outside of the probe
- Think more generically about what ebpf use cases we want to support now
- Multi-process could have performance/noisy neighbor issues. Implementors should be able to decide if they want to provide it
Composition of modular interfaces vs specialized interfaces -- we could just have one very detailed interface now and add more later if we want, or we can start with a generic base interface and build with smaller, modular interfaces on top of it (this PR)

RonFed · 2024-12-04T07:43:42Z

Adding to @damemi comment.
I think that sharing an instance of a probe across processes makes more sense for probes that are not uprobes.
Since today all of our probes are uprobe-based, I think having the ability to share an instance across processes is important for future use cases but may not be critical based on the current probes. As Mike pointed out and we talked about in the SIG, sharing an instance for uprobes might be problematic in high throughput scenarios. In addition, I think conceptually uprobes are a per-process entity so the current model does make more sense for them.

Regarding implementing the non-uprobes probes: I think that a singelton entity (KernelProbeManager?) can be introduced that will be common to all Instrumentations and will be responsible for handling all the common probes.

damemi added 7 commits November 22, 2024 16:14

no-op: move consts to their own file

3316549

Move Uprobe type to its own file

85b29e8

Delete old instrumentation/probe/probe.go

9de77e6

Add new Probe interfaces

5ce6a73

Add default Probe implementations and convenience builders

2b8e5a7

Update Manager with new extension points (using new interfaces)

0555c93

Switch current Probes to new interfaces

1826b46

damemi requested a review from a team as a code owner November 22, 2024 22:08

MrAlias reviewed Nov 23, 2024

View reviewed changes

RonFed reviewed Nov 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor Probe API #1307

Refactor Probe API #1307

damemi commented Nov 22, 2024

MrAlias left a comment

MrAlias Nov 23, 2024

damemi Nov 25, 2024

MrAlias Nov 23, 2024

damemi Nov 25, 2024

MrAlias Nov 23, 2024

damemi Nov 25, 2024

MrAlias Nov 23, 2024

damemi Nov 25, 2024

MrAlias Nov 23, 2024

damemi Nov 25, 2024 •

edited

Loading

MrAlias Nov 23, 2024

damemi Nov 25, 2024

RonFed Nov 23, 2024

damemi Nov 25, 2024

RonFed Nov 23, 2024

damemi Nov 25, 2024

RonFed Nov 23, 2024

damemi Nov 25, 2024

RonFed Nov 23, 2024

RonFed commented Nov 23, 2024

damemi commented Nov 25, 2024

damemi commented Nov 27, 2024

damemi commented Dec 3, 2024

RonFed commented Dec 4, 2024

Refactor Probe API #1307

Are you sure you want to change the base?

Refactor Probe API #1307

Conversation

damemi commented Nov 22, 2024

Summary

Current Design Problems

Proposal

Interfaces

Default implementations

Refactors

New features

MrAlias left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

damemi Nov 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RonFed commented Nov 23, 2024

damemi commented Nov 25, 2024

damemi commented Nov 27, 2024

damemi commented Dec 3, 2024

RonFed commented Dec 4, 2024

damemi Nov 25, 2024 •

edited

Loading