[Donation Proposal]: Beyla, eBPF auto-instrumentation tool for metrics and traces #2406

Open
grcevski opened this issue Oct 23, 2024 · 32 comments
Labels
area/donation Donation Proposal

Comments

@grcevski

grcevski commented Oct 23, 2024

Description

Grafana Labs would like to offer the donation of Beyla to the OpenTelemetry project.

Beyla is a mature eBPF-based auto-instrumentation tool for OpenTelemetry metrics and traces, for multiple languages and protocols. It enables cluster-wide/system-wide auto-instrumentation of applications without the need for application code/configuration changes or application restarts. To achieve this, Beyla uses a combination of protocol-level instrumentation based on network events and language/runtime-level instrumentation where needed. While Beyla works on bare metal installations, virtual machines, etc., the tool is also fully Kubernetes-aware and can be deployed as a daemonset or as a sidecar. Beyla is used by a number of customers in production, including Grafana Labs itself for the Grafana Cloud hosted offering.

Some of the main uses of Beyla are:

  • Provide auto-instrumentation for programming languages where OpenTelemetry SDK zero-code auto-instrumentation is not supported, such as Rust, C++, Erlang, Zig, Ruby, Swift, Perl, Lua, Dart, R, Java GraalVM Native, Julia…
  • Provide auto-instrumentation for legacy applications, where it’s not easy to migrate the codebase to the OpenTelemetry SDK compatible frameworks.
  • Provide auto-instrumentation for applications whose source is not available, or that are proprietary and/or distributed in binary form.
  • Provide a unified way to capture application-level metrics across all different technologies used by a customer.
  • Provide network-level metrics, regardless of the L3/L4/L7 protocol used, for the purpose of building service graphs and reachability reports.
  • Provide process-level metrics for instrumented applications.

Some of the core features of Beyla include:

  • Application level instrumentation (metrics and traces) for HTTP, HTTPS (libssl3 and Go), HTTP/2, gRPC, SQL, Kafka, Redis, CUDA (Nvidia GPUs).
  • Augments the protocol level instrumentation detection with runtime instrumentation for certain programming languages, e.g. Go and NodeJS.
  • Network-level instrumentation for any protocol, including send/receive byte-level accounting, for the purpose of connectivity monitoring. It doesn’t conflict with any Kubernetes CNI (including Cilium CNI).
  • No root access required: Beyla does not need to run as root, nor in privileged mode in Docker containers. Beyla can use finer-grained Linux capabilities (permissions) to run with a minimal security configuration, and it gracefully degrades functionality when certain permissions are not granted. For example, Beyla will not use certain helpers like bpf_probe_write_user when CAP_SYS_ADMIN is not granted.
  • Supports multi-process instrumentation: it can run as a daemonset and instrument the whole system/node/Kubernetes cluster from a single Beyla instance.
  • Is aware of OpenTelemetry SDK instrumentation and avoids telemetry duplication. When the whole system is instrumented, Beyla auto-detects whether certain applications already send traces or metrics and disables its own instrumentation for those applications, depending on what each application generates. For example, if a web application generates OpenTelemetry traces but not HTTP metrics, Beyla still generates the HTTP metrics for that application but does not generate traces.
  • Non-intrusive. Requires no additional agents, no application-level modifications, and no access to application source or configuration.
  • Minimal performance/memory overhead. We share all probes and maps among all processes, and since Beyla’s userspace side is built with Go, it often has much lower overhead for metric and trace generation than the OpenTelemetry support for certain programming languages (e.g. interpreted languages).
  • It’s built with libbpf (ebpf-go), which means a single compiled binary can be deployed on any Linux kernel version that supports CO-RE. Currently Beyla supports all LTS versions of major Linux distributions, including kernel 4.18 with the CO-RE and BTF patches backported by Red Hat.
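The duplicate-avoidance behavior described in the feature list above can be sketched as a per-signal decision: generate only the signals an application is not already exporting. This is a hypothetical illustration, not Beyla’s code; ServiceTelemetry and plan_instrumentation are invented names, and how the agent actually detects SDK-emitted telemetry is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTelemetry:
    """What an application is already observed to export by itself (hypothetical)."""
    name: str
    emits_traces: bool
    emits_http_metrics: bool


def plan_instrumentation(svc: ServiceTelemetry) -> dict:
    """Decide, per signal, whether the eBPF agent should generate it.

    A signal is generated only when the application does not already produce
    it, so agent-generated and SDK-generated telemetry never overlap.
    """
    return {
        "traces": not svc.emits_traces,
        "http_metrics": not svc.emits_http_metrics,
    }


# The example from the list: an app exporting OTel traces but no HTTP metrics.
web = ServiceTelemetry("web", emits_traces=True, emits_http_metrics=False)
print(plan_instrumentation(web))  # {'traces': False, 'http_metrics': True}
```

Deciding per signal rather than per application is what lets the web application in the example keep agent-generated HTTP metrics while its own SDK traces are left untouched.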

Benefits to the OpenTelemetry community

Donating Beyla will fill a gap in the overall OpenTelemetry application-level instrumentation ecosystem for applications that are written in programming languages not supported by the OpenTelemetry SDKs, that use proprietary frameworks, or that use older technologies. We also believe that it will fill a gap in network-level monitoring for building service graph and connectivity tracking solutions.

This donation has a lot of synergy with the OpenTelemetry Profiling Agent, and we believe that in the future we can create a non-intrusive, generic profiling to TraceID correlation by leveraging the two projects.

Reasons for donation

We at Grafana Labs prefer that customers use the upstream OpenTelemetry SDKs for application-level instrumentation; however, we often find that certain customers are unable to use the recommended approach because of the technologies they currently use. We built Beyla as an easy way for our customers to get started with OpenTelemetry while they are in the process of upgrading their software, which sometimes takes years. Customers also often use binary distributions of software and, depending on the technology the binaries are built with, are unable to instrument these applications.

We believe that we are not alone in needing to move customers to OpenTelemetry faster in cases where they can’t currently leverage the existing OpenTelemetry ecosystem. This is why we’d like to make this a community project, where multiple companies can be stakeholders and we can build a better community around it than Grafana Labs can alone.

Relation with Other OpenTelemetry Projects

We also see this donation as an opportunity to combine the OpenTelemetry eBPF-based auto-instrumentation efforts. Our project borrows parts of the OpenTelemetry Go Auto-Instrumentation project, and some of our Beyla maintainers participate in that project too. We’d like to fully merge all of our work on Go into the OpenTelemetry Go Auto-Instrumentation project, avoiding the double contribution we make at the moment, and vendor it in Beyla as an import once the merge is complete. Beyla’s support for auto-instrumentation goes well beyond Go, which is why we are proposing a new project donation. We are also open to combining the Go Auto-Instrumentation project with our donation into a new project for out-of-process auto-instrumentation.

We also see this donation as an opportunity to reinvigorate the OpenTelemetry eBPF Networking project. Beyla supports the majority of that project’s functionality, but it’s built with eBPF-Go (libbpf), which means it uses CO-RE and can be deployed on any kernel without kernel-specific builds or a compilation toolchain on the target system.
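CO-RE relocations need BTF type information for the running kernel, and kernels built with CONFIG_DEBUG_INFO_BTF=y publish it at /sys/kernel/btf/vmlinux. A preflight check for whether a single compiled CO-RE binary could run on a host might look like the sketch below. The function names are hypothetical, and the check deliberately keys off BTF availability rather than the version number, since vendor kernels (such as the backported 4.18 kernels mentioned earlier) can’t be judged by version alone.

```python
import os
import platform
import re

# Where BTF-enabled kernels (CONFIG_DEBUG_INFO_BTF=y) expose their own types.
VMLINUX_BTF = "/sys/kernel/btf/vmlinux"


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string like '4.18.0-553.el8.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def core_preflight(release: str, btf_path: str = VMLINUX_BTF) -> dict:
    """Report the kernel version and whether kernel BTF is available.

    The version is informational only; BTF availability is what a CO-RE
    loader needs in order to relocate field offsets for this kernel.
    """
    return {
        "kernel": kernel_version(release),
        "btf_available": os.path.exists(btf_path),
    }


if __name__ == "__main__":
    print(core_preflight(platform.release()))
```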

Our development stack is identical to what’s used by OpenTelemetry Go Auto-Instrumentation and the OpenTelemetry eBPF Profiler. Developers on those projects will easily be able to contribute to this project, and it will bring all of the OpenTelemetry eBPF tooling to the same level.

Repository

https://github.com/grafana/beyla

Existing usage

Beyla is used by hundreds of users in production, including Grafana Cloud itself. It also has strong open-source community usage: our Docker image is pulled around 100,000 times a month, and pulls have grown steadily since the project’s inception. For comparison, pulls in April 2024 were around 30,000 a month.

Maintenance

We have 4 full-time maintainers on the project who will move to working full-time on the OpenTelemetry project if it is accepted. We have over 40 contributors on the project, most of whom are not Grafana Labs employees or affiliated with Grafana Labs in any way.

Licenses

Apache 2.0 License
Our eBPF probe source is dual licensed with GPL/MIT as per the requirements of the Linux Kernel. This is identical to the approach used by OpenTelemetry Go Auto-Instrumentation and OpenTelemetry Profiler.

Trademarks

The name Beyla currently appears in a number of places in the codebase and is a Grafana Labs Trademark. We are happy to donate the name too, however we understand that it’s not compatible with how OpenTelemetry projects are typically named. We are happy to remove any of these name references when the project is donated, if the name donation is not acceptable.

Other notes

This proposal has been socialized with @MrAlias (maintainer of OpenTelemetry Go Auto Instrumentation) and @atoulme (maintainer of OpenTelemetry eBPF Networking)

@edeNFed
Contributor

edeNFed commented Oct 23, 2024

I’m looking forward to Beyla's potential donation to the OpenTelemetry project, as it helps cover important gaps in auto-instrumentation for unsupported languages and environments.

That said, this donation comes with some challenges since a lot of Beyla’s work overlaps with existing OpenTelemetry projects like Go auto-instrumentation, eBPF Profiler, eBPF networking and OpenTelemetry Operator. The community already has efforts addressing these areas, so it’s important to understand how Beyla will fit in and integrate with these projects.

As part of the donation, it’s crucial to ensure the current core OpenTelemetry repositories remain the main source of truth, and that we avoid duplicating code or functionality. It would be helpful to see how Beyla and existing projects can come together without redundancy.

I’m also interested in how Beyla will eventually be integrated as a collector receiver in the OpenTelemetry architecture. To make this work smoothly, Beyla should be able to use existing components as dependencies rather than duplicating what’s already there.

@grcevski
Author

That said, this donation comes with some challenges since a lot of Beyla’s work overlaps with existing OpenTelemetry projects like Go auto-instrumentation, eBPF Profiler, eBPF networking and OpenTelemetry Operator. The community already has efforts addressing these areas, so it’s important to understand how Beyla will fit in and integrate with these projects.

Thanks for the comments, Eden. The main overlap in functionality is related to Go Auto Instrumentation, for which we propose to merge our functionality there and vendor it in the new project. The main challenge I see is the multi-process support, which we need for fleet-wide monitoring, however I'm sure we can overcome these challenges. For eBPF Networking, I think we can use this as an opportunity to bring its functionality to the same level as Go Auto, using a similar development stack and a libbpf CO-RE-based approach.

I don't think the donation overlaps in any way with the OpenTelemetry Operator or the OpenTelemetry eBPF Profiler. I think providing a generic way to extract trace/span information for the eBPF Profiler would be great for correlating traces with profiles.

@grcevski
Author

I’m also interested in how Beyla will eventually be integrated as a collector receiver in the OpenTelemetry architecture. To make this work smoothly, Beyla should be able to use existing components as dependencies rather than duplicating what’s already there.

I'm not sure there's much duplication there, except with the eBPF networking component, which we addressed in the section on the relation with other OpenTelemetry projects. There's a recent request to add Beyla as a component in the OpenTelemetry Collector, which this donation would help a lot with: open-telemetry/opentelemetry-collector-contrib#34321

@damemi

damemi commented Oct 23, 2024

Thanks for the detailed proposal @grcevski! I think this is great for building progress on OpenTelemetry/eBPF and covering existing gaps.

To mirror what @edeNFed said, avoiding confusion and duplication is important. But I think you have explained that the idea is to vendor the existing Go Auto-Instrumentation as a dependency into the Beyla donation. That makes sense to me, as it fits with the goals we've been working on together in Go Auto (i.e., to make that repo a library/API/SDK that can be imported by other implementations).

To that end, it makes sense that OpenTelemetry would provide both (a) an open-source library/framework for eBPF instrumentation with a "raw" agent as the default artifact and (b) an open-source component consuming that framework to provide second-level functionality and usability. @jsuereth and I were actually talking about this, and he compared this situation roughly to how the Collector works.

I think the potential overlap with the OpenTelemetry Operator is in the fact that the Operator does deploy that default agent from Go Auto-Instrumentation, but that's about it. To draw back to the collector comparison, I would say that the Operator is to the Collector as Beyla is to Collector-Contrib: built on a stable, minimal core with added functionality. Both exist to give users options based on their needs.

All that said, we should make sure to apply the same standards for donation that we are also applying to the Compile-time Go Instrumentation donation. Specifically:

  • The relationship between the new repo and existing repo must be well-defined. Will maintainers from existing Go Auto-Instrumentation overlap with the new repo? Will Beyla have its own SIG and meetings? Does this add any burden to the existing project?
  • Are there maintainers from multiple companies? You mentioned that you have other contributors, would it be possible to propose an initial maintainers list for the new repo? (like we are asking from the compile-time proposal)

All in all, I wouldn't be surprised to see these 3 projects collaborate and converge more often as time goes on. Thanks for your work on this @grcevski!

@svrnm
Member

svrnm commented Oct 23, 2024

I am by no means an eBPF expert, but there's one thing I'd like to ask:

would it be possible to work towards one eBPF solution that combines what Beyla does (auto-instrumentation with traces, metrics I suppose, + networking) + the profiler?

Because at the end what people want (see this discussion for example: open-telemetry/opentelemetry-specification#4255) is a combination of all four signals, but if those 2 projects are separate we either need a way to install them side-by-side or people have to choose.

@damemi

damemi commented Oct 23, 2024

I think that one eBPF solution would be something like Beyla. But I don't think that idea means all of the code for every signal+language lives in one monorepo with the higher-level component. That's what I mean by separate repos, at least.

@RonFed

RonFed commented Oct 23, 2024

I agree with @edeNFed and @damemi comments.

Having projects that handle auto-instrumentation, and on top of them higher-level implementations (like the Operator or Beyla) that use multiple other projects, is a good structure in my opinion.

As a maintainer in the go-auto-instrumentation project, I'd be happy to accept donations from Beyla to the current project.

@dashpole

I'm excited to see this donation proposal! I have made a few contributions to Beyla in the past, and have found the maintainers knowledgeable, kind, and helpful. I also think Beyla fills an important gap by providing language-agnostic telemetry. There are definitely details to work out, but I'm very supportive of this proposal.

@mtwo
Member

mtwo commented Oct 25, 2024

This looks great, and thanks @grcevski for calling out how this relates to and can merge or interoperate with Go auto-instrumentation, network monitoring (@yonch FYI), and the profiling agent (FYI @christos68k, @petethepig, @felixge, @fabled)! These were going to be the first questions that I asked, and it looks like we already have good notions about how things can proceed with each. Now that we have several projects in flight that use eBPF, it seems sensible to have them inherit from a common base, if possible.

@alolita and I will be on point for this process for the Governance Committee. We'll circle back in a few days with next steps once more community members have time to comment.

@cforce

cforce commented Oct 28, 2024

Don't miss out on insights from OpenTelemetry Network traces! There’s always been demand for deeper eBPF integration within the OpenTelemetry Collector 🎉

@grcevski
Author

grcevski commented Nov 8, 2024

Hey @mtwo and @alolita, I just wanted to follow-up here and see if you had any next steps for us. I'm guessing we are all busy with KubeCon next week :), but I thought I should drop a note.

@yonch
Contributor

yonch commented Nov 8, 2024

Hi @grcevski!

Haven't seen you at the eBPF SIG meeting. Let's meet at KubeCon to discuss the network aspects more?

I'm Jonathan Perry on CNCF slack

@mtwo
Member

mtwo commented Nov 10, 2024

Hey @mtwo and @alolita, I just wanted to follow-up here and see if you had any next steps for us. I'm guessing we are all busy with KubeCon next week :), but I thought I should drop a note.

Yes! Apologies, I was slammed this week! We'll start the process after KubeCon.

Will you / any other members of the Beyla team be there? If so, I like @yonch's suggestion: we can meet and work out more parts of the story with the other OTel eBPF projects and Go instrumentation projects.

@grcevski
Author

Will you / any other members of the Beyla team be there? If so, I like @yonch's suggestion: we can meet and work out more parts of the story with the other OTel eBPF projects and Go instrumentation projects.

Yes, I'll be there. I connected with @yonch and we arranged to meet there, so this sounds like a great idea, let's all meet there and we can discuss all parts of the donation. Thanks and see you next week!

@mtwo
Member

mtwo commented Nov 10, 2024 via email

@yonch
Contributor

yonch commented Nov 14, 2024

@grcevski and I met yesterday and talked about the networking aspects.

We agreed to work next week on a short statement on the tradeoffs each project offers, to make it clearer for users deciding which is a better fit. We also discussed porting capabilities between the projects, which seems like a good avenue to pursue after the donation.

@mtwo
Member

mtwo commented Nov 18, 2024

Catching up after conversations that I had with @grcevski and @MrAlias at KubeCon.

We want to end up with one way for OTel end users to instrument their Go, C++, Rust, etc. applications with an agent. Put another way, we don't want them to have to choose between two different OTel eBPF options for the same use case.

From our conversations, I think that we're all aligned on this. @MrAlias raised the possibility of having Beyla (the name will change as part of the donation) depend on OTel Go instrumentation, and that end users would use Beyla. Does this make sense to everyone else? There are likely other options as well.

@damemi

damemi commented Nov 19, 2024

@mtwo speaking from Odigos's perspective, that's what we'd like to see too.

Keeping the current OTel Go repo as a library/dependency allows vendors like us to implement custom instrumentation controllers for our users, while offering a stock OSS component built on that library provides a useful story for the default open source end user.

I don't think that's uncommon either, as we see it in other areas of OTel too (like custom SDKs and Collector builds). These underscore the fact that OpenTelemetry is a standard, not just a set of off-the-shelf tools. And enabling custom implementations of the standard promotes the overall health of the project.

Like I mentioned to @grcevski and @MrAlias last week, I'm very interested in contributing to this as well, as I think it is a big benefit to the OTel ecosystem, which benefits us as well.

@mtwo
Member

mtwo commented Nov 19, 2024

Awesome, by the thumbs ups and @damemi's response, I think that we're all aligned. I think that the remaining things that we need to close on are:

  • A name, ideally something self-descriptive like the other OpenTelemetry SIG / package names.
  • A decision about any level of code sharing or integration with opentelemetry-network. "We have talked and these will remain separate" is a fine answer, but we need to arrive at some kind of answer.
  • A decision about any level of code sharing or integration with opentelemetry-ebpf-profiler. "We have talked and these will remain separate" is a fine answer, but we need to arrive at some kind of answer.

Does that make sense to everyone else? Did I miss anything?

@cforce

cforce commented Nov 19, 2024

@mtwo speaking from Odigos's perspective, that's what we'd like to see too.

Keeping the current OTel Go repo as a library/dependency allows vendors like us to implement custom instrumentation controllers for our users, while offering a stock OSS component built on that library provides a useful story for the default open source end user.

I don't think that's uncommon either, as we see it in other areas of OTel too (like custom SDKs and Collector builds). These underscore the fact that OpenTelemetry is a standard, not just a set of off-the-shelf tools. And enabling custom implementations of the standard promotes the overall health of the project.

Like I mentioned to @grcevski and @MrAlias last week, I'm very interested in contributing to this as well, as I think it is a big benefit to the OTel ecosystem, which benefits us as well.

I didn’t quite extract the full question, but how will the integration—beyond just the code—fit into the architecture of the collector? Specifically, how will the integration of Beyla (if any) and the OpenTelemetry Collector (including contrib modules) take shape? Managing the behavior of collection, processing, and exporting within a single executable, while leveraging the same configuration schema and patterns, offers tremendous advantages. Additionally, using ecosystem components like OpAMP or a supervisor for one stop remote management adds significant value and shouldn’t be overlooked without this integration.

@grcevski
Author

  • A decision about any level of code sharing or integration with opentelemetry-network. "We have talked and these will remain separate" is a fine answer, but we need to arrive at some kind of answer.
  • A decision about any level of code sharing or integration with opentelemetry-ebpf-profiler. "We have talked and these will remain separate" is a fine answer, but we need to arrive at some kind of answer.

Thanks @mtwo, I'll work on getting answers on these decisions this week. I'm already in the process of doing this for opentelemetry-network, and I'll join the profiler SIG and try to get an answer.

@grcevski
Author

I didn’t quite extract the full question, but how will the integration—beyond just the code—fit into the architecture of the collector? Specifically, how will the integration of Beyla (if any) and the OpenTelemetry Collector (including contrib modules) take shape? Managing the behavior of collection, processing, and exporting within a single executable, while leveraging the same configuration schema and patterns, offers tremendous advantages. Additionally, using ecosystem components like OpAMP or a supervisor for one stop remote management adds significant value and shouldn’t be overlooked without this integration.

Hi @cforce, I think we can make Beyla (that is the new project) a component of the OpenTelemetry Collector. We've done this already for Grafana Alloy and we have some experience there. Actually, we use the Collector SDK already for traces, so integration should be even less of a problem.

One major challenge I see, which we've faced with our Alloy component, is that, like all eBPF agents, we require some elevated permissions. Beyla doesn't need "privileged containers" or CAP_SYS_ADMIN, but depending on what functionality is enabled we might ask for CAP_NET_ADMIN, CAP_NET_RAW, CAP_SYS_PTRACE, etc. Having these permissions on a locked-down daemonset which only makes outgoing network requests in standalone mode is one thing; requiring the OpenTelemetry Collector to run with these privileges is another. A CVE in any enabled OpenTelemetry Collector module/component could potentially be a lot more serious in a collector that is running with elevated permissions.
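The capability-dependent behavior described here, and the graceful degradation mentioned in the original proposal, can be sketched by reading the process's effective capability mask from /proc/&lt;pid&gt;/status and enabling only the features whose requirements are all granted. The feature names and the feature-to-capability mapping are hypothetical, assembled from capabilities named in this thread rather than taken from Beyla; the bit numbers are the standard values from linux/capability.h.

```python
import os

# Standard Linux capability bit numbers (linux/capability.h).
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13
CAP_SYS_PTRACE = 19
CAP_SYS_ADMIN = 21

# Hypothetical mapping of agent features to the capabilities each one needs.
FEATURE_REQUIREMENTS = {
    "network_flows": {CAP_NET_ADMIN, CAP_NET_RAW},
    "process_inspection": {CAP_SYS_PTRACE},
    "bpf_probe_write_user": {CAP_SYS_ADMIN},
}


def effective_caps(status_text: str) -> int:
    """Parse the CapEff hex bitmask out of /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def enabled_features(cap_mask: int) -> set:
    """Keep only the features whose required capabilities are all granted."""
    return {
        feature
        for feature, caps in FEATURE_REQUIREMENTS.items()
        if all(cap_mask & (1 << cap) for cap in caps)
    }


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        print(sorted(enabled_features(effective_caps(f.read()))))
```

Because each feature is checked independently, withholding CAP_SYS_ADMIN disables only the feature that depends on it instead of preventing the agent from starting.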

@grcevski
Author

We met at KubeCon with @yonch and discussed the overlap and differences between Beyla and what OpenTelemetry eBPF networking provides. While there is quite a bit of overlap, Beyla currently lacks certain product functionality that the OpenTelemetry eBPF networking project provides:

  1. opentelemetry-network matches client <-> server network flows outside of the agent, which allows single flows to be identified before they are stored in the target metric. This enables support for host networking, systemd, plain Docker containers, etc., and prevents double-counting. Beyla doesn't do this; it stores the separate client and server flows as metrics and relies on the query capabilities of the product consuming the metrics to perform the matching.
  2. opentelemetry-network collects DNS health metrics such as per-domain latency, timeouts, and errors (to help detect misconfigured services), which Beyla doesn't provide as such. Beyla tracks UDP, so it's possible to derive these missing DNS metrics; however, this requires work on the side of the product consuming the data, e.g. looking for UDP flows on port 53.
  3. Both projects enrich network flows with information from the Kubernetes API, cloud provider APIs, or DNS. However, opentelemetry-network uses the requested DNS names rather than reverse DNS (which is more accurate for IP addresses that host multiple domains), and adds Autonomous System information for Internet IPs to enable detecting connectivity issues with specific providers.
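Item 1 above, matching the client-side and server-side observations of one connection so that it is counted once, can be sketched with a direction-independent key. This assumes both observers report the same addresses (no NAT in between), and the Flow record, canonical_key and merge_flows are hypothetical names; it is not how either project actually implements matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One side's observation of a connection (hypothetical record)."""
    reporter: str  # "client" or "server"
    src: tuple     # (ip, port) as seen by the reporter
    dst: tuple     # (ip, port) as seen by the reporter
    proto: str
    nbytes: int    # total bytes the reporter saw on this connection


def canonical_key(f: Flow) -> tuple:
    """Direction-independent key, so both sides of one connection collide."""
    low, high = sorted([f.src, f.dst])
    return (f.proto, low, high)


def merge_flows(flows):
    """Collapse client and server observations into one entry per connection."""
    merged = {}
    for f in flows:
        entry = merged.setdefault(canonical_key(f), {"seen_by": set(), "nbytes": 0})
        entry["seen_by"].add(f.reporter)
        # Both sides saw the same traffic: keep one count rather than summing,
        # since summing is exactly what double-counts the connection.
        entry["nbytes"] = max(entry["nbytes"], f.nbytes)
    return merged
```

Without the canonical key, a client record from 10.0.0.1:51000 to 10.0.0.2:80 and the matching server record in the opposite direction would land under two different keys, and a query summing them would report twice the bytes actually exchanged.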

We also discussed that, from the eBPF agent perspective, Beyla's use of CO-RE, libbpf and eBPF-Go greatly simplifies both the deployment and the development of the agent, and that if OpenTelemetry eBPF Networking started today as a project, it would likely adopt the same approach. Beyla is also able to use finer-grained permissions, which makes deployments easier to get through security risk assessments.

There are two possible approaches to continue forward:

  • Do nothing related to opentelemetry-network when the new project lands in OpenTelemetry. OpenTelemetry eBPF networking will continue to exist in its current form and follow its own roadmap.
  • Add support for providing the required data for the opentelemetry-network services with the new agent, so that OpenTelemetry eBPF Networking can provide the same level/quality of information as before, but with a new agent which can easily be deployed on every kernel.

From the discussion with @yonch, we both prefer to take no action on opentelemetry-network at the time of the new project's donation; we'll then work together to add what's required in Beyla to support opentelemetry-network as an alternative agent.

@cforce

cforce commented Nov 21, 2024

One major challenge I see, which we've faced with our Alloy component, is that, like all eBPF agents, we require some elevated permissions. Beyla doesn't need "privileged containers" or CAP_SYS_ADMIN, but depending on what functionality is enabled we might ask for CAP_NET_ADMIN, CAP_NET_RAW, CAP_SYS_PTRACE, etc. Having these permissions on a locked-down daemonset which only makes outgoing network requests in standalone mode is one thing; requiring the OpenTelemetry Collector to run with these privileges is another. A CVE in any enabled OpenTelemetry Collector module/component could potentially be a lot more serious in a collector that is running with elevated permissions.

Hi @grcevski, I share your concerns, and from a security perspective, it's indeed better to dedicate a separate executable for sensitive operations. In my opinion, any solution within the ecosystem for an eBPF agent should ensure that OpAMP management is implemented as a mandatory component.

Building on existing concepts and capabilities of the collector, a specialized collector with elevated rights could integrate the eBPF collection receiver, processors, OTLP exporter, and OpAMP and auth extensions. This collector would be minimalistic and hardened, avoiding the use of contrib components while still leveraging the existing collector architecture.

Additionally, a separate side-by-side collector could run without eBPF components, connecting inbound to the eBPF collector as a process or gateway. This approach allows the hardening of the eBPF collector components to benefit the overall collector builds without reinventing the wheel.

Let me know your thoughts!

@yonch
Contributor

yonch commented Nov 21, 2024

Building on existing concepts and capabilities of the collector, a specialized collector with elevated rights could integrate the eBPF collection receiver, processors, OTLP exporter, and OpAMP and auth extensions. This collector would be minimalistic and hardened, avoiding the use of contrib components while still leveraging the existing collector architecture.

FWIW we received this type of feedback from users for opentelemetry-network as well. One (very) large company asked that we separate the collector even further:

  • an elevated permission component that sets up eBPF programs and shared memory but does not process events or communicate.
  • a low-permission event processor that uses the shared memory, and is able to communicate to the outside world (e.g., via sockets).

This is not to say this architecture is necessary -- neither for donation nor afterwards. Just that if a project wants to appeal to more security-conscious organizations, keeping privileged code small and separate from more complex handling appears prudent.
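The two-component split described above can be sketched as follows. This is a hypothetical illustration that uses POSIX shared memory from Python's multiprocessing.shared_memory module purely to show the division of roles; the function names and record layout are invented, and in a real agent the shared region would more likely be backed by eBPF maps or ring buffers, with the privileged and unprivileged parts running as separate processes under different credentials.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical fixed-size event layout: (timestamp_ns, pid).
RECORD = struct.Struct("<QI")
HEADER = struct.Struct("<I")  # number of valid records in the region
CAPACITY = 4


def privileged_setup(name: str) -> shared_memory.SharedMemory:
    """Elevated-permission step: create and initialise the shared region
    (a real agent would also load its eBPF programs here). It never decodes
    events or opens network sockets."""
    size = HEADER.size + CAPACITY * RECORD.size
    shm = shared_memory.SharedMemory(name=name, create=True, size=size)
    HEADER.pack_into(shm.buf, 0, 0)
    return shm


def kernel_side_write(shm: shared_memory.SharedMemory, events) -> None:
    """Stand-in for the eBPF programs filling the region with raw events."""
    events = events[:CAPACITY]
    for i, (ts, pid) in enumerate(events):
        RECORD.pack_into(shm.buf, HEADER.size + i * RECORD.size, ts, pid)
    HEADER.pack_into(shm.buf, 0, len(events))


def unprivileged_process(name: str) -> list:
    """Low-permission step: attach to the existing region by name and decode
    events; only this part would talk to the outside world (e.g. export)."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        (count,) = HEADER.unpack_from(shm.buf, 0)
        return [RECORD.unpack_from(shm.buf, HEADER.size + i * RECORD.size)
                for i in range(count)]
    finally:
        shm.close()
```

In this sketch the privileged step only allocates and initialises the region, while all decoding happens in the unprivileged step, which keeps the code that needs elevated rights small.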

@grcevski
Author

Hi grcevski , I share your concerns, and from a security perspective, it's indeed better to dedicate a separate executable for sensitive operations. In my opinion, any solution within the ecosystem for an eBPF agent should ensure that OpAMP management is implemented as a mandatory component.

I agree, and I'm not opposed to implementing this :), I just wanted to point out the potential drawbacks. I also think that having the tool that collects the data in the same executable as the tool that processes and sends the data, without an extra network hop, is extremely efficient. So there's definitely a lot of merit in this approach.

@grcevski
Author

  • A decision about any level of code sharing or integration with opentelemetry-ebpf-profiler. "We have talked and these will remain separate" is a fine answer, but we need to arrive at some kind of answer.

I reached out to the OpenTelemetry profiler group on the CNCF Slack, and based on the discussion I think the best way forward would be to approach the collaboration in the same manner as with the opentelemetry-go-autoinstrumentation project. Our instrumentation can greatly benefit from the stack-walking capability of the opentelemetry-ebpf-profiler, in the sense that we can attach stack traces when an error happens in a transaction. Both projects also need to parse headers and Go data structures to extract the trace context.

There is already ongoing work by other community members along the same lines related to opentelemetry-ebpf-profiler, open-telemetry/opentelemetry-ebpf-profiler#192. We'd like to leverage this as well.

I think the best way forward is to do nothing around the time of the donation, and then start vendoring the capabilities of the opentelemetry-ebpf-profiler project to expand and unify the functionality of the two projects.

@grcevski
Author

grcevski commented Dec 4, 2024

  • A name, ideally something self-descriptive like the other OpenTelemetry SIG / package names.

After a bit of back and forth, I'd like to propose the name opentelemetry-ebpf-instrumentation. Here are a couple of reasons why this might be a suitable name for the new project:

  1. The name closely aligns with how opentelemetry-ebpf-profiler is named, and it's very descriptive of what the project actually is.
  2. The other avenues we could pursue are related to out-of-process instrumentation or zero-code instrumentation. However, I find those would conflict with other functionality within OpenTelemetry. Namely, zero-code instrumentation based on other approaches already exists for a few programming languages, and it will surely grow in the future. Out-of-process is more closely related to our mission; however, there are ways to achieve out-of-process instrumentation besides eBPF, e.g. LD_PRELOAD or binary patching, which are not within the scope of the project. One could also argue that the OpenTelemetry Operator is out-of-process instrumentation.
  3. I know we discussed potentially not including eBPF in the name, i.e. naming the project after the task it does rather than the technology it uses. However, if we don't explicitly point out that it's eBPF-based, we might mislead the community, because the technology will not work on platforms where eBPF is not supported. eBPF is not supported on platforms like Windows, BSD, and macOS, as well as various embedded Linux distributions. It's also not supported in various managed cloud platforms or function-as-a-service environments.

Please let me know your thoughts.

@grcevski
Author

grcevski commented Dec 4, 2024

Hi @mtwo and @alolita, I believe I've answered the 3 remaining questions, please let me know if there's more I need to follow-up on. Thank you!

@svrnm svrnm added the area/donation Donation Proposal label Dec 9, 2024
@mtwo
Member

mtwo commented Dec 11, 2024

Apologies for the delay, I've been pretty ill for the past two weeks and am just getting to this now. I think that we're good to proceed with the next steps! I'll check with the rest of the GC on Thursday and will report back.

@mtwo
Member

mtwo commented Dec 13, 2024

@grcevski just waiting on the TC to assign a reviewer, and then we should be good to go

@grcevski
Author

Amazing, thanks so much @mtwo and I hope you are feeling better!

9 participants