Add net/http/pprof profiling endpoint to http port #913

Open: wants to merge 6 commits into base: master


@timvaillancourt (Member) commented Jul 6, 2023

This PR adds the ability to gather pprof profiles from Nebula over HTTP, in order to understand and improve performance and to investigate the high CPU usage we see in Slack's production Vitess platform.

This change requires making the HTTP server opened by stats.go more generic than "just stats", so some logic has been moved into http.go and a new, generic config variable, http.listen, has been added to define this port.

For backwards compatibility, http.listen falls back to the value of stats.listen when http.listen is unset. I suggest removing the stats.listen variable (and the fallback) in the future, because after this PR the HTTP port will serve more than just stats.

@timvaillancourt marked this pull request as draft July 6, 2023 17:44
Signed-off-by: Tim Vaillancourt <[email protected]>
@timvaillancourt marked this pull request as ready for review July 6, 2023 23:41
@timvaillancourt changed the title from "Add http pprof profiling endpoint" to "Add net/http/pprof profiling endpoint to http port" on Jul 7, 2023
@wadey (Member) commented Jul 7, 2023

Thanks! We will consider this, but I'm not sure if you are aware that there is already a way to get pprof traces using the SSH debug interface.

Once you connect, you can run help to list the available commands; the ones you might care about are start-cpu-profile and stop-cpu-profile.

Defined here in the code: https://github.com/slackhq/nebula/blob/v1.7.2/ssh.go#L228

@wadey added the NeedsDecision label (Feedback is required from experts, contributors, and/or the community before a change can be made.) Jul 7, 2023
@timvaillancourt changed the title from "Add net/http/pprof profiling endpoint to http port" to "Add net/http/pprof profiling endpoint to http port" on Jul 7, 2023
@timvaillancourt (Member, Author) commented
Thanks @wadey! I didn't notice that when I started this branch, but later saw it mentioned in issues 👍

It seems we don't enable the sshd server in production (although we could), so this isn't currently an option for us. That said, I'd much prefer the standard net/http/pprof approach for three reasons: it would not require adding sshd to all our hosts; it is easier to plug into our automation, which already supports fetching profiles over HTTP the standard way; and /debug/pprof exposes more than CPU profiles, such as heap and mutex profiles, execution traces, etc.

Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
2 participants