feat_: add Sentry panic reporting #6054

Open: wants to merge 5 commits into develop

igor-sirotin (Collaborator) commented Nov 6, 2024

Requires:

Iterates:

Desktop PR:

Note

TODO:

  • Add tests
  • Init/Close Sentry when toggling Share usage data with Status
  • Initialize Sentry for unit tests
    Won't be done: Go tests run in goroutines created internally by the `testing` package.
    The only workaround is to add `defer common.LogOnPanic()` to EACH TEST, which isn't easy right now. The benefit is also small: we only need this for develop/nightly runs.
  • Open client PRs that initialize Sentry
  • Consider the case of running client e2e tests with no metrics reporting (we still want crash reports)
    I guess we won't do this now either. Reports from in-test crashes are not a high priority.

Next steps:

Description

This PR continues integrating Sentry into the Status project. So far, the integration is:

  • only for status-go (including when running as part of desktop, mobile and tests)
  • only for panics/crashes (no bulk error reporting)

🛬 Where

We use self-hosted Sentry: https://sentry.infra.status.im/

🕐 When

When a panic (crash) happens in status-go in one of these cases:

  • When running inside status-desktop/status-mobile:
    • during API calls in /mobile/status.go
    • inside all goroutines
    • NOT during API calls in /services/**/api.go
  • When running status-backend:
    • any panic
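For API calls, one way to get this coverage without touching every endpoint body is a wrapper. This is a hedged sketch: `withPanicReport`, `panicReport` and the string-in/string-out endpoint shape are assumptions for illustration, and the `reports` slice stands in for the actual Sentry transport.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// panicReport is a hypothetical record of what would be sent to Sentry.
type panicReport struct {
	Endpoint string
	Value    string
	Stack    []byte
}

// reports stands in for the crash reporter's transport. It is not
// goroutine-safe; a real reporter would be.
var reports []panicReport

// withPanicReport wraps an endpoint so that any panic is recorded together
// with the endpoint name and then re-raised, keeping crash semantics.
func withPanicReport(endpoint string, fn func(string) string) func(string) string {
	return func(req string) string {
		defer func() {
			if r := recover(); r != nil {
				reports = append(reports, panicReport{
					Endpoint: endpoint,
					Value:    fmt.Sprint(r),
					Stack:    debug.Stack(),
				})
				panic(r) // the process still goes down after reporting
			}
		}()
		return fn(req)
	}
}
```

Tagging each report with the endpoint name makes it easy to group crashes by API call in Sentry.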

📦 What

Here are some highlights:

  • Device info (OS, architecture, number of CPU cores)

  • Error stack trace:
    The stack trace contains paths to source files. For local builds, these might contain sensitive information, e.g. the user name.
    This is not a problem for us: we only enable Sentry for builds made on our CI, so paths only contain CI-related directories, which is safe.

  • Trace ID (so far, unique for each event)

    Trace: A collection of spans representing the end-to-end journey of a request through your system that all share the same trace ID.

    More details in the Sentry docs.
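The device fields and stack frames listed above map onto values Go's standard `runtime` package already exposes. A minimal sketch with hypothetical `DeviceContext` and `Frame` types; the real event layout is whatever the Sentry SDK defines.

```go
package main

import "runtime"

// DeviceContext mirrors the device info listed above. Field names are
// illustrative only.
type DeviceContext struct {
	OS     string
	Arch   string
	NumCPU int
}

func collectDeviceContext() DeviceContext {
	return DeviceContext{OS: runtime.GOOS, Arch: runtime.GOARCH, NumCPU: runtime.NumCPU()}
}

// Frame is one entry of a captured stack trace.
type Frame struct {
	Function string
	File     string // absolute source path recorded at compile time
	Line     int
}

// captureStack records the caller's stack, skipping captureStack itself.
func captureStack() []Frame {
	pcs := make([]uintptr, 64)
	n := runtime.Callers(2, pcs) // skip 0 = runtime.Callers, 1 = captureStack
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var out []Frame
	for {
		f, more := frames.Next()
		out = append(out, Frame{Function: f.Function, File: f.File, Line: f.Line})
		if !more {
			break
		}
	}
	return out
}
```

`Frame.File` is exactly where a local user name could leak. Go's `-trimpath` build flag strips such path prefixes from binaries, though whether our CI builds use it is not stated here.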

Examples

Re: Privacy policy

Sentry is only enabled for users who both:

  1. Opted in to metrics
  2. Use builds made by our CI

The feature is opt-in only:

From status-go/mobile/status.go, lines 107 to 115 at 3a4d917:

```go
if centralizedMetricsInfo.Enabled {
	err = sentry.Init(
		sentry.WithDSN(request.SentryDSN),
		sentry.WithDefaultContext(),
	)
	if err != nil {
		return makeJSONResponse(err)
	}
}
```

I assume this is fine under the existing Privacy policy: we don't report any more privacy-sensitive information than the opt-in metrics already do.

Please check Sentry's security and compliance documentation for more details.
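The two opt-in conditions can be read as a single guard. A hedged sketch: `shouldInitSentry` is a hypothetical helper, and it assumes the "CI build" condition is enforced by the DSN only being supplied by CI, so an empty DSN means a non-CI build.

```go
package main

import "strings"

// shouldInitSentry returns true only when the user opted in to metrics and
// a DSN was provided. Assumption: only CI-made builds receive a DSN.
func shouldInitSentry(metricsEnabled bool, dsn string) bool {
	return metricsEnabled && strings.TrimSpace(dsn) != ""
}
```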

Configuration

I added 2 main tags to identify the error: Environment and Context. The configuration is a bit complicated, but provides full information.

| | Environment | Context |
|---|---|---|
| Question | Where is it running? | What is the executable for the library? |
| Set time | `production` can only be set at build time, to prevent users from hacking the environment. All others can be set at runtime, because on CI we sometimes use the same build for multiple environments. | Always at build time |

Expected Environment values:

| Value | Description |
|---|---|
| `production` | End user machine |
| `development` | Developer machine |
| `ci-pr` | PR-level CI runs |
| `ci-main` | CI runs for stable branch |
| `ci-nightly` | CI nightly jobs on stable branch |

`development` and `ci-pr` are dropped, because we only want to consider panics from stable code.

Expected Context values:

| Value | Running as... |
|---|---|
| `status-desktop` | Library embedded into status-desktop |
| `status-mobile` | Library embedded into status-mobile |
| `status-backend` | Part of `cmd/status-backend` (can be other `cmd/*` as well) |
| `matterbridge` | Part of the Status/Discord bridge app |
| `status-go-tests` | Inside status-go tests |
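Dropping `development` and `ci-pr` could look like the filter below. Where the filtering actually happens (client-side or in Sentry's own settings) is not stated, so this sketch assumes a client-side check; `droppedEnvironments` and `shouldReport` are hypothetical names.

```go
package main

// droppedEnvironments lists environments whose panics are ignored,
// taken from the Environment table.
var droppedEnvironments = map[string]bool{
	"development": true,
	"ci-pr":       true,
}

// shouldReport tells whether a panic from env should be sent.
func shouldReport(env string) bool {
	return !droppedEnvironments[env]
}
```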

To cover these requirements, I added these environment variables:

| Environment variable | Provided at | Description |
|---|---|---|
| `SENTRY_DSN` | Build time, with a direct call to `sentry.Init`; or runtime, with the `InitializeApplication` endpoint | Sentry DSN to be used |
| `SENTRY_CONTEXT_NAME`, `SENTRY_CONTEXT_VERSION` | Build time | Execution context of status-go |
| `SENTRY_PRODUCTION` | Build time | When `true` or `1`: defines this as a production build, sets the environment to `production`, and takes precedence over the runtime `SENTRY_ENVIRONMENT` |
| `SENTRY_ENVIRONMENT` | Runtime | Sets the environment. Has no effect when `SENTRY_PRODUCTION` is enabled |
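The precedence between `SENTRY_PRODUCTION` and `SENTRY_ENVIRONMENT` can be sketched as a pure function. `resolveEnvironment` is hypothetical; how the build-time value gets into the binary is left out, and returning an empty (unset) environment when `production` is requested at runtime is an assumption of this sketch, based on the rule that `production` may only come from the build.

```go
package main

// resolveEnvironment applies the precedence from the table: the build-time
// production flag wins; otherwise the runtime value is used, except that
// "production" cannot be requested at runtime.
func resolveEnvironment(buildProduction, runtimeEnvironment string) string {
	if buildProduction == "true" || buildProduction == "1" {
		return "production"
	}
	if runtimeEnvironment == "production" {
		return "" // assumption: treat a runtime attempt as unset
	}
	return runtimeEnvironment
}
```

At runtime the second argument would typically come from `os.Getenv("SENTRY_ENVIRONMENT")`.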

Client instructions

  1. Set SENTRY_CONTEXT_NAME and SENTRY_CONTEXT_VERSION at status-go build time

  2. Provide sentryDSN to the InitializeApplication call.
    The DSN must be kept private and will be provided by CI: expect it in the STATUS_GO_SENTRY_DSN environment variable.

    Why can't we consume `STATUS_GO_SENTRY_DSN` directly in the status-go build?
     In theory, we could, but it would require mixing approaches for embedding env variables into the code.
     Right now we prefer the `go:generate` + `go:embed` approach (e.g. https://github.com/status-im/status-go/pull/6014), but we can't use it here: we must not write the DSN to a local file, which would be somewhat vulnerable. And I don't want to go back to the `-ldflags="-X github.com/status-im/status-go/internal/sentry.sentryDSN=$(STATUS_GO_SENTRY_DSN:v%=%)"` approach.
     Let me know if I'm wrong 🙂
    

Implementation details

status-im-auto (Member) commented Nov 6, 2024

Jenkins Builds

Older builds (79):
Commit #️⃣ Finished (UTC) Duration Platform Result
✖️ 717c934 #1 2024-11-06 12:23:29 ~2 min tests 📄log
✔️ 717c934 #1 2024-11-06 12:24:58 ~3 min macos 📦zip
✔️ 717c934 #1 2024-11-06 12:25:54 ~4 min linux 📦zip
✔️ 717c934 #1 2024-11-06 12:26:09 ~4 min tests-rpc 📄log
✔️ 717c934 #1 2024-11-06 12:26:28 ~5 min macos 📦zip
✔️ 717c934 #1 2024-11-06 12:26:29 ~5 min ios 📦zip
✔️ 717c934 #1 2024-11-06 12:26:50 ~5 min windows 📦zip
✔️ 717c934 #1 2024-11-06 12:27:21 ~6 min android 📦aar
✖️ 3a4d917 #2 2024-11-06 12:31:20 ~2 min tests 📄log
✔️ 3a4d917 #2 2024-11-06 12:33:33 ~4 min windows 📦zip
✔️ 3a4d917 #2 2024-11-06 12:33:39 ~4 min tests-rpc 📄log
✔️ 3a4d917 #2 2024-11-06 12:33:46 ~4 min linux 📦zip
✔️ 3a4d917 #2 2024-11-06 12:34:20 ~5 min macos 📦zip
✔️ 3a4d917 #2 2024-11-06 12:34:25 ~5 min macos 📦zip
✔️ 3a4d917 #2 2024-11-06 12:35:22 ~6 min android 📦aar
✔️ 3a4d917 #2 2024-11-06 12:35:24 ~6 min ios 📦zip
✖️ 3a4d917 #3 2024-11-06 17:39:19 ~1 min tests 📄log
✔️ 3a4d917 #3 2024-11-06 17:39:45 ~2 min tests-rpc 📄log
✔️ 3a4d917 #3 2024-11-06 17:41:24 ~3 min windows 📦zip
✔️ 3a4d917 #3 2024-11-06 17:42:00 ~4 min linux 📦zip
✔️ 3a4d917 #3 2024-11-06 17:42:32 ~5 min macos 📦zip
✔️ 3a4d917 #3 2024-11-06 17:42:50 ~5 min macos 📦zip
✔️ 3a4d917 #3 2024-11-06 17:42:50 ~5 min ios 📦zip
✔️ 3a4d917 #3 2024-11-06 17:43:30 ~6 min android 📦aar
✖️ fcedb01 #4 2024-11-06 18:06:40 ~3 min tests 📄log
✔️ fcedb01 #4 2024-11-06 18:07:06 ~4 min windows 📦zip
✔️ fcedb01 #4 2024-11-06 18:07:29 ~4 min linux 📦zip
✔️ fcedb01 #4 2024-11-06 18:07:35 ~4 min tests-rpc 📄log
✔️ fcedb01 #4 2024-11-06 18:07:58 ~5 min macos 📦zip
✔️ fcedb01 #4 2024-11-06 18:08:07 ~5 min macos 📦zip
✔️ fcedb01 #4 2024-11-06 18:08:10 ~5 min ios 📦zip
✔️ fcedb01 #4 2024-11-06 18:09:14 ~6 min android 📦aar
✔️ 08ee67a #5 2024-11-06 18:23:12 ~3 min windows 📦zip
✔️ 08ee67a #5 2024-11-06 18:23:44 ~4 min linux 📦zip
✔️ 08ee67a #5 2024-11-06 18:23:49 ~4 min tests-rpc 📄log
✖️ 08ee67a #5 2024-11-06 18:23:50 ~4 min tests 📄log
✔️ 08ee67a #5 2024-11-06 18:24:17 ~5 min macos 📦zip
✔️ 08ee67a #5 2024-11-06 18:24:20 ~5 min ios 📦zip
✔️ 08ee67a #5 2024-11-06 18:24:31 ~5 min macos 📦zip
✔️ 08ee67a #5 2024-11-06 18:25:44 ~6 min android 📦aar
✔️ 5dd5fc0 #6 2024-11-06 19:16:41 ~3 min windows 📦zip
✔️ 5dd5fc0 #6 2024-11-06 19:17:08 ~4 min tests-rpc 📄log
✔️ 5dd5fc0 #6 2024-11-06 19:17:24 ~4 min linux 📦zip
✔️ 5dd5fc0 #6 2024-11-06 19:17:38 ~4 min macos 📦zip
✔️ 5dd5fc0 #6 2024-11-06 19:17:46 ~5 min macos 📦zip
✔️ 5dd5fc0 #6 2024-11-06 19:18:16 ~5 min ios 📦zip
✔️ 5dd5fc0 #6 2024-11-06 19:19:06 ~6 min android 📦aar
✔️ 5dd5fc0 #6 2024-11-06 19:46:40 ~33 min tests 📄log
✔️ 16d7c02 #7 2024-11-19 19:39:21 ~3 min macos 📦zip
✔️ 16d7c02 #7 2024-11-19 19:39:51 ~4 min ios 📦zip
✖️ 16d7c02 #7 2024-11-19 19:40:03 ~4 min tests 📄log
✔️ 16d7c02 #7 2024-11-19 19:40:13 ~4 min tests-rpc 📄log
✔️ 16d7c02 #7 2024-11-19 19:40:19 ~4 min linux 📦zip
✔️ 16d7c02 #7 2024-11-19 19:40:51 ~5 min macos 📦zip
✔️ 16d7c02 #7 2024-11-19 19:40:59 ~5 min windows 📦zip
✔️ 16d7c02 #7 2024-11-19 19:42:20 ~6 min android 📦aar
✔️ 8e9547a #8 2024-11-19 19:43:38 ~3 min macos 📦zip
✔️ 8e9547a #8 2024-11-19 19:44:17 ~4 min ios 📦zip
✔️ 8e9547a #8 2024-11-19 19:44:45 ~3 min windows 📦zip
✔️ 8e9547a #8 2024-11-19 19:44:50 ~4 min linux 📦zip
✔️ 8e9547a #8 2024-11-19 19:44:53 ~4 min tests-rpc 📄log
✔️ 8e9547a #8 2024-11-19 19:46:14 ~5 min macos 📦zip
✔️ 8e9547a #8 2024-11-19 19:48:53 ~6 min android 📦aar
✔️ 8e9547a #8 2024-11-19 20:13:31 ~33 min tests 📄log
✔️ 1cddb3a #9 2024-11-20 09:06:22 ~3 min macos 📦zip
✔️ 1cddb3a #9 2024-11-20 09:06:41 ~3 min windows 📦zip
✔️ 1cddb3a #9 2024-11-20 09:07:01 ~4 min ios 📦zip
✔️ 1cddb3a #9 2024-11-20 09:07:09 ~4 min tests-rpc 📄log
✔️ 1cddb3a #9 2024-11-20 09:07:19 ~4 min linux 📦zip
✔️ 1cddb3a #9 2024-11-20 09:07:55 ~5 min macos 📦zip
✔️ 1cddb3a #9 2024-11-20 09:09:19 ~6 min android 📦aar
✔️ 1cddb3a #9 2024-11-20 09:36:33 ~33 min tests 📄log
✔️ 5acde49 #10 2024-11-20 09:18:18 ~3 min macos 📦zip
✔️ 5acde49 #10 2024-11-20 09:18:37 ~3 min windows 📦zip
✔️ 5acde49 #10 2024-11-20 09:19:14 ~4 min linux 📦zip
✔️ 5acde49 #10 2024-11-20 09:19:14 ~4 min ios 📦zip
✔️ 5acde49 #10 2024-11-20 09:19:19 ~4 min tests-rpc 📄log
✔️ 5acde49 #10 2024-11-20 09:19:54 ~5 min macos 📦zip
✔️ 5acde49 #10 2024-11-20 09:21:16 ~6 min android 📦aar
✔️ 72d55a2 #11 2024-11-20 09:22:07 ~3 min macos 📦zip
✔️ 72d55a2 #11 2024-11-20 09:22:29 ~3 min windows 📦zip
✔️ 72d55a2 #11 2024-11-20 09:23:53 ~4 min tests-rpc 📄log
✔️ 72d55a2 #11 2024-11-20 09:23:54 ~4 min ios 📦zip
✔️ 72d55a2 #11 2024-11-20 09:23:59 ~4 min linux 📦zip
✔️ 72d55a2 #11 2024-11-20 09:25:26 ~5 min macos 📦zip
✔️ 72d55a2 #11 2024-11-20 09:27:23 ~6 min android 📦aar
✔️ 72d55a2 #10 2024-11-20 10:09:51 ~33 min tests 📄log
✔️ b1846a9 #12 2024-11-20 12:39:36 ~4 min macos 📦zip
✔️ b1846a9 #12 2024-11-20 12:39:44 ~4 min tests-rpc 📄log
✔️ b1846a9 #12 2024-11-20 12:40:03 ~4 min windows 📦zip
✔️ b1846a9 #12 2024-11-20 12:40:12 ~5 min ios 📦zip
✔️ b1846a9 #12 2024-11-20 12:40:20 ~5 min macos 📦zip
✔️ b1846a9 #12 2024-11-20 12:41:26 ~6 min android 📦aar
✔️ b1846a9 #12 2024-11-20 12:41:34 ~6 min linux 📦zip
✔️ b1846a9 #11 2024-11-20 13:08:01 ~32 min tests 📄log

Base automatically changed from feat/add-sentry-dependency to develop November 6, 2024 17:37

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 42.96875% with 73 lines in your changes missing coverage. Please review.

Project coverage is 60.86%. Comparing base (9a45ae0) to head (b1846a9).
Report is 2 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/sentry/sentry.go | 24.32% | 28 Missing ⚠️ |
| api/geth_backend.go | 0.00% | 15 Missing ⚠️ |
| internal/sentry/params.go | 46.66% | 8 Missing ⚠️ |
| mobile/status.go | 0.00% | 8 Missing ⚠️ |
| cmd/status-backend/main.go | 0.00% | 5 Missing ⚠️ |
| common/utils.go | 37.50% | 5 Missing ⚠️ |
| internal/sentry/options.go | 77.77% | 4 Missing ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           develop    #6054      +/-   ##
===========================================
- Coverage    60.90%   60.86%   -0.04%     
===========================================
  Files          819      823       +4     
  Lines       109444   109569     +125     
===========================================
+ Hits         66656    66694      +38     
- Misses       34950    35050     +100     
+ Partials      7838     7825      -13     
```
| Flag | Coverage Δ |
|---|---|
| functional | 13.54% <2.34%> (+<0.01%) ⬆️ |
| unit | 60.09% <44.71%> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| internal/sentry/stacktrace.go | 100.00% <100.00%> (ø) |
| mobile/callog/status_request_log.go | 91.83% <100.00%> (+0.17%) ⬆️ |
| protocol/messenger_backup.go | 75.76% <ø> (ø) |
| protocol/requests/initialize_application.go | 0.00% <ø> (ø) |
| internal/sentry/options.go | 77.77% <77.77%> (ø) |
| cmd/status-backend/main.go | 0.00% <0.00%> (ø) |
| common/utils.go | 77.77% <37.50%> (-2.23%) ⬇️ |
| internal/sentry/params.go | 46.66% <46.66%> (ø) |
| mobile/status.go | 3.64% <0.00%> (-0.03%) ⬇️ |
| api/geth_backend.go | 54.08% <0.00%> (-0.51%) ⬇️ |

... and 1 more

... and 39 files with indirect coverage changes

@igor-sirotin igor-sirotin changed the title feat_: add panic reporting to Sentry feat_: add Sentry panic reporting Nov 6, 2024
@igor-sirotin igor-sirotin self-assigned this Nov 6, 2024
ilmotta (Contributor) commented Nov 8, 2024

@igor-sirotin one point to consider, which @sunleos already mentioned, is the desire to keep usage data separate from everything else. We already have two separate flags: one for Waku usage data (telemetry) and one for general usage data (analytics/MixPanel). It would be better to have a new setting just for app monitoring, so that a user can toggle it independently of the other two. Another reason is that we don't want internal CCs enabling the centralized metrics toggle, because that skews MixPanel reports given the small number of active (external) users.

igor-sirotin (Collaborator, Author) commented Nov 20, 2024

Settings plan

We're discussing making 3 separate toggles for analytics, all under a single main toggle. Something like this:

  • Share usage data with Status
    • Share usage analytics
    • Share crash reports
    • Share Waku telemetry

What we have

So far we have these controls in the database:

What we need

We most probably need 4 boolean flags; 3 might work as well, depending on the final design.
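With 4 flags, the main toggle gates the three sub-toggles. A sketch of how the effective values could be derived; the struct and field names are hypothetical and do not correspond to existing database columns.

```go
package main

// usageSettings models the proposed toggles (hypothetical names).
type usageSettings struct {
	ShareUsageData     bool // main "Share usage data with Status" toggle
	ShareAnalytics     bool
	ShareCrashReports  bool
	ShareWakuTelemetry bool
}

// Each sub-toggle only takes effect while the main toggle is on.
func (s usageSettings) AnalyticsEnabled() bool     { return s.ShareUsageData && s.ShareAnalytics }
func (s usageSettings) CrashReportsEnabled() bool  { return s.ShareUsageData && s.ShareCrashReports }
func (s usageSettings) WakuTelemetryEnabled() bool { return s.ShareUsageData && s.ShareWakuTelemetry }
```

With only 3 stored flags, the main toggle would have to be derived from the sub-toggles instead (e.g. "on if any is on"), which is the design question still open above.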

cc @xAlisher @ilmotta

PS. For now, I will keep this PR without these controls and bind to centralizedmetrics_uuid.enabled, and do the settings work in a separate PR.

@igor-sirotin igor-sirotin force-pushed the feat/sentry branch 2 times, most recently from 5acde49 to 72d55a2 Compare November 20, 2024 09:16
status-im-auto (Member)

✔️ status-go/prs/macos/x86_64/main/PR-6054#12 🔹 ~5 min 31 sec 🔹 b1846a9 🔹 📦 macos package
