Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Re-opened Bug]: Rash of ANRs coming from OneSignal SDK #2230

Closed
1 task done
isles1217 opened this issue Dec 9, 2024 · 13 comments
Closed
1 task done

[Re-opened Bug]: Rash of ANRs coming from OneSignal SDK #2230

isles1217 opened this issue Dec 9, 2024 · 13 comments

Comments

@isles1217
Copy link

isles1217 commented Dec 9, 2024

What happened?

Reopened Issue (#2176)
Based on the ANR analytics we're getting from Firebase Crashlytics, 18 of our top 25 ANRs are being attributed to OneSignal. This is putting our application in danger of crossing the "good-behavior" ANR threshold. Please don't dismiss this as something you can't reproduce (as I'm seeing in several other ANR-related open issues). I suspect there is some kind of systemic pattern of handling long-running or asynchronous operations in a way that's likely impacting the apps that use your SDK.

In the last 90 days, One Signal has gotten attributed to over 1k different ANRs (this is just counting our top 25 ANR sources).

Steps to reproduce?

Monitor the ANR detection of a real production app, start seeing the ANRs.

What did you expect to happen?

I would expect that this SDK would handle async and long running operations without causing ANRs.

OneSignal Android SDK version

5.1.25

Android version

15, 14, 13, 12, 11

Specific Android models

All kinds of devices and models, lots of Samsungs and Motorolas. Mostly seeing Android 11-14, but that's likely indicative of normal distribution of OS's.

Relevant log output

Method calls that are being attributed to One Signal ANRs (in order of most occurring to least):

com.onesignal.common.modeling.Model.initializeFromJson (6 different permutations of this) 

com.onesignal.common.modeling.Model.initializeFromJson(Model.kt:98)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:68)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:30)
       at com.onesignal.common.modeling.ModelStore.load(ModelStore.kt:175)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.loadOperations(OperationModelStore.kt)
       at com.onesignal.core.internal.operations.impl.OperationRepo.loadSavedOperations$com_onesignal_core(OperationRepo.kt:418)
       at com.onesignal.core.internal.operations.impl.OperationRepo$start$1.invokeSuspend(OperationRepo.kt:100)


com.onesignal.common.modeling.Model.getOptAnyProperty (8 different permutations of this)

com.onesignal.common.modeling.ModelStore.load (3 different permutations of this)

       at java.util.Collections$SynchronizedMap.get(Collections.java:2738)
       at com.onesignal.common.modeling.Model.getOptAnyProperty(Model.kt:648)
       at com.onesignal.common.modeling.Model.getOptStringProperty(Model.kt:565)
       at com.onesignal.common.modeling.Model.getStringProperty(Model.kt:491)
       at com.onesignal.common.modeling.Model.getStringProperty$default(Model.kt:488)
       at com.onesignal.common.modeling.Model.getId(Model.kt:59)
       at com.onesignal.common.modeling.ModelStore.load(ModelStore.kt:184)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.loadOperations(OperationModelStore.kt)
       at com.onesignal.core.internal.operations.impl.OperationRepo.loadSavedOperations$com_onesignal_core(OperationRepo.kt:418)
       at com.onesignal.core.internal.operations.impl.OperationRepo$start$1.invokeSuspend(OperationRepo.kt:100)


com.onesignal.common.modeling.Model.getStringProperty

       at com.onesignal.common.modeling.Model.getStringProperty(Model.kt)
       at com.onesignal.common.modeling.Model.getStringProperty$default(Model.kt)
       at com.onesignal.common.modeling.Model.getId(Model.kt)
       at com.onesignal.common.modeling.ModelStore.load(ModelStore.kt)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.loadOperations(OperationModelStore.kt)
       at com.onesignal.core.internal.operations.impl.OperationRepo.loadSavedOperations$com_onesignal_core(OperationRepo.kt)
       at com.onesignal.core.internal.operations.impl.OperationRepo$start$1.invokeSuspend(OperationRepo.kt)

Code of Conduct

  • I agree to follow this project's Code of Conduct
@isles1217
Copy link
Author

I recall looking into some recent One Signal PRs around the time I originally opened #2176 , and I remember seeing someone discouraging the developer from using kotlin coroutines, and instead recommending they use Threads instead. If this is still the case, I suspect this is single-handedly the culprit. Coroutines don't block threads, and should be used instead of manually instantiating Threads.

FWIW OneSignal is single-handedly putting our app past the bad-behavior ANR threshold on Google Play. Please take this issue seriously.

@jkasten2
Copy link
Member

jkasten2 commented Dec 9, 2024

@isles1217 Thanks for opening this issue.

The stack traces you listed run on a background thread, could you share the main thread parts of the ANR? What the main thread is waiting on is key since an ANR only happens if the main thread is waiting for more than 5 seconds. If main thread is waiting on something from OneSignal then OneSignal could be the cause of the ANR, otherwise background threads shown in the ANR normally aren't the cause of it.

@isles1217
Copy link
Author

I'm not sure how to best answer here, as I'm not sure how to surmise which segment of these OneSignal operations are happening on the main thread vs which ones aren't. According to Firebase, the main thread is getting blocked here, potentially because of a lock on an ArrayList/list? Is there a Synchronized / Volatile lock on a mutable collection or read/write operation, maybe?

getOptAnyProperty:

Screenshot 2024-12-09 at 15 13 14

initializeFromJson:

Screenshot 2024-12-09 at 15 19 36

@jkasten2
Copy link
Member

jkasten2 commented Dec 9, 2024

@isles1217 Yes this is what we are looking for!

Could you share both those full stack traces as text?

  • Both the (runnable) and the (blocked) pairs from a single ANR.

@isles1217
Copy link
Author

Sure thing!

getOptAnyProperty:

runnable:

          OpRepo (runnable):tid=62 systid=27581 
       at java.util.LinkedHashMap.get(LinkedHashMap.java:554)
       at java.util.Collections$SynchronizedMap.get(Collections.java:2956)
       at com.onesignal.common.modeling.Model.getOptAnyProperty(Model.kt:648)
       at com.onesignal.common.modeling.Model.getOptStringProperty(Model.kt:565)
       at com.onesignal.common.modeling.Model.getStringProperty(Model.kt:491)
       at com.onesignal.common.modeling.Model.getStringProperty$default(Model.kt:488)
       at com.onesignal.common.modeling.Model.getId(Model.kt:59)
       at com.onesignal.common.modeling.ModelStore.load(ModelStore.kt:184)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.loadOperations(OperationModelStore.kt)
       at com.onesignal.core.internal.operations.impl.OperationRepo.loadSavedOperations$com_onesignal_core(OperationRepo.kt:418)
       at com.onesignal.core.internal.operations.impl.OperationRepo$start$1.invokeSuspend(OperationRepo.kt:100)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
       at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:101)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
       at java.util.concurrent.FutureTask.run(FutureTask.java:264)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:307)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
       at java.lang.Thread.run(Thread.java:1012)

blocked:

          main (blocked):tid=1 systid=27435 | waiting to lock <0x0016e670> (java.util.ArrayList) held by thread 62
       at com.onesignal.common.modeling.ModelStore.add(ModelStore.kt:45)
       at com.onesignal.common.modeling.IModelStore$DefaultImpls.add$default(IModelStore.kt:36)
       at com.onesignal.core.internal.operations.impl.OperationRepo.internalEnqueue(OperationRepo.kt:153)
       at com.onesignal.core.internal.operations.impl.OperationRepo.internalEnqueue$default(OperationRepo.kt:133)
       at com.onesignal.core.internal.operations.impl.OperationRepo.enqueue(OperationRepo.kt:112)
       at com.onesignal.core.internal.operations.IOperationRepo$DefaultImpls.enqueue$default(IOperationRepo.kt:18)
       at com.onesignal.session.internal.session.impl.SessionListener.onSessionEnded(SessionListener.kt:57)
       at com.onesignal.session.internal.session.impl.SessionService$endSession$1.invoke(SessionService.kt:77)
       at com.onesignal.session.internal.session.impl.SessionService$endSession$1.invoke(SessionService.kt:77)
       at com.onesignal.common.events.EventProducer.fire(EventProducer.kt:50)
       at com.onesignal.session.internal.session.impl.SessionService.endSession(SessionService.kt:77)
       at com.onesignal.session.internal.session.impl.SessionService.onFocus(SessionService.kt:96)
       at com.onesignal.core.internal.application.impl.ApplicationService$handleFocus$1.invoke(ApplicationService.kt:395)
       at com.onesignal.core.internal.application.impl.ApplicationService$handleFocus$1.invoke(ApplicationService.kt:395)
       at com.onesignal.common.events.EventProducer.fire(EventProducer.kt:50)
       at com.onesignal.core.internal.application.impl.ApplicationService.handleFocus(ApplicationService.kt:395)
       at com.onesignal.core.internal.application.impl.ApplicationService.onActivityStarted(ApplicationService.kt:163)
       at android.app.Application.dispatchActivityStarted(Application.java:386)
       at android.app.Activity.dispatchActivityStarted(Activity.java:1417)
       at android.app.Activity.onStart(Activity.java:1929)
       at android.app.Instrumentation.callActivityOnStart(Instrumentation.java:1457)
       at android.app.Activity.performStart(Activity.java:8259)
       at android.app.ActivityThread.handleStartActivity(ActivityThread.java:3776)
       at android.app.servertransaction.TransactionExecutor.performLifecycleSequence(TransactionExecutor.java:221)
       at android.app.servertransaction.TransactionExecutor.cycleToPath(TransactionExecutor.java:201)
       at android.app.servertransaction.TransactionExecutor.executeLifecycleState(TransactionExecutor.java:173)
       at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:97)
       at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2311)
       at android.os.Handler.dispatchMessage(Handler.java:111)
       at android.os.Looper.loopOnce(Looper.java:238)
       at android.os.Looper.loop(Looper.java:357)
       at android.app.ActivityThread.main(ActivityThread.java:8090)
       at java.lang.reflect.Method.invoke(Native method)
       at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1026)

@isles1217
Copy link
Author

initializeFromJson:

runnable:

          OpRepo (runnable):tid=49 systid=25362 
#00 pc 0x4fdd84 libart.so (art::DumpNativeStack + 108) (BuildId: 3f7d5a016e08d528f129bdd336d81168)
#01 pc 0x50043c libart.so (art::Thread::DumpStack const + 376) (BuildId: 3f7d5a016e08d528f129bdd336d81168)
#02 pc 0x502230 libart.so (art::DumpCheckpoint::Run + 216) (BuildId: 3f7d5a016e08d528f129bdd336d81168)
#03 pc 0x22c234 libart.so (art::Thread::RunCheckpointFunction + 144) (BuildId: 3f7d5a016e08d528f129bdd336d81168)
#04 pc 0x990914 libart.so (artTestSuspendFromCode + 152) (BuildId: 3f7d5a016e08d528f129bdd336d81168)
#05 pc 0x383d9c libart.so (art_quick_test_suspend + 156) (BuildId: 3f7d5a016e08d528f129bdd336d81168)
       at java.lang.reflect.Executable.getMethodNameInternal(Native method)
       at java.lang.reflect.Method.getName(Method.java:115)
       at java.lang.reflect.Method$1.compare(Method.java:72)
       at java.lang.reflect.Method$1.compare(Method.java:67)
       at libcore.util.CollectionUtils.removeDuplicates(CollectionUtils.java:89)
       at java.lang.Class.getMethods(Class.java:2226)
       at com.onesignal.common.modeling.Model.initializeFromJson(Model.kt:98)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:67)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:30)
       at com.onesignal.common.modeling.ModelStore.load(ModelStore.kt:175)
       at com.onesignal.core.internal.operations.impl.OperationModelStore.loadOperations(OperationModelStore.kt)
       at com.onesignal.core.internal.operations.impl.OperationRepo.loadSavedOperations$com_onesignal_core(OperationRepo.kt:418)
       at com.onesignal.core.internal.operations.impl.OperationRepo$start$1.invokeSuspend(OperationRepo.kt:100)
       at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
       at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:101)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
       at java.util.concurrent.FutureTask.run(FutureTask.java:264)
       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:307)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
       at java.lang.Thread.run(Thread.java:1012)
        

blocked:

          main (blocked):tid=1 systid=25298 | waiting to lock <0x0581cd00> (java.util.ArrayList) held by thread 49
       at com.onesignal.common.modeling.ModelStore.add(ModelStore.kt:45)
       at com.onesignal.common.modeling.IModelStore$DefaultImpls.add$default(IModelStore.kt:36)
       at com.onesignal.core.internal.operations.impl.OperationRepo.internalEnqueue(OperationRepo.kt:153)
       at com.onesignal.core.internal.operations.impl.OperationRepo.internalEnqueue$default(OperationRepo.kt:133)
       at com.onesignal.core.internal.operations.impl.OperationRepo.enqueue(OperationRepo.kt:112)
       at com.onesignal.session.internal.session.impl.SessionListener.onSessionStarted(SessionListener.kt:43)
       at com.onesignal.session.internal.session.impl.SessionService$onFocus$1.invoke(SessionService.kt:107)
       at com.onesignal.session.internal.session.impl.SessionService$onFocus$1.invoke(SessionService.kt:107)
       at com.onesignal.common.events.EventProducer.fire(EventProducer.kt:50)
       at com.onesignal.session.internal.session.impl.SessionService.onFocus(SessionService.kt:107)
       at com.onesignal.core.internal.application.impl.ApplicationService$handleFocus$1.invoke(ApplicationService.kt:395)
       at com.onesignal.core.internal.application.impl.ApplicationService$handleFocus$1.invoke(ApplicationService.kt:395)
       at com.onesignal.common.events.EventProducer.fire(EventProducer.kt:50)
       at com.onesignal.core.internal.application.impl.ApplicationService.handleFocus(ApplicationService.kt:395)
       at com.onesignal.core.internal.application.impl.ApplicationService.onActivityStarted(ApplicationService.kt:163)
       at android.app.Application.dispatchActivityStarted(Application.java:386)
       at android.app.Activity.dispatchActivityStarted(Activity.java:1417)
       at android.app.Activity.onStart(Activity.java:1929)
       at android.app.Instrumentation.callActivityOnStart(Instrumentation.java:1457)
       at android.app.Activity.performStart(Activity.java:8259)
       at android.app.ActivityThread.handleStartActivity(ActivityThread.java:3776)
       at android.app.servertransaction.TransactionExecutor.performLifecycleSequence(TransactionExecutor.java:221)
       at android.app.servertransaction.TransactionExecutor.cycleToPath(TransactionExecutor.java:201)
       at android.app.servertransaction.TransactionExecutor.executeLifecycleState(TransactionExecutor.java:173)
       at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:97)
       at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2311)
       at android.os.Handler.dispatchMessage(Handler.java:111)
       at android.os.Looper.loopOnce(Looper.java:238)
       at android.os.Looper.loop(Looper.java:357)
       at android.app.ActivityThread.main(ActivityThread.java:8090)
       at java.lang.reflect.Method.invoke(Native method)
       at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1026)

@jinliu9508
Copy link
Contributor

Hello @isles1217, from these two lines, it seems the code is running on version 5.1.18 or earlier.

com.onesignal.common.modeling.Model.initializeFromJson(Model.kt:98)
at com.onesignal.core.internal.operations.impl.OperationModelStore.create(OperationModelStore.kt:68)

It's possible that the reason you are still encountering this ANR is because many users are still using an older version of your app that runs on OneSignal SDK 5.1.18 or earlier. Could you filter the ANRs by your app version and show us the result?

#2230 (comment) This looks like a deadlock to me. I will try to find the root cause of it and get back to you. Thank you for providing all the info!

@isles1217
Copy link
Author

It's possible that the reason you are still encountering this ANR is because many users are still using an older version of your app that runs on OneSignal SDK 5.1.18 or earlier. Could you filter the ANRs by your app version and show us the result?

Hmm... I'm seeing those ANR on our newest releases, and we've had production releases on versions >= 5.1.20 since September. Let me take a look at my dependency tree in case another library is forcing an old version.

@isles1217
Copy link
Author

I've confirmed that we're building with 5.1.25, and the stack traces passed along in my previous comments were from devices running our newest release, with 5.1.15.

@jdonlan
Copy link

jdonlan commented Dec 12, 2024

Any update on this? we're hoping to get this fixed and out before the holiday if at all possible.

@jinliu9508
Copy link
Contributor

We have identified the problem and are actively working on a fix. I will keep you updated as soon as it is available.

@jinliu9508
Copy link
Contributor

@jdonlan @isles1217
A new release, version 5.1.26, is now available and includes a fix for this issue. Please update to our latest Android SDK and let us know if the problem persists. Thank you again for all the information you provided. It was instrumental in helping us troubleshoot and resolve the issue!

@isles1217
Copy link
Author

Thank you @jinliu9508 ! I've been observing this in the wild for the last week or so, and we're seeing our ANRs go down by about 50%, so THANK YOU! There's one particular stack trace that's still producing a lot of ANRs, so I opened a new issue to track that particular flow: #2236

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants