[torchbench] hf_Longformer fails to run #453
Comments
@alexbaden these warnings might cause the benchmark to miscompile and then fail at runtime. Do you know how to fix them?
@ESI-SYD any input on how to fix the benchmark?
It seems like
It might be an environment issue, yes. But is fbgemm_gpu used if CUDA is not present?
It also fails with v2.1.
I tried to trace the benchmark execution to see whether we try to allocate too much memory. From the trace, it is not clear what causes the OOM. The failing call is always the same:
This kernel was previously launched multiple times with the same size without problems. The total memory allocated using
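For reference, one generic way to watch allocator usage around the failing launch is PyTorch's memory counters. A minimal sketch, assuming a CUDA device (on other backends the equivalent device-module counters would apply); the helper names are mine, not from the benchmark:

```python
import torch

def report_memory(tag):
    # Caching-allocator counters, in MiB.
    allocated = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated={allocated:.1f} MiB  reserved={reserved:.1f} MiB")

def run_suspect_kernel(fn, *args, **kwargs):
    # Wrap the suspect call so the before/after deltas show whether this
    # launch is what actually grows memory.
    report_memory("before")
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()  # surface asynchronous launch/OOM failures here
    report_memory("after")
    return out
```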
Regarding fbgemm, you may want to look at building fbgemm from source for CPU. The current fbgemm binaries cannot be used because they are not linked against the PyTorch build that we use.
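A quick way to check whether the installed fbgemm_gpu wheel matches the local PyTorch is simply to import it after torch; a mismatched binary typically fails with an undefined-symbol error. This is a sketch, not an official diagnostic:

```python
import torch
print("torch:", torch.__version__)

try:
    import fbgemm_gpu  # noqa: F401
    print("fbgemm_gpu imported OK:", fbgemm_gpu.__file__)
except (ImportError, OSError) as exc:
    # An "undefined symbol" message here usually means the prebuilt wheel was
    # compiled against a different PyTorch than the one installed, which is
    # why building fbgemm from source for CPU is suggested above.
    print("fbgemm_gpu failed to import:", exc)
```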
The run failure is no longer reproducible. Env:
Appears to be an out-of-memory issue. The fbgemm_gpu undefined symbol messages are fairly common and appear on passing tests.
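If the OOM comes back, one way to narrow it down is to dump the allocator summary at the point of failure. A sketch under the assumption that the failure surfaces as torch.cuda.OutOfMemoryError on a CUDA device; `step` is a placeholder for whatever callable drives the benchmark iteration, not a real torchbench API:

```python
import torch

def run_with_oom_report(step):
    try:
        return step()
    except torch.cuda.OutOfMemoryError:
        # memory_summary() lists allocated/reserved segments and fragmentation,
        # which helps distinguish a real capacity problem from fragmentation.
        print(torch.cuda.memory_summary())
        raise
```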