Adding asan sanitizer support for hip #795

Open
wants to merge 28 commits into
base: develop

kab163
Contributor

@kab163 kab163 commented Nov 23, 2022

CMake command on rzvernal:

cmake -DROCM_ROOT_DIR=/opt/rocm-5.3.0 -DHIP_ROOT_DIR=/opt/rocm-5.3.0/hip -DHIP_PATH=/opt/rocm-5.3.0/llvm/bin -DENABLE_HIP=On -DENABLE_OPENMP=Off -DENABLE_CUDA=Off -DUMPIRE_ENABLE_ASAN=On -DCMAKE_CXX_COMPILER=/opt/rocm-5.3.0/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=/opt/rocm-5.3.0/llvm/bin/amdclang -DCMAKE_CXX_FLAGS="-fsanitize=address" -DUMPIRE_ENABLE_SANITIZER_TESTS=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a:xnack+ ../

Warning while compiling: clang-15: warning: ignoring '-fsanitize=address' option for offload arch 'gfx90a' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead [-Woption-ignored]

Output of sanitizer_tests.cpp

[belcher6@rzvernal11:build]$ ./bin/sanitizer_tests QuickPool write
data[INDEX] = 100
=================================================================
==79729==ERROR: AddressSanitizer: use-after-poison on address 0x154d16201530 at pc 0x0000004f5eb4 bp 0x7fffffff8b10 sp 0x7fffffff8b08
WRITE of size 8 at 0x154d16201530 thread T0
    #0 0x4f5eb3 in test_write_after_free() (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x4f5eb3)
    #1 0x4f67cb in main (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x4f67cb)
    #2 0x15555267acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2) (BuildId: 64aa558dcdda2d8b0d7b04cef33ddbb2d9d8b8b4)
    #3 0x404e4d in _start (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x404e4d)

Address 0x154d16201530 is a wild pointer inside of access range of size 0x000000000008.
SUMMARY: AddressSanitizer: use-after-poison (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x4f5eb3) in test_write_after_free()
Shadow bytes around the buggy address:
  0x02aa22c38250: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38260: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38270: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38280: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38290: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
=>0x02aa22c382a0: f7 f7 f7 f7 f7 f7[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c382b0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c382c0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c382d0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c382e0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c382f0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==79729==ABORTING
[belcher6@rzvernal11:build]$ ./bin/sanitizer_tests QuickPool read
data[INDEX] = 100
=================================================================
==79736==ERROR: AddressSanitizer: use-after-poison on address 0x154d16200800 at pc 0x0000004f5c22 bp 0x7fffffff8b30 sp 0x7fffffff8b28
READ of size 8 at 0x154d16200800 thread T0
    #0 0x4f5c21 in test_read_after_free() (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x4f5c21)
    #1 0x4f679e in main (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x4f679e)
    #2 0x15555267acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2) (BuildId: 64aa558dcdda2d8b0d7b04cef33ddbb2d9d8b8b4)
    #3 0x404e4d in _start (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x404e4d)

Address 0x154d16200800 is a wild pointer inside of access range of size 0x000000000008.
SUMMARY: AddressSanitizer: use-after-poison (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x4f5c21) in test_read_after_free()
Shadow bytes around the buggy address:
  0x02aa22c380b0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c380c0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c380d0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c380e0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c380f0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
=>0x02aa22c38100:[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38110: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38120: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38130: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38140: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x02aa22c38150: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==79736==ABORTING
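
For anyone reading the shadow dumps above: the legend states that one shadow byte represents 8 application bytes. Assuming the default x86-64 Linux mapping, `shadow = (addr >> 3) + 0x7fff8000` (the offset is not printed in the report, so treat it as an assumption), both faulting addresses map onto the bracketed `f7` bytes. A minimal sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Assumed default ASan mapping on x86-64 Linux (not taken from the report):
//   shadow = (addr >> 3) + 0x7fff8000
constexpr std::uint64_t kShadowScale = 3;
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

// Shadow byte that describes the 8-byte granule containing app_addr.
constexpr std::uint64_t shadow_address(std::uint64_t app_addr) {
  return (app_addr >> kShadowScale) + kShadowOffset;
}

// The report prints 16 shadow bytes per row; this is the row's base address.
constexpr std::uint64_t shadow_row(std::uint64_t app_addr) {
  return shadow_address(app_addr) & ~0xFULL;
}

// Zero-based position of the bracketed byte within that row.
constexpr unsigned shadow_column(std::uint64_t app_addr) {
  return static_cast<unsigned>(shadow_address(app_addr) & 0xFULL);
}

// Checks the two faulting addresses quoted in the reports above.
inline void check_against_reports() {
  // write test: 0x154d16201530 -> row 0x02aa22c382a0, bracket at column 6
  assert(shadow_row(0x154d16201530ULL) == 0x02aa22c382a0ULL);
  assert(shadow_column(0x154d16201530ULL) == 6u);
  // read test: 0x154d16200800 -> row 0x02aa22c38100, bracket at column 0
  assert(shadow_row(0x154d16200800ULL) == 0x02aa22c38100ULL);
  assert(shadow_column(0x154d16200800ULL) == 0u);
}
```

Calling `check_against_reports()` from a `main()` confirms the mapping for both runs. The `f7` value at those positions is the "Poisoned by user" marker from the legend, which is consistent with the `use-after-poison` classification.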

@kab163
Contributor Author

kab163 commented Dec 6, 2022

Updated CMake command:

cmake -DROCM_ROOT_DIR=/opt/rocm-5.4.0 -DHIP_ROOT_DIR=/opt/rocm-5.4.0/hip -DHIP_PATH=/opt/rocm-5.4.0/llvm/bin -DENABLE_HIP=On -DENABLE_OPENMP=Off -DENABLE_CUDA=Off -DUMPIRE_ENABLE_ASAN=On -DCMAKE_CXX_COMPILER=/opt/rocm-5.4.0/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=/opt/rocm-5.4.0/llvm/bin/amdclang -DCMAKE_CXX_FLAGS="-fsanitize=address" -DUMPIRE_ENABLE_SANITIZER_TESTS=ON -DCMAKE_HIP_ARCHITECTURES="gfx90a:xnack+" -DGPU_TARGETS="gfx90a:xnack+" -DAMDGPU_TARGETS="gfx90a:xnack+" ../

Updated output (for the HIP kernel sanitizer test):

bash-4.4$ ./bin/sanitizer_tests QuickPool read
AddressSanitizer:DEADLYSIGNAL
=================================================================
==333094==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000020 (pc 0x155553a4b9cd bp 0x7fffffff7ab0 sp 0x7fffffff7a50 T0)
==333094==The signal is caused by a READ memory access.
==333094==Hint: address points to the zero page.
    #0 0x155553a4b9cd  (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0xd59cd) (BuildId: a4e47799322466785f920489031c315137965c08)
    #1 0x155553a4bb26  (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0xd5b26) (BuildId: a4e47799322466785f920489031c315137965c08)
    #2 0x155553a4c9fe  (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0xd69fe) (BuildId: a4e47799322466785f920489031c315137965c08)
    #3 0x155553a0a5d6  (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0x945d6) (BuildId: a4e47799322466785f920489031c315137965c08)
    #4 0x155553b5de15  (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0x1e7e15) (BuildId: a4e47799322466785f920489031c315137965c08)
    #5 0x155553b39881  (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0x1c3881) (BuildId: a4e47799322466785f920489031c315137965c08)
    #6 0x155553b3af93 in hipLaunchKernel (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0x1c4f93) (BuildId: a4e47799322466785f920489031c315137965c08)
    #7 0x543a21 in __device_stub__test_for_hip(double**, unsigned long) (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x543a21)
    #8 0x544cdf in main (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x544cdf)
    #9 0x15555267dcf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2) (BuildId: 64aa558dcdda2d8b0d7b04cef33ddbb2d9d8b8b4)
    #10 0x45285d in _start (/g/g0/belcher6/Umpire/build/bin/sanitizer_tests+0x45285d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/opt/rocm-5.4.0/lib/libamdhip64.so.5+0xd59cd) (BuildId: a4e47799322466785f920489031c315137965c08)
==333094==ABORTING

Note: this will not work if HSA_XNACK is set to 1, so do not export it. Explanation TBD.

@kab163
Contributor Author

kab163 commented Dec 7, 2022

Issue: running the test with ./bin/sanitizer_tests QuickPool read HOST and with ./bin/sanitizer_tests QuickPool write HOST both seem to produce an error that reports a write access, whether or not it is built with HIP support.
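
To make the expected distinction concrete, here is a minimal sketch of a pool that poisons its memory on deallocate. It is not Umpire's code: `ToyPool`, the `TOY_*` macros, and the two test functions are hypothetical, and only the `ASAN_POISON_MEMORY_REGION` / `ASAN_UNPOISON_MEMORY_REGION` macros and `__asan_address_is_poisoned` come from the real `sanitizer/asan_interface.h` header. With this shape, a load after deallocate should be reported as a READ and a store as a WRITE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Detect ASan under Clang (__has_feature) and GCC (__SANITIZE_ADDRESS__).
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define TOY_HAS_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__) && !defined(TOY_HAS_ASAN)
#  define TOY_HAS_ASAN 1
#endif

#ifdef TOY_HAS_ASAN
#  include <sanitizer/asan_interface.h>
#  define TOY_POISON(p, n) ASAN_POISON_MEMORY_REGION((p), (n))
#  define TOY_UNPOISON(p, n) ASAN_UNPOISON_MEMORY_REGION((p), (n))
#else
// Without ASan the annotations compile away to nothing.
#  define TOY_POISON(p, n) ((void)(p), (void)(n))
#  define TOY_UNPOISON(p, n) ((void)(p), (void)(n))
#endif

// Hypothetical one-slab pool (not Umpire's QuickPool). The slab stays owned by
// the pool after deallocate(), so the pool has to poison it by hand.
class ToyPool {
 public:
  explicit ToyPool(std::size_t count) : slab_(count), in_use_(false) {
    TOY_POISON(slab_.data(), bytes());  // idle pool memory starts poisoned
  }
  ~ToyPool() { TOY_UNPOISON(slab_.data(), bytes()); }

  double* allocate() {
    TOY_UNPOISON(slab_.data(), bytes());
    in_use_ = true;
    return slab_.data();
  }

  void deallocate(double*) {
    in_use_ = false;
    TOY_POISON(slab_.data(), bytes());
  }

  // Asks the ASan runtime when available, otherwise falls back to bookkeeping.
  bool is_poisoned() const {
#ifdef TOY_HAS_ASAN
    return __asan_address_is_poisoned(slab_.data()) != 0;
#else
    return !in_use_;
#endif
  }

 private:
  std::size_t bytes() const { return slab_.size() * sizeof(double); }
  std::vector<double> slab_;
  bool in_use_;
};

// Under ASan the final load should be reported as "READ of size 8".
inline double read_after_free(ToyPool& pool) {
  double* data = pool.allocate();
  data[0] = 100.0;  // valid store while the block is live
  pool.deallocate(data);
  return data[0];   // load from poisoned memory
}

// Under ASan the final store should be reported as "WRITE of size 8".
inline void write_after_free(ToyPool& pool) {
  double* data = pool.allocate();
  pool.deallocate(data);
  data[0] = 100.0;  // store to poisoned memory
}

// Safe with or without ASan: only queries state, never touches freed memory.
inline void check_lifecycle() {
  ToyPool pool(16);
  assert(pool.is_poisoned());
  double* p = pool.allocate();
  assert(!pool.is_poisoned());
  p[0] = 100.0;
  pool.deallocate(p);
  assert(pool.is_poisoned());
}
```

`check_lifecycle()` can be called from any `main()`. `read_after_free` and `write_after_free` are meant to abort when built with `-fsanitize=address`, so they belong in separate runs, as in the `read` / `write` invocations above.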

@kab163
Contributor Author

kab163 commented Dec 8, 2022

The way this test is run in make test might need to be changed...

@kab163
Contributor Author

kab163 commented Dec 21, 2022

See AMD bug report (277): I made a reproducer for AMD to look into. It seems this does not yet work as expected in rocm 5.4.

@kab163
Contributor Author

kab163 commented Mar 2, 2023

Update: a fix is supposed to be ready in rocm 6.0.

@kab163
Contributor Author

kab163 commented Apr 3, 2023

Edited out...

@kab163
Contributor Author

kab163 commented Jan 22, 2024

The fix is in rocm/6.0.0, but we have to add a "-Wl,-rpath=/opt/rocm-6.0.0/llvm/lib/clang/17.0.0/lib/linux/" flag. Once we update the corona/tioga jobs to rocm-6.0.0 and get that in, it should be good to go.

@kab163 kab163 marked this pull request as ready for review January 24, 2024 16:10
@kab163
Contributor Author

kab163 commented Jan 24, 2024

This has been a long time coming! @adrienbernede - once the blt/camp releases come out and we can update umpire to rocm/6.0.0 we should add this test! (No updates yet for the reason behind why the device allocator test is failing with 6.0.0 - ticket has been "submitted to vendor", so it is being looked into at least...)

@adrienbernede
Member

adrienbernede commented Jun 7, 2024

@davidbeckingsale +tools currently conflicts with +rocm. Depending on what you want to achieve, you’ll want to either turn off tools or remove the conflict.

rhornung67
rhornung67 previously approved these changes Jun 7, 2024
@adrienbernede
Member

@kab163 @davidbeckingsale now that the allocation is not limiting anymore I’ll let you deal with the rest ;).
