
Multi-queries Paged attn fails with continuator halted #8459

Draft
wants to merge 10 commits into base: master
Conversation

vanbasten23
Collaborator

root@t1v-n-408567d9-w-0:/workspaces/persist#  python pytorch/xla/test/benchmarks/test_paged_attention_benchmark.py --kernel multi-queries-paged-attn-v1 --profile
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
Warming up...
Traceback (most recent call last):
  File "/workspaces/persist/pytorch/xla/test/benchmarks/test_paged_attention_benchmark.py", line 258, in <module>
    benchmark(args)
  File "/workspaces/persist/pytorch/xla/test/benchmarks/test_paged_attention_benchmark.py", line 239, in benchmark
    run_benchmark(num_iters=10, profile=False)
  File "/workspaces/persist/pytorch/xla/test/benchmarks/test_paged_attention_benchmark.py", line 230, in run_benchmark
    jax.block_until_ready(actual_output)
  File "/usr/local/lib/python3.10/site-packages/jax/_src/api.py", line 2763, in block_until_ready
    try_to_block(arrays[0])
  File "/usr/local/lib/python3.10/site-packages/jax/_src/api.py", line 2746, in try_to_block
    return x.block_until_ready()
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: The program continuator has halted unexpectedly.
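For context: JAX dispatches device programs asynchronously, so a runtime failure like the `FAILED_PRECONDITION` above often surfaces only at a synchronization point such as `jax.block_until_ready` (the call in the traceback), not at the line that launched the kernel. A minimal sketch of that synchronization pattern, using a toy matmul instead of the paged-attention kernel (runs on CPU; no TPU needed):

```python
import jax
import jax.numpy as jnp

# JAX returns from jnp.dot immediately; the computation runs asynchronously
# on the backing device. Errors from the device program can therefore be
# raised later, when the result is forced.
x = jnp.ones((4, 4))
y = jnp.dot(x, x)

# block_until_ready waits for device execution to finish; this is where an
# XlaRuntimeError from a failing kernel would be raised.
y = jax.block_until_ready(y)
print(float(y[0, 0]))  # 4.0
```

This is why the traceback points at `jax.block_until_ready` inside `run_benchmark` rather than at the paged-attention kernel invocation itself; the actual failure happens earlier, inside the compiled program on the TPU.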
