[Attention Performance] Flash Attention performance gets to 80%~90% of XeTLA #773
Title history: originally filed as "[Attention(Forward) Performance] Attention with typical shape performance up to ~80% of pytorch" (Mar 28, 2024); re-scoped on Apr 1, 2024 to compare Flash Attention v2.0 forward against XeTLA instead of PyTorch; narrowed to the 1x2x1024x32 shape with an 80%~90% target in mid-April; and generalized to the current title on Apr 28, 2024 (edits by Dewei-Wang-sh, tdeng5, and vlad-penkin).
For the case fwd_4x48x1024x64_false (batch, num_head, n_ctx, dim_head, causal), it reaches 60% on GPU Max 1550.
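For context, a figure like the 60% above is presumably the measured Triton TFLOPS divided by the XeTLA median for the same shape. A minimal sketch of that arithmetic follows; the helper name and the latency value are hypothetical placeholders, not numbers from this issue:

```python
# Hypothetical helper: convert a measured forward-attention latency into TFLOPS,
# then express it as a fraction of an XeTLA median.
def attention_tflops(batch, heads, n_ctx, d_head, ms, causal=False):
    # Two matmuls (Q @ K^T and P @ V), 2 FLOPs per MAC; causal masking roughly halves the work.
    flops = 4.0 * batch * heads * n_ctx * n_ctx * d_head
    if causal:
        flops *= 0.5
    return flops / (ms * 1e-3) / 1e12

triton_tflops = attention_tflops(4, 48, 1024, 64, ms=5.0)  # ms is a placeholder latency
xetla_median = 65.0  # example: Max 1100 median for fwd_4x48x1024x64_false from the issue body; the 1550 baseline would differ
print(f"{100.0 * triton_tflops / xetla_median:.1f}% of XeTLA")
```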
Fixed the data mismatch.
vlad-penkin modified the milestones: 4.3 [Performance] Tracking, 4.0 [Performance] Core (Aug 17, 2024)
Need #1102 to close this umbrella issue.
We aim to reach 80%+ of XeTLA performance.
Use python/tutorials/06-fused-attention.py as the test case. Shapes are given as (batch, head, n_ctx, d_head, causal), measured on a Max 1100:
for fwd_1x2x1024x32_true, the XeTLA median is 4.7 TFLOPS
for fwd_1x2x1024x32_false, the XeTLA median is 4.6 TFLOPS
for fwd_4x48x1024x64_true, the XeTLA median is 110 TFLOPS
for fwd_4x48x1024x64_false, the XeTLA median is 65 TFLOPS
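As a rough illustration of how one of these configurations could be timed and converted to TFLOPS, here is a sketch only, not the tutorial's actual harness: it times PyTorch's scaled_dot_product_attention as a stand-in for whichever kernel is under test (the issue itself compares the tutorial's Triton kernel against XeTLA), and it assumes an XPU or CUDA device is available.

```python
import torch
import triton

# fwd_4x48x1024x64_false: (batch, head, n_ctx, d_head, causal)
BATCH, H, N_CTX, D_HEAD, CAUSAL = 4, 48, 1024, 64, False
dev = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cuda"
q, k, v = (torch.randn(BATCH, H, N_CTX, D_HEAD, device=dev, dtype=torch.float16)
           for _ in range(3))

# Latency in ms averaged over repeated runs by triton.testing.do_bench.
ms = triton.testing.do_bench(
    lambda: torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=CAUSAL))

# Forward FLOPs: Q @ K^T plus P @ V, 2 FLOPs per MAC; causal roughly halves the work.
flops = 4.0 * BATCH * H * N_CTX * N_CTX * D_HEAD * (0.5 if CAUSAL else 1.0)
print(f"{flops / (ms * 1e-3) / 1e12:.2f} TFLOPS (XeTLA median for this shape: 65)")
```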