[Performance] The GEMM performance with the column major B matrix is not as good as row major B matrix. #2354

chengjunlu · 2024-09-26T00:09:55Z

The performance gap was found in #2347.

We need to investigate the root cause of the performance drop in the column-major B matrix case.
The Triton kernel is roughly 1.5x slower than in the row-major B matrix case.

(I): Detected 7680 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 0 spills
✅ Triton and Torch match
Time for torch: 0.31633758544921875 ms
Time for triton: 0.44517597556114197 ms
Compute A x B.T
OpenCL API not available for this operation
OpenCL API not available for this operation
OpenCL API not available for this operation
OpenCL API not available for this operation
(I): Detected 7680 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 0 spills
✅ Triton and Torch match
Time for torch: 0.3375360071659088 ms
Time for triton: 0.6348815560340881 ms
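
As a rough check on the "roughly 1.5x" figure, the ratios can be computed directly from the timings in the log above. This is only a sketch: the first pair of timings has no header in the log, so treating it as the row-major A x B run is an assumption, and the variable names are illustrative.

```python
# Timings copied from the log above, in milliseconds.
# Assumption: the first (unlabelled) run is A x B with a row-major B;
# the second run is the one printed after "Compute A x B.T".
row_major = {"torch": 0.31633758544921875, "triton": 0.44517597556114197}
col_major = {"torch": 0.3375360071659088, "triton": 0.6348815560340881}

# Slowdown of each backend when B switches from row-major to column-major.
triton_slowdown = col_major["triton"] / row_major["triton"]
torch_slowdown = col_major["torch"] / row_major["torch"]

# Gap between Triton and Torch within each layout.
gap_row_major = row_major["triton"] / row_major["torch"]
gap_col_major = col_major["triton"] / col_major["torch"]

print(f"Triton, column-major vs row-major B: {triton_slowdown:.2f}x")
print(f"Torch, column-major vs row-major B: {torch_slowdown:.2f}x")
print(f"Triton vs Torch, row-major B: {gap_row_major:.2f}x")
print(f"Triton vs Torch, column-major B: {gap_col_major:.2f}x")
```

Under that assumption, the Triton kernel slows down by about 1.43x between the two layouts, while the Torch baseline slows down by only about 1.07x, so the drop appears to be specific to the Triton path rather than to the layout itself.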

Egor-Krivov · 2024-10-04T08:44:10Z

I think this issue is essential for GEMM performance. Weights are very often stored with K as the last dimension. Even the PyTorch linear layer does that: its weight (torch.Tensor) holds the learnable weights of the module, with shape (out_features, in_features).

https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
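
To make the layout point concrete, here is a minimal pure-Python sketch of the stride arithmetic, not taken from the issue. It assumes a contiguous weight of shape (out_features, in_features) = (N, K) and shows that the B operand of x @ W^T, the transposed view of shape (K, N), has unit stride along K, which is the column-major B case. The helper names and the sizes are hypothetical.

```python
def contiguous_strides(shape):
    """Element strides of a row-major (C-contiguous) tensor with this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transpose_2d(shape, strides):
    """Transposing a 2D view swaps shape and strides; no data is moved."""
    return (shape[1], shape[0]), (strides[1], strides[0])


# Hypothetical sizes, chosen only for illustration.
out_features, in_features = 4096, 1024  # N, K

# Linear-style weight: shape (N, K), with K as the last (fastest-varying) dimension.
w_shape = (out_features, in_features)
w_strides = contiguous_strides(w_shape)

# The B operand of x @ W^T is the transposed view, with shape (K, N).
b_shape, b_strides = transpose_2d(w_shape, w_strides)

print("W:", w_shape, "strides", w_strides)
print("B = W^T:", b_shape, "strides", b_strides)

# B is column-major when its first dimension (K) has stride 1.
b_is_column_major = b_strides[0] == 1
print("B is column-major:", b_is_column_major)
```

In other words, a GEMM kernel consuming Linear-style weights without a physical transpose always sees a column-major B, which is why this layout matters in practice.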

chengjunlu mentioned this issue Sep 26, 2024

Improve GEMM perf when one matrix is transposed #2347

Merged

vlad-penkin added the performance and enhancement labels Sep 27, 2024

vlad-penkin added this to the 4.0 [Performance] Core milestone Sep 27, 2024

Egor-Krivov mentioned this issue Oct 4, 2024

[Benchmarks] Add microbenchmark with A@B^t #2414

Closed
