🎉FA2/HGEMM SMEM Swizzle

@DefTruth released this 25 Dec 05:52 · 2 commits to main since this release · bdd361a

What's Changed

  • [FA2] split-q + tiling-qk D=512 performance🎉 by @DefTruth in #178
  • [FA2] split-q + tiling-qk D=512 performance🎉 by @DefTruth in #179
  • [FA2] split-q + tiling-qk D=512 performance🎉 by @DefTruth in #180
  • [Doc] Refactor README.md to improve readability✔️ by @DefTruth in #181
  • [Doc] Refactor README.md for better readability✔️ by @DefTruth in #182
  • [FA2] flash-attn-mma 3080/L20/4090 bench✔️ by @DefTruth in #183
  • [FA2] flash-attn-mma 3080/L20/4090 bench✔️ by @DefTruth in #184
  • [FA2] fa2/hgemm manually smem swizzle🎉 by @DefTruth in #185

flash_attn_mma_stages_split_q_tiling_qk_swizzle_kernel

void flash_attn_mma_stages_split_q_tiling_qk_swizzle_kernel<512, 16, 8, 16, 8, 1, 8, 1, 1, 16, 1, 64, 2, 0, 0, 8>(__half *, __half *, __half *, __half *, int, int) (8, 48, 1)x(256, 1, 1), Context 1, Stream 7, Device 0, CC 8.9
    Section: Command line profiler metrics
    ------------------------------------------------------------------ ----------- ------------
    Metric Name                                                        Metric Unit Metric Value
    ------------------------------------------------------------------ ----------- ------------
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.avg                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.max                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.min                        0
    sm__sass_l1tex_data_bank_conflicts_pipe_lsu_mem_shared_op_ldsm.sum                        0
    ------------------------------------------------------------------ ----------- ------------
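
The zero `ldsm` bank-conflict counts above are what a shared-memory swizzle is meant to achieve. As a rough illustration only, the sketch below models a common XOR swizzle on the host. It is not taken from the kernel in this release: the 128-byte row size, the `chunk ^ (row % 8)` mapping, and the helper names `physical_chunk` and `max_bank_multiplicity` are all assumptions made for the example. The model treats shared memory as 32 banks of 4 bytes and looks at 8 rows each reading one 16-byte chunk, which is the access shape of one 8x8 half-precision matrix in an `ldmatrix` load.

```python
# Simplified host-side model of shared-memory banking.
# Assumptions (not from the release): 128-byte tile rows, XOR swizzle on 16-byte chunks.
BANKS = 32         # shared memory has 32 banks
BANK_BYTES = 4     # each bank is 4 bytes wide
CHUNK_BYTES = 16   # one row of an 8x8 b16 matrix (8 halves)
ROW_BYTES = 128    # hypothetical tile row: 64 halves, no padding


def physical_chunk(row, chunk, swizzle):
    """Return the 16-byte slot in which logical (row, chunk) is stored."""
    chunks_per_row = ROW_BYTES // CHUNK_BYTES  # 8 slots per row
    return (chunk ^ (row % chunks_per_row)) if swizzle else chunk


def max_bank_multiplicity(chunk, swizzle, rows=range(8)):
    """Worst-case number of rows that touch the same bank in one 8-row read."""
    hits = [0] * BANKS
    for row in rows:
        slot = physical_chunk(row, chunk, swizzle)
        offset = row * ROW_BYTES + slot * CHUNK_BYTES
        first_bank = (offset // BANK_BYTES) % BANKS
        # A 16-byte chunk spans 4 consecutive banks.
        for b in range(CHUNK_BYTES // BANK_BYTES):
            hits[(first_bank + b) % BANKS] += 1
    return max(hits)


if __name__ == "__main__":
    print("no swizzle:", max_bank_multiplicity(0, swizzle=False))
    print("swizzle:   ", max_bank_multiplicity(0, swizzle=True))
```

In this model, a multiplicity of 1 means the 8 rows land on disjoint banks, while a larger value means that many rows compete for the same banks. Real kernels may use a different row width or swizzle function, so the numbers here only illustrate the idea and are not a substitute for the profiler metrics.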

Full Changelog: v2.6.11...v2.6.12