Issues: flashinfer-ai/flashinfer
Open issues, newest first:
- #510: Will AOT compilation still be supported after JIT compilation is added? (opened Sep 25, 2024 by danieldk)
- #506: [feature request] Support moving num_layers into a KV cache page, or support a non-contiguous KV cache (opened Sep 25, 2024 by reyoung; see the layout sketch below)
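For context on the layout #506 asks for, here is a rough sketch contrasting the conventional per-layer paged KV cache with one whose pages carry every layer. All sizes, shapes, and variable names are illustrative assumptions, not flashinfer's actual API:

```python
import torch

# Illustrative sizes (assumptions, not taken from the issue).
num_layers, num_pages, page_size = 32, 1024, 16
num_kv_heads, head_dim = 8, 128

# Conventional layout: one contiguous [pages, K/V, page_size, heads, dim]
# buffer per layer, so each layer needs its own allocation.
kv_per_layer = [
    torch.empty(num_pages, 2, page_size, num_kv_heads, head_dim,
                dtype=torch.float16, device="cuda")
    for _ in range(num_layers)
]

# Layout requested in #506: fold the layer dimension into each page,
# so a single buffer and a single page table serve the whole model.
kv_layer_in_page = torch.empty(
    num_pages, num_layers, 2, page_size, num_kv_heads, head_dim,
    dtype=torch.float16, device="cuda",
)

# A per-layer view is then a strided slice that shares storage but is
# no longer contiguous, which is why the issue also asks for
# non-contiguous KV cache support.
layer0_view = kv_layer_in_page[:, 0]
assert not layer0_view.is_contiguous()
```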
- #452: SingleDecodeWithKVCache hits an illegal memory access when the input tensors are placed on cuda:1 [bug] (opened Aug 17, 2024 by jason-huang03; see the repro sketch below)
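A minimal reproduction of #452 presumably looks like the sketch below. single_decode_with_kv_cache is flashinfer's documented single-request decode entry point; the tensor shapes are assumptions and may differ from the original report:

```python
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 8, 128, 2048

# Placing everything on the second GPU is what triggers the reported
# illegal memory access; the same call works on cuda:0.
dev = torch.device("cuda:1")
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device=dev)
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device=dev)
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device=dev)

o = flashinfer.single_decode_with_kv_cache(q, k, v)
```

A common workaround for this class of bug is to make cuda:1 the current device before the call, e.g. by wrapping it in `with torch.cuda.device(dev):`, so the kernel launches on the device that owns the tensors.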
- #397: [FEAT REQ][CUDA GRAPH] Allow an explicit control flag to force-enable/disable split KV (opened Jul 26, 2024 by AgrawalAmey)
- #249: CUDA Error: no kernel image is available for execution on the device (209), raised from /tmp/build-via-sdist-nl8se4dx/flashinfer-0.0.4+cu118torch2.2/include/flashinfer/attention/decode.cuh line 871 in cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size) (opened May 16, 2024 by lucasjinreal; see the diagnostic sketch below)
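The error in #249 usually means the installed wheel was not compiled for the GPU's compute capability. A quick diagnostic, using only standard torch calls (torch's own arch list is a proxy here; a flashinfer wheel carries its own):

```python
import torch

# Which architecture is this GPU, and which ones was torch built for?
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: sm_{major}{minor}")
print(f"torch compiled for: {torch.cuda.get_arch_list()}")
```

If the GPU's `sm_XY` is missing from what the wheel targets, rebuilding from source with `TORCH_CUDA_ARCH_LIST` set to the right architecture is the typical recipe for torch CUDA extensions (an assumption here, not a command from flashinfer's docs).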
- #248: Circular import error when importing built-from-source flashinfer (opened May 15, 2024 by vedantroy)
- #166: Stack smashing detected in begin_forward when compiling directly from the repo (opened Mar 8, 2024 by mkrima)
- #139: Can I profile only the dense layer or the attention layer in flashinfer, rather than the whole kernel? (opened Feb 27, 2024 by yintao-he; see the profiling sketch below)
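Assuming #139 is asking how to scope a measurement to a single attention call rather than a full forward pass, one way is to wrap just that call in torch's profiler. Shapes are illustrative; the entry point is flashinfer's documented single-decode API:

```python
import torch
import flashinfer
from torch.profiler import profile, ProfilerActivity

q = torch.randn(32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(2048, 8, 128, dtype=torch.float16, device="cuda")
v = torch.randn(2048, 8, 128, dtype=torch.float16, device="cuda")

# Profile only the attention kernel, not a whole model forward.
with profile(activities=[ProfilerActivity.CUDA]) as prof:
    flashinfer.single_decode_with_kv_cache(q, k, v)
    torch.cuda.synchronize()

print(prof.key_averages().table(sort_by="cuda_time_total"))
```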
- #125: How to use a low-bit KV cache in flashinfer? [enhancement] (opened Feb 18, 2024 by zhaoyang-star; see the quantization sketch below)
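As background for #125, the sketch below shows the storage-side transform a low-bit KV cache implies: quantizing an fp16 cache to fp8 with a per-head scale. Whether flashinfer kernels can consume fp8 directly depends on the version; the scaling scheme here is an illustrative assumption:

```python
import torch

kv_len, num_kv_heads, head_dim = 2048, 8, 128
k_fp16 = torch.randn(kv_len, num_kv_heads, head_dim,
                     dtype=torch.float16, device="cuda")

# Per-head absolute max -> per-head scale into the fp8 range.
amax = k_fp16.abs().amax(dim=(0, 2), keepdim=True).float()
scale = amax / torch.finfo(torch.float8_e4m3fn).max

# Quantized cache: half the memory of fp16 per element.
k_fp8 = (k_fp16.float() / scale).to(torch.float8_e4m3fn)

# Dequantize on read; a fused kernel would fold this into the matmul.
k_deq = k_fp8.float() * scale
```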