
Multibackend tracker #1082

Open
5 of 17 tasks
msaroufim opened this issue Oct 15, 2024 · 0 comments
msaroufim commented Oct 15, 2024

Today, AO's official binaries only support NVIDIA GPUs and CPUs, but the resounding feedback we've received since our release has been to support more hardware backends.

How to add new backends

We would love to include more backends. In the ideal case, the backend is already supported via torch.compile, and testing new hardware is mostly a matter of:

  1. Donating a runner we can use for our tests
  2. Running our test suite on that runner and adding the necessary test skips
  3. Opening a tracker with the skipped tests so we can start improving support for that hardware
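Steps 2 and 3 might look like the sketch below, assuming a hypothetical `backend_available` helper: skip (rather than fail) tests on hardware the runner lacks, so the remaining failures become the tracker of real gaps for that backend.

```python
# Hypothetical skip-test pattern for a new backend: probe the device by
# attempting an allocation, and skip tests the runner cannot execute.
import pytest
import torch

def backend_available(device: str) -> bool:
    """True if a tensor can actually be allocated on `device`."""
    try:
        torch.empty(1, device=device)
        return True
    except (RuntimeError, AssertionError):
        return False

@pytest.mark.skipif(not backend_available("cuda"), reason="needs a CUDA runner")
def test_weight_only_quant_matmul():
    ...  # placeholder for a real torchao test
```

A tracker issue can then be generated from the list of tests carrying this skip reason.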

The reason we like torch.compile is that we want to avoid a giant list of if conditions in our codebase. Granted, we still have customers for both eager and ExecuTorch where working with the compiler is not realistic, so in those cases we will insist on implementing code via device-agnostic APIs like the ones listed here: https://dev-discuss.pytorch.org/t/python-c-api-rules-for-device-generic-apis/2511

One challenge we still need to figure out is that the device-agnostic APIs are only available on more recent versions of PyTorch, whereas our CI tests many versions of PyTorch.
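A minimal sketch of how code might cope with that version skew (the helper name is illustrative): prefer the device-generic `torch.accelerator` API where it exists (PyTorch 2.6+), and fall back to explicit per-backend checks on older releases.

```python
# Version-guarded device selection: use the device-generic API when present,
# otherwise fall back to backend-specific probes for older PyTorch.
import torch

def current_device() -> torch.device:
    accel = getattr(torch, "accelerator", None)  # absent on older PyTorch
    if accel is not None and accel.is_available():
        return accel.current_accelerator()
    if torch.cuda.is_available():  # older PyTorch: explicit backend check
        return torch.device("cuda")
    return torch.device("cpu")
```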

Binary uploads

Note that people can always install AO from source, but this is inconvenient, and a lot of the support for more binaries has come from @atalman. The reason building AO is now hard is that it's no longer a pure Python package, and it is unlikely to revert to that state given that the ExecuTorch and PyTorch Edge teams now depend on us to ship their kernels.

Leveraging torch.compile

For the most part our performance story is leveraging torch.compile, but we should seriously consider having a simple benchmark suite like the one in pytorch/benchmark so we can compare different hardware vendors. This is something @HDCharles has already been looking at.
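A hypothetical micro-benchmark in that spirit: time named callables with `torch.utils.benchmark` so eager vs. torch.compile numbers are comparable across hardware vendors.

```python
# Time each named callable and return median runtimes, so the same harness
# can be run unchanged on any backend and the results compared.
import torch
import torch.utils.benchmark as benchmark

def bench(fns: dict, min_run_time: float = 0.2) -> dict:
    """Return the median runtime in seconds for each named callable."""
    results = {}
    for name, f in fns.items():
        timer = benchmark.Timer(stmt="f()", globals={"f": f})
        results[name] = timer.blocked_autorange(min_run_time=min_run_time).median
    return results

x = torch.randn(256, 256)
candidates = {"eager": lambda: x @ x, "compiled": torch.compile(lambda: x @ x)}
```

Calling `bench(candidates)` triggers compilation on the compiled variant's first invocation, so warm-up is included unless the caller runs it once beforehand.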

  • For AMD GPUs the story is simple: we leverage torch.compile(), and since we are unlikely to port our custom ops to HIP, we can get a precise estimate of what chunk of our test suite fails
  • Intel GPUs: same story as AMD GPUs
  • ARM CPUs are the most complicated story, since there are many competing solutions we'd need to benchmark
    • torch.compile CPP codegen
    • Custom low bit matmuls like the ones in torchao/experimental
    • Triton ARM backend
  • Metal GPUs will only work via eager mode instead of torch.compile

Test suite coverage

So, finally, to really say we support hardware backend X, we should be confident in the performance. The baselines are: is our code faster than eager fp16, and, for GPUs, somewhat close to NVIDIA performance? We basically need to run our entire test suite, see how many tests fail or are skipped per new backend, and manually chase each one down.

Per-test granularity might be too fine to report, so we can instead look at feature-level support like quantize_(), float8, low_bit_optim, etc.
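A hypothetical roll-up from per-test results to the feature-level view described above (the feature names come from the issue; the input shape is an assumption):

```python
# Aggregate (feature, outcome) pairs into a per-feature support summary,
# where outcome is one of "pass", "fail", or "skip".
from collections import Counter

def feature_summary(results):
    """results: iterable of (feature, outcome) pairs."""
    counts = {}
    for feature, outcome in results:
        counts.setdefault(feature, Counter())[outcome] += 1
    return {f: f"{c['pass']}/{sum(c.values())} passing" for f, c in counts.items()}
```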

cc @albanD @atalman @EikanWang @jithunnair-amd @supriyar @digantdesai @kimishpatel @metascroy

@msaroufim msaroufim pinned this issue Oct 17, 2024