🚀 Feature
Adding XLA as a backend for DeepSpeed.
Motivation
Most frameworks for working with TPUs, such as Accelerate, use multiprocessing at most. Even PyTorch/XLA only recently gained FSDP support, which is not equivalent to the pipeline parallelism available in DeepSpeed.
This is frustrating: enabling multi-node SPMD or FSDP with XLA is challenging due to the lack of documentation and examples. Given DeepSpeed's popularity and advanced features such as pipeline parallelism, adding XLA support is essential.
There is strong interest in the DeepSpeed community: multiple users have opened issues and commented asking for TPU support. This integration would fill a significant gap and let TPU users fully leverage DeepSpeed's capabilities.
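For context on the status quo described above, here is a minimal sketch of the PyTorch/XLA FSDP path (the toy `Linear` model and shapes are placeholders): it shards parameters across replicas, but offers nothing like DeepSpeed's pipeline parallelism.

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

device = xm.xla_device()

# Toy model stands in for a real network; shapes are arbitrary.
model = FSDP(torch.nn.Linear(1024, 1024).to(device))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

inputs = torch.randn(8, 1024, device=device)
loss = model(inputs).sum()
loss.backward()
optimizer.step()  # FSDP reduce-scatters gradients itself; plain step, not xm.optimizer_step()
xm.mark_step()    # flush the lazily traced XLA graph to the device
```

In practice this runs once per replica via torch_xla's multiprocessing launcher (e.g. `xmp.spawn`) or torchrun.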
Pitch
I am willing to spearhead this effort and integrate XLA with DeepSpeed, even without external assistance. A PR will be opened as soon as basic tests pass, and this request will be updated with progress. A rough sketch of one possible integration point follows.
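DeepSpeed exposes an accelerator abstraction (`deepspeed.accelerator.abstract_accelerator.DeepSpeedAccelerator`) that hardware backends implement. The partial class below is an assumption-laden illustration of where an XLA backend could hook in, not a working implementation; a real backend must fill in the full interface (communication, memory stats, op builders, and so on), and the `'xla'` communication backend name is hypothetical.

```python
import torch_xla.core.xla_model as xm
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator

class XLA_Accelerator(DeepSpeedAccelerator):
    """Hypothetical XLA backend; only a handful of the required methods are sketched."""

    def __init__(self):
        self._name = 'xla'
        # Assumption: an XLA-aware backend name for deepspeed.comm to dispatch on.
        self._communication_backend_name = 'xla'

    def is_available(self):
        try:
            return xm.xla_device() is not None
        except RuntimeError:
            return False

    def device_name(self, device_index=None):
        return 'xla' if device_index is None else f'xla:{device_index}'

    def device(self, device_index=None):
        return xm.xla_device(device_index)

    def current_device(self):
        return xm.xla_device()

    def synchronize(self, device_index=None):
        # XLA executes lazily; flushing the pending graph is the closest
        # analogue to a device synchronize.
        xm.mark_step()
```

Assuming the remaining abstract methods are implemented, registration would then be a call to `deepspeed.accelerator.set_accelerator(XLA_Accelerator())` before `deepspeed.initialize`.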
Alternatives
PyTorch/XLA + Torchrun
HuggingFace Accelerate
These are viable alternatives but lack the advanced parallelism and optimization capabilities offered by DeepSpeed (see the Accelerate sketch below).
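To make that gap concrete, here is a minimal Accelerate sketch (toy model and shapes are placeholders; launch with `accelerate launch` on a TPU VM): it provides plain data parallelism on TPU, with no pipeline or ZeRO-style partitioning.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # detects the TPU via torch_xla when launched on one
model = torch.nn.Linear(1024, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)

inputs = torch.randn(8, 1024, device=accelerator.device)
loss = model(inputs).sum()
accelerator.backward(loss)
optimizer.step()  # on TPU the prepared optimizer routes through xm.optimizer_step
optimizer.zero_grad()
```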
Additional context
The integration of XLA with DeepSpeed could open up exciting possibilities for TPU users, such as easier multi-node support, pipeline parallelism, and other optimizations that DeepSpeed provides.