
v0.2.0

@pbontrager released this on 16 Jul 16:26

Overview

It’s been a while since we’ve done a release, and we have a ton of cool new features in the torchtune library, including distributed QLoRA support, new models, sample packing, and more! Check out #new-contributors for an exhaustive list of new contributors to the repo.

Enjoy the new release and happy tuning!

New Features

Here are some highlights of the new features in v0.2.0.

Recipes

  • We added support for QLoRA with FSDP2! This means users can now run 70B+ models on multiple GPUs. We provide example configs for Llama2 7B and 70B sizes. Note: this currently requires you to install PyTorch nightlies to access the FSDP2 methods. (#909)
  • Also by leveraging FSDP2, we see a 12% increase in tokens/sec and a 3.2x speedup in model init over FSDP1 with LoRA (#855)
  • We added support for other variants of the Meta-Llama3 recipes, including:
    • 70B with LoRA (#802)
    • 70B full finetune (#993)
    • 8B memory-efficient full finetune, which saves 46% peak memory over the previous version (#990)
  • We introduced a quantization-aware training (QAT) recipe. Training with QAT shows significant improvement in model quality if you plan on quantizing your model post-training (#980); a sketch of the underlying prepare/convert flow appears after this list.
  • We also made updates to the eval recipe, including:
    • Batched inference for faster eval (#947)
    • Support for free generation tasks in EleutherAI Eval Harness (#975)
    • Support for custom eval configs (#1055)
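
For context on what the QAT recipe does under the hood, below is a minimal sketch of a prepare/convert flow, assuming torchao’s prototype QAT quantizer as it existed around this release. The import path and API shown are assumptions; the recipe and config from #980 are the supported entry point.

```python
# Minimal QAT sketch (not the recipe itself). Assumes torchao's prototype QAT
# quantizer; the import path and API may differ across torchao versions.
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

from torchtune.models.llama3 import llama3_8b

model = llama3_8b()  # randomly initialized here; load checkpoint weights in practice

# Swap in fake-quantized linears so training observes quantization error
qat_quantizer = Int8DynActInt4WeightQATQuantizer()
model = qat_quantizer.prepare(model)

# ... run the fine-tuning loop on `model` ...

# After training, convert the fake-quantized modules into actually quantized ones
model = qat_quantizer.convert(model)
```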

Models

  • Phi-3 Mini-4K-Instruct from Microsoft (#876)
  • Gemma 7B from Google (#971)
  • Code Llama2: 7B, 13B, and 70B sizes from Meta (#847)
  • @salman designed and implemented reward modeling for Mistral models (#840, #991). A sketch instantiating the new model builders appears after this list.
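
For reference, the new architectures are exposed through torchtune’s usual builder functions. A minimal sketch is below; the builder names are taken from the linked PRs (treat `mistral_reward_7b` in particular as an assumption), and the modules come back randomly initialized, so you would load checkpoint weights in practice.

```python
# Instantiate the newly added architectures via their torchtune builders.
# Weights are randomly initialized here; load checkpoints for real use.
from torchtune.models.phi3 import phi3_mini
from torchtune.models.gemma import gemma_7b
from torchtune.models.code_llama2 import code_llama2_7b
from torchtune.models.mistral import mistral_reward_7b  # name assumed from #840/#991

phi3 = phi3_mini()                  # Phi-3 Mini-4K-Instruct
gemma = gemma_7b()                  # Gemma 7B
code_llama = code_llama2_7b()       # Code Llama2 7B
reward_model = mistral_reward_7b()  # Mistral 7B with a scalar reward head
```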

Perf, memory, and quantization

  • We made improvements to our FSDP + Llama3 recipe, resulting in a further 13% reduction in allocated memory for the 8B model. (#865)
  • Added int8 per-token dynamic activation + int4 per-axis grouped weight (8da4w) quantization (#884); a sketch appears after this list.
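
As a rough illustration of what 8da4w post-training quantization looks like in code, here is a sketch assuming torchao’s `Int8DynActInt4WeightQuantizer` API from this era; the import path and default settings are assumptions, and torchtune’s quantize recipe and config remain the supported path.

```python
# 8da4w post-training quantization sketch: int8 dynamic per-token activations,
# int4 grouped per-axis weights. Import path/API assumed from torchao of this era.
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

from torchtune.models.llama3 import llama3_8b

model = llama3_8b()  # load your fine-tuned weights in practice

quantizer = Int8DynActInt4WeightQuantizer()  # default group size
quantized_model = quantizer.quantize(model)
```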

Data/Datasets

  • We added support for a widely requested feature - sample packing! This feature drastically speeds up model training, e.g. 2x faster with the alpaca dataset (#875, #1109); see the sketch after this list.
  • In addition to instruct tuning, we now also support continued pretraining, with several example datasets like wikitext and CNN DailyMail. (#868)
  • Users can now train on multiple datasets at once using dataset concatenation (#889)
  • We now support OpenAI conversation-style data (#890)
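
As a minimal Python sketch of sample packing, assuming the `packed` flag exposed by the dataset builders in #875 (the tokenizer path and `max_seq_len` below are placeholders):

```python
# Build the alpaca dataset with sample packing enabled.
# The tokenizer path and max_seq_len are placeholders; `packed` is the flag
# added for sample packing in the dataset builders.
from torchtune.datasets import alpaca_dataset
from torchtune.models.llama2 import llama2_tokenizer

tokenizer = llama2_tokenizer("/path/to/tokenizer.model")  # placeholder path

ds = alpaca_dataset(
    tokenizer=tokenizer,
    max_seq_len=2048,
    packed=True,  # pack multiple samples into each sequence for higher throughput
)
```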

Miscellaneous

  • @jeromeku added a much more advanced profiler so users can understand the exact bottlenecks in their LLM training. (#1089)
  • We made several metric logging improvements:
    • Log tokens/sec, per-step logging, configurable memory logging (#831)
    • Better formatting for stdout memory logs (#817)
  • Users can now save models in the safetensors format. (#1096)
  • Updated activation checkpointing to support selective-layer and selective-op checkpointing (#785)
  • We worked with the Hugging Face team to provide support for loading adapter weights fine-tuned via torchtune directly into the PEFT library (#933); a loading sketch appears after this list.
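
For the PEFT integration, loading torchtune-trained adapter weights into Hugging Face PEFT looks roughly like the sketch below; the model ID and adapter directory are placeholders, and #933 plus the accompanying docs describe the supported workflow.

```python
# Attach adapter weights fine-tuned with torchtune to a Hugging Face base model.
# The model ID and adapter directory are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
peft_model = PeftModel.from_pretrained(base_model, "/path/to/torchtune/adapter_dir")
```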

Documentation

  • We wrote a new tutorial for fine-tuning Llama3 with chat data (#823) and revamped the datasets tutorial (#994)
  • Looooooooong overdue, but we added proper documentation for the tune CLI (#1052)
  • Improved contributing guide (#896)

Bug Fixes

  • @Optimox found and fixed a bug so that LoRA dropout is now correctly applied (#996)
  • Fixed a broken link for the Llama3 tutorial (#805)
  • Fixed Gemma model generation (#1016)
  • Bug workaround: to download CNN DailyMail, first launch a single-device recipe; once the dataset is downloaded, you can use it with distributed recipes.

New Contributors

Full Changelog: v0.1.1...v0.2.0