Issues: pytorch/torchtitan
Granular layer selection during Pipeline Parallelism
question
Further information is requested
#598
opened Oct 3, 2024 by
bhuvan777
Gradient norm clipping with pipeline parallelism (PP)
bug
Something isn't working
#596
opened Oct 1, 2024 by
zijian-hu
Support Gemma2 in torchtitan
enhancement
New feature or request
#594
opened Oct 1, 2024 by
pansershrek
reproducable numerics for loss, weights and gradients for single node (8 GPUs)
enhancement
New feature or request
#593
opened Oct 1, 2024 by
weifengpy
Inference with the checkpoint
enhancement
New feature or request
#586
opened Sep 23, 2024 by
mathmax12
Support INT8 mixed-precision training from torchao?
enhancement
New feature or request
#578
opened Sep 14, 2024 by
gau-nernst
Wrong train_state.step when resuming from checkpoint for the second time
bug
Something isn't working
#571
opened Sep 8, 2024 by
LeoXinhaoLee
Pipeline Parallelism + FSDP
question
Further information is requested
#562
opened Aug 29, 2024 by
jeromeku
Fail-safe and partial redundancy for HSDP on unreliable compute
enhancement
New feature or request
#561
opened Aug 27, 2024 by
evkogs
PP UX/training confusion re: loss = -1. (need to better document or add auto logging of last rank loss?)
#550
opened Aug 21, 2024 by
lessw2020
2D whole model compile fails at embedding layer
bug
Something isn't working
#534
opened Aug 20, 2024 by
tianyu-l
[rfc] getting rid of seed-checkpoint for Pipeline Parallelism
enhancement
New feature or request
#514
opened Aug 10, 2024 by
wconstab
[Request] Decouple profiler profile_freq from memory snapshot frequency
enhancement
New feature or request
#475
opened Jul 23, 2024 by
awgu
Only half of parameters are saved when applied PP
bug
Something isn't working
#474
opened Jul 22, 2024 by
dmammfl
[FP8 options] Float8Linear vs TransformerEngine
question
Further information is requested
#462
opened Jul 16, 2024 by
yundai424
Question about custom cuda operators for tensor parallelism
question
Further information is requested
#434
opened Jun 28, 2024 by
vermouth1992
Question about Pipeline parallelism
question
Further information is requested
#431
opened Jun 27, 2024 by
vermouth1992
Llama models with custom configurations and uploading to Hugging Face
enhancement
New feature or request
#420
opened Jun 24, 2024 by
bkchang
DataLoader state is empty for different ranks ?
question
Further information is requested
#409
opened Jun 17, 2024 by
ahatamiz
benchmark perf numbers on H100 GPUs and update performance.md
documentation
Improvements or additions to documentation