
Checkpoint conversion #758

Open
MaxiBoether opened this issue Dec 20, 2024 · 3 comments

Comments

@MaxiBoether

Hey,

I am trying to evaluate a model trained with torchtitan using the lm eval harness, with the vLLM backend. Is there any straightforward way to convert a torchtitan model in the PyTorch .pt format into, e.g., a Hugging Face model that can be used with vLLM / the lm eval harness? Within the torchtune repo, I was able to find some code for VLMs, but (a) it seems to be hardcoded for LLMs, (b) it uses a new inference backend instead of relying on, e.g., vLLM, and (c) I feel like there should be an easier way to convert torchtitan checkpoints than such an involved solution.

How did you evaluate downstream task accuracy with torchtitan models?

Thank you very much for your help.
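
For context, the end state I am aiming for is roughly the following (a sketch using lm-eval's Python API; the model path and tasks are placeholders, and the exact argument names may differ between lm-eval versions):

```python
# Rough sketch of the intended evaluation, assuming the checkpoint has already
# been converted to a Hugging Face-style model directory. The model path and
# task names are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",  # use lm-eval's vLLM backend
    model_args="pretrained=/path/to/converted_hf_model,dtype=bfloat16",
    tasks=["hellaswag", "arc_easy"],
)
print(results["results"])
```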

@fegin
Contributor

fegin commented Dec 20, 2024

When you save in the PyTorch format, is that a full-tensor (non-DTensor) state dict, and is it saved with torch.save()?

@MaxiBoether
Author

MaxiBoether commented Dec 20, 2024

I mean what torchtitan's checkpoint.md docs call dcp_to_torch, i.e. the .pt file generated by that step, which is not supported by HF, at least not natively.
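
Concretely, that step is essentially the following (a minimal sketch; the checkpoint path is a placeholder):

```python
# Minimal sketch of the DCP -> torch.save conversion referenced above;
# the checkpoint path is a placeholder.
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

# Consolidates a torchtitan DCP checkpoint directory into a single .pt file
# that can be loaded with torch.load().
dcp_to_torch_save("outputs/checkpoint/step-1000", "checkpoint.pt")
```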

@jaysonfrancis
Contributor

You can take a look at convert_llama_weights_to_hf.py.
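
Roughly, that script remaps Meta/torchtitan-style parameter names to the Hugging Face LlamaForCausalLM layout and permutes the q/k projections for HF's rotary-embedding convention. Below is a rough sketch of that remapping; the key names, dimensions, and permutation are assumptions based on a reading of that script and torchtitan's Llama model, so verify them against your checkpoint before relying on it:

```python
# Hedged sketch: remap a torchtitan-style (Meta Llama-style) state dict to
# Hugging Face LlamaForCausalLM naming, mirroring what
# convert_llama_weights_to_hf.py does. All key names and dimensions here are
# assumptions; check them against your own checkpoint.
import torch
from transformers import LlamaConfig, LlamaForCausalLM

def permute(w, n_heads, dim1, dim2):
    # HF's rotary embedding uses a "half-rotation" layout, so q/k weights
    # must be re-interleaved relative to the original checkpoint layout.
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

def convert(titan_sd, config):
    dim = config.hidden_size
    n_heads = config.num_attention_heads
    n_kv_heads = config.num_key_value_heads
    kv_dim = dim * n_kv_heads // n_heads
    hf_sd = {
        "model.embed_tokens.weight": titan_sd["tok_embeddings.weight"],
        "model.norm.weight": titan_sd["norm.weight"],
        "lm_head.weight": titan_sd["output.weight"],
    }
    for i in range(config.num_hidden_layers):
        p = f"layers.{i}."
        q = f"model.layers.{i}."
        hf_sd[q + "self_attn.q_proj.weight"] = permute(titan_sd[p + "attention.wq.weight"], n_heads, dim, dim)
        hf_sd[q + "self_attn.k_proj.weight"] = permute(titan_sd[p + "attention.wk.weight"], n_kv_heads, kv_dim, dim)
        hf_sd[q + "self_attn.v_proj.weight"] = titan_sd[p + "attention.wv.weight"]
        hf_sd[q + "self_attn.o_proj.weight"] = titan_sd[p + "attention.wo.weight"]
        hf_sd[q + "mlp.gate_proj.weight"] = titan_sd[p + "feed_forward.w1.weight"]
        hf_sd[q + "mlp.down_proj.weight"] = titan_sd[p + "feed_forward.w2.weight"]
        hf_sd[q + "mlp.up_proj.weight"] = titan_sd[p + "feed_forward.w3.weight"]
        hf_sd[q + "input_layernorm.weight"] = titan_sd[p + "attention_norm.weight"]
        hf_sd[q + "post_attention_layernorm.weight"] = titan_sd[p + "ffn_norm.weight"]
    return hf_sd

# Usage: load the dcp_to_torch .pt file, remap, and save in HF format.
# Placeholder config: it must match the architecture you actually trained.
config = LlamaConfig.from_pretrained("meta-llama/Llama-3.1-8B")
titan_sd = torch.load("checkpoint.pt", map_location="cpu")
# Depending on how the checkpoint was saved, the weights may be nested
# under a top-level key such as "model".
model = LlamaForCausalLM(config)
model.load_state_dict(convert(titan_sd, config), strict=True)
model.save_pretrained("converted_hf_model")
```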
