
effects of pre-align stage #5

Open
zwcolin opened this issue Aug 30, 2024 · 1 comment

zwcolin commented Aug 30, 2024

Hello,

This is great work and I enjoyed reading the paper! It has considerably improved my understanding of VLM training :).

In the paper, you mentioned using SFT data for pre-alignment, which I assume is the same amount of data used for SFT in the full model (i.e., stage 3). If this is correct, I'm curious if you have conducted any comparisons where the pre-align stage is removed, and the pre-align stage's data is instead used as additional data in stage 2 (i.e., more data for 1 epoch) or stage 3 (i.e., 2 epochs, assuming stages 1 and 3 use the same data).

Does the pre-align stage show an advantage in these comparisons? I'm particularly interested in the scenario where the pre-align stage is discarded, and an equivalent amount of training time is added to stage 3. Another paper has shown that scaling training time (with the same amount of data) can also improve performance. I'm curious about your thoughts on this.
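For concreteness, here is a minimal Python sketch of the three schedules I have in mind. All stage names, data-set labels, and epoch counts are hypothetical placeholders for illustration, not values taken from the paper:

```python
# Hypothetical sketch of the compared training schedules.
# "sft_data" stands for the SFT set that is reused in pre-alignment;
# each stage is a (name, data, epochs) tuple.

SCHEDULES = {
    # Paper's pipeline: pre-align on SFT data, then pretrain, then SFT.
    "paper": [
        ("stage1_pre_align", "sft_data", 1),
        ("stage2_pretrain",  "pretrain_data", 1),
        ("stage3_sft",       "sft_data", 1),
    ],
    # Ablation A: drop pre-align, fold its data into stage 2.
    "more_stage2_data": [
        ("stage2_pretrain", "pretrain_data + sft_data", 1),
        ("stage3_sft",      "sft_data", 1),
    ],
    # Ablation B: drop pre-align, train stage 3 for 2 epochs instead.
    "longer_stage3": [
        ("stage2_pretrain", "pretrain_data", 1),
        ("stage3_sft",      "sft_data", 2),
    ],
}

for name, stages in SCHEDULES.items():
    print(name)
    for stage, data, epochs in stages:
        print(f"  {stage}: data={data}, epochs={epochs}")
```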

Thanks!

@Chrisding (Contributor) commented

Hi @zwcolin, thanks a lot for the great suggestion!
Yes, we used the same amount of SFT data in the pre-alignment stage as in the full model's SFT stage.
Your suggestion makes a lot of sense. We do intend to conduct a more comprehensive comparison, including the settings you suggested, and we will keep you posted.
