
effects of pre-align stage #5

Open
zwcolin opened this issue Aug 30, 2024 · 1 comment

zwcolin commented Aug 30, 2024

Hello,

This is great work and I enjoyed reading the paper! It has considerably improved my understanding of VLM training :).

In the paper, you mentioned using SFT data for pre-alignment, which I assume is the same amount of data used for SFT in the full model (i.e., stage 3). If this is correct, I'm curious if you have conducted any comparisons where the pre-align stage is removed, and the pre-align stage's data is instead used as additional data in stage 2 (i.e., more data for 1 epoch) or stage 3 (i.e., 2 epochs, assuming stages 1 and 3 use the same data).

Does the pre-align stage show an advantage in these comparisons? I'm particularly interested in the scenario where the pre-align stage is discarded, and an equivalent amount of training time is added to stage 3. Another paper has shown that scaling training time (with the same amount of data) can also improve performance. I'm curious about your thoughts on this.
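For concreteness, here is a minimal Python sketch of the three schedules I have in mind. All stage names, data-set labels, and epoch counts are hypothetical placeholders for illustration, not values taken from the paper:

```python
# Hypothetical sketch of the compared training schedules.
# "sft_data" stands for the SFT set that is reused in pre-alignment;
# each stage is a (name, data, epochs) tuple.

SCHEDULES = {
    # Paper's pipeline: pre-align on SFT data, then pretrain, then SFT.
    "paper": [
        ("stage1_pre_align", "sft_data", 1),
        ("stage2_pretrain",  "pretrain_data", 1),
        ("stage3_sft",       "sft_data", 1),
    ],
    # Ablation A: drop pre-align, fold its data into stage 2.
    "more_stage2_data": [
        ("stage2_pretrain", "pretrain_data + sft_data", 1),
        ("stage3_sft",      "sft_data", 1),
    ],
    # Ablation B: drop pre-align, train stage 3 for 2 epochs instead.
    "longer_stage3": [
        ("stage2_pretrain", "pretrain_data", 1),
        ("stage3_sft",      "sft_data", 2),
    ],
}

for name, stages in SCHEDULES.items():
    print(name)
    for stage, data, epochs in stages:
        print(f"  {stage}: data={data}, epochs={epochs}")
```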

Thanks!

@Chrisding (Contributor) commented

Hi @zwcolin, thanks a lot for the great suggestion!
Yes, we used the same amount of SFT data in the pre-alignment stage as in the full model's SFT stage.
Your suggestion makes a lot of sense. We do intend to conduct a more comprehensive comparison, including the settings you suggested, and we will keep you posted.
