Revamp submit train script for evals #192

Merged
merged 12 commits into from
Jul 25, 2024

natolambert (Collaborator) commented Jul 12, 2024

We can now use scripts/submit_finetune_job.py with the config structure.
How it works:

  1. Loads a beaker config that is mostly filled out
  2. Optionally loads a config from the local directory (e.g. in configs/train_configs/)
  3. Optionally takes in additional command line arguments

Overrides are applied from the end: command line arguments have the highest priority, then the passed config, then the default Beaker config.
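
The precedence described above can be pictured as a layered dictionary merge. This is only a minimal sketch, not the script's actual code: the function name `merge_configs` and the example keys and values are hypothetical, chosen to show that a later layer wins when the same key appears more than once.

```python
from typing import Any, Dict, Optional


def merge_configs(
    default_config: Dict[str, Any],
    file_config: Optional[Dict[str, Any]] = None,
    cli_overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge three layers of settings; later layers win on key conflicts."""
    merged = dict(default_config)       # lowest priority: default config
    merged.update(file_config or {})    # middle priority: config passed via --config
    merged.update(cli_overrides or {})  # highest priority: command line arguments
    return merged


# Hypothetical values, for illustration only.
defaults = {"learning_rate": 2e-5, "num_train_epochs": 2}
from_file = {"learning_rate": 1e-5, "max_seq_length": 2048}
from_cli = {"learning_rate": 1e-6}

final = merge_configs(defaults, from_file, from_cli)
print(final)
```

Here `learning_rate` ends up with the command line value, while keys that appear in only one layer pass through unchanged.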

This allows things like running parameter sweeps from a bash script, as in the following (these can even go in loops):

python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 1e-6
python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 4e-6
python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 1e-5
python scripts/submit_finetune_job.py --config=configs/train_configs/sft/default.yaml  --learning_rate 4e-5

Run with

sh scripts/submit_finetune_jobs.sh
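
The same sweep can be written as a loop. Below is a rough Python sketch that builds the four commands listed above and prints them as a dry run; the helper name `build_sweep_commands` is hypothetical, and actually launching the jobs (for example with `subprocess.run`) is left to the reader.

```python
import shlex
from typing import List

SCRIPT = "scripts/submit_finetune_job.py"
CONFIG = "configs/train_configs/sft/default.yaml"


def build_sweep_commands(learning_rates: List[str]) -> List[List[str]]:
    """Build one submit command per learning rate (hypothetical helper)."""
    return [
        ["python", SCRIPT, f"--config={CONFIG}", "--learning_rate", lr]
        for lr in learning_rates
    ]


commands = build_sweep_commands(["1e-6", "4e-6", "1e-5", "4e-5"])
for cmd in commands:
    # Dry run: print each command instead of launching it.
    # Replace with subprocess.run(cmd, check=True) to actually submit.
    print(shlex.join(cmd))
```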

natolambert (Collaborator Author) commented Jul 12, 2024

UPDATE: This is due to the Dockerfile! We should maintain an oe-adapt image rather than a hamish one, so it is easier to update.

Currently debugging this. Getting an error with the paths (not sure why it's happening / where commands are run in Beaker jobs)

2024-07-12T22:04:38.702970347Z Traceback (most recent call last):
2024-07-12T22:04:38.703003402Z   File "/stage/open_instruct/finetune.py", line 49, in <module>
2024-07-12T22:04:38.703007590Z     from open_instruct.utils import ArgumentParserPlus, FlatArguments
2024-07-12T22:04:38.703011147Z ModuleNotFoundError: No module named 'open_instruct'
2024-07-12T22:04:38.748507759Z Traceback (most recent call last):
2024-07-12T22:04:38.748545263Z   File "/stage/open_instruct/finetune.py", line 49, in <module>
2024-07-12T22:04:38.748557256Z     from open_instruct.utils import ArgumentParserPlus, FlatArguments
2024-07-12T22:04:38.748561334Z ModuleNotFoundError: No module named 'open_instruct'
2024-07-12T22:04:38.806342385Z Traceback (most recent call last):
2024-07-12T22:04:38.806382674Z   File "/stage/open_instruct/finetune.py", line 49, in <module>
2024-07-12T22:04:38.806388846Z     from open_instruct.utils import ArgumentParserPlus, FlatArguments
2024-07-12T22:04:38.806393084Z ModuleNotFoundError: No module named 'open_instruct'
2024-07-12T22:04:38.874596426Z Traceback (most recent call last):
2024-07-12T22:04:38.874617337Z   File "/stage/open_instruct/finetune.py", line 49, in <module>
2024-07-12T22:04:38.874640672Z     from open_instruct.utils import ArgumentParserPlus, FlatArguments
2024-07-12T22:04:38.874659630Z ModuleNotFoundError: No module named 'open_instruct'
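
This kind of traceback is consistent with general Python behavior: when a script is run by file path, the directory containing the script goes on `sys.path`, not its parent, so a package cannot import itself by name unless it is installed or its parent directory is on `PYTHONPATH`. The sketch below reproduces that in a temporary directory. The package name is hypothetical, and whether this is exactly what the image was missing is an assumption, not something stated in this thread.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

PKG = "demo_pkg_for_path_check"  # hypothetical package name


def run_script_directly(add_root_to_path: bool) -> subprocess.CompletedProcess:
    """Run <root>/<PKG>/finetune.py by path, optionally with <root> on PYTHONPATH."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / PKG
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "utils.py").write_text("VALUE = 1\n")
        script = pkg_dir / "finetune.py"
        # The script imports its own package by name, like the failing import above.
        script.write_text(f"from {PKG}.utils import VALUE\nprint(VALUE)\n")

        # Start from the current environment, minus any inherited PYTHONPATH.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
        if add_root_to_path:
            env["PYTHONPATH"] = root
        return subprocess.run(
            [sys.executable, str(script)], env=env, capture_output=True, text=True
        )


without_path = run_script_directly(add_root_to_path=False)
with_path = run_script_directly(add_root_to_path=True)
print("without PYTHONPATH, exit code:", without_path.returncode)
print("with PYTHONPATH, exit code:", with_path.returncode)
```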

if env['name'] == "WANDB_PROJECT":
    env['value'] = wandb_project
d['tasks'][0]['envVars'].append({
    'name': 'WANDB_API_KEY', 'value': wandb_api_key
hamishivi (Collaborator) commented Jul 13, 2024

This might expose the wandb key in the Beaker config, rather than setting it in the workspace as a secret. I would prefer we just have a WANDB key secret in the workspace, and the workspace owner sets it.

natolambert (Collaborator Author)

Yeah let me fix this and add instructions.
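
One way to follow this suggestion is to have the env var entry reference a workspace secret by name instead of carrying the key itself. The sketch below assumes the Beaker spec accepts a `secret` field on an env var entry in place of `value`; the helper name `set_secret_env_var`, the example project name, and the secret name `WANDB_API_KEY` are all hypothetical and would need to match whatever the workspace owner actually creates.

```python
from typing import Any, Dict, List


def set_secret_env_var(
    env_vars: List[Dict[str, Any]], name: str, secret_name: str
) -> None:
    """Point env var `name` at a workspace secret, replacing any existing entry."""
    # Drop any earlier entry for this variable (e.g. one with a plaintext value).
    env_vars[:] = [e for e in env_vars if e.get("name") != name]
    # Assumption: the spec resolves `secret` to the stored value at run time.
    env_vars.append({"name": name, "secret": secret_name})


# Minimal stand-in for the loaded Beaker config used in the script.
d = {"tasks": [{"envVars": [{"name": "WANDB_PROJECT", "value": "my-project"}]}]}
set_secret_env_var(d["tasks"][0]["envVars"], "WANDB_API_KEY", "WANDB_API_KEY")
print(d["tasks"][0]["envVars"])
```

With this shape, the key never appears in the submitted config; only the secret's name does.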



if __name__ == "__main__":
    main()
Collaborator

gonna more or less assume this parsing works. would probably be good to test a bit first.

natolambert (Collaborator Author)

I've tested parsing a decent bit. I was making sure all the values, both passed on the command line and from the config, are right, for values given as --arg value and for --store_true keys. Maybe I should make sure it works for list keys (a very niche case, e.g. logging to both wandb and tensorboard).

I have some jobs in the queue for this now.
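
For reference, the three cases mentioned here (a key with a value, a store_true style flag, and a list key) can be handled along the following lines. This is a hedged sketch and not the parsing code from this PR: `parse_extra_args` is a hypothetical helper, the example keys are illustrative, and all values are kept as strings.

```python
from typing import Dict, List, Union

ParsedValue = Union[bool, str, List[str]]


def parse_extra_args(args: List[str]) -> Dict[str, ParsedValue]:
    """Parse leftover CLI tokens into a dict of overrides (hypothetical helper).

    --key value      -> {"key": "value"}
    --key=value      -> {"key": "value"}
    --flag           -> {"flag": True}
    --key a b        -> {"key": ["a", "b"]}
    """
    parsed: Dict[str, ParsedValue] = {}
    i = 0
    while i < len(args):
        token = args[i]
        if not token.startswith("--"):
            raise ValueError(f"Unexpected value without a key: {token}")
        key, has_eq, inline_value = token[2:].partition("=")
        i += 1
        values = [inline_value] if has_eq else []
        # Collect every following token up to the next --key.
        while i < len(args) and not args[i].startswith("--"):
            values.append(args[i])
            i += 1
        if not values:
            parsed[key] = True       # bare flag, store_true style
        elif len(values) == 1:
            parsed[key] = values[0]  # single value
        else:
            parsed[key] = values     # list key
    return parsed


example = parse_extra_args(
    ["--learning_rate", "1e-6", "--use_flash_attn", "--report_to", "wandb", "tensorboard"]
)
print(example)
```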

natolambert reopened this Jul 18, 2024

natolambert (Collaborator Author)

(accidentally clicked close)

@natolambert natolambert merged commit db3fb37 into main Jul 25, 2024
3 checks passed