Install HELM: pip install git+https://github.com/stanford-crfm/helm.git
Follow the instructions in toy-submission to set up a simple HTTP client that you can use for local tests
You can configure which datasets HELM runs on by editing a run_specs.conf file, which lets you run your model on a large set of datasets. For the preliminary evaluation, the organizers will use https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/blob/master/run_specs_full_coarse_600_budget.conf
helm-run --conf-paths run_specs_full_coarse_600_budget.conf --suite v1 --max-eval-instances 10
helm-summarize --suite v1
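For a quick local check before running the full config, a much smaller run_specs file can help. The sketch below writes one with a single MMLU entry. The file name run_specs_small.conf is hypothetical, and the entry format and the model name neurips/local are assumptions based on the challenge's sample configs, so adjust them to match your setup.

```shell
# Write a minimal run_specs file containing one scenario.
# Assumptions: "neurips/local" is the model name your local setup exposes,
# and the entry format matches the challenge's sample configs.
cat > run_specs_small.conf <<'EOF'
entries: [
  {description: "mmlu:model=neurips/local,subject=college_computer_science", priority: 4}
]
EOF

# Show the file so you can confirm what HELM will read.
cat run_specs_small.conf
```

You could then pass this file to helm-run through --conf-paths in place of run_specs_full_coarse_600_budget.conf, keeping --max-eval-instances small so the run finishes quickly.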
You can launch a web server to visually inspect the results of your run. The helm-summarize command
can also print the results as text in your terminal, but we've found the web server to be useful.
helm-server
This will launch a server on localhost. If you're working on a remote machine, you might need to set up port forwarding. If everything worked correctly, you should see a page that looks like this: