Breaking Change
- No more waiting! The evaluation now fully supports batch inference!
- No more environment configs! Code execution now runs on a remote API endpoint by default, and this can be customized.
- No more multiple commands! `bigcodebench.evaluate` will be good enough to handle most cases.
What's Changed
- add multiprocessing support for sanitization step by @sk-g in #37
- Remove extra period in task BigCodeBench/16 by @hvaara in #38
- Await futures in progress checker by @hvaara in #48
- A few args have been added to this version, including `--direct_completion` and `--local_execute`. See Advanced Usage for the details.
- The benchmark data has been bumped to `v0.1.2`.
New Contributors
Full Changelog: v0.1.9...v0.2.0