Enable new models in audio-to-text #163
Conversation
```python
logger.info("AudioToTextPipeline using float16 precision for %s", model_id)
kwargs["torch_dtype"] = torch.float16

if bfloat16_enabled:
    logger.info("AudioToTextPipeline using bfloat16 precision for %s", model_id)
```
@eliteprox, thanks for the pull request! 🚀 It looks good overall. However, please keep in mind that the default models `openai/whisper-large-v3` and `distil-whisper/distil-large-v3` use weights in either float16 or bfloat16 formats. The `torch_dtype` parameter primarily controls the precision used for calculations at runtime. You can verify this by checking the model files in these repositories: Hugging Face - distil-large-v3. Notice the presence of files with the `.fp32.safetensors` extension, indicating the format being used.

If the standard `.safetensors` (fp16) format meets your needs, you might consider removing the `FLOAT16` environment variable and instead switching based on the model extension. This approach was implemented by Yondon in this commit. I will leave that decision to you based on your research 👍🏻. Feel free to merge when you think this pull request is done 🚀.
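The suggestion above amounts to selecting the precision per model instead of via a global flag. A minimal sketch of that idea, using an illustrative model-to-dtype mapping (the dictionary values and the `recommended_dtype` helper are assumptions for illustration, not the PR's actual code or the values it ships):

```python
# Hypothetical sketch: choose a precision per model id rather than from a
# single FLOAT16 environment variable. The mapping below is illustrative;
# the real choice should follow the weight formats published in each repo.
MODEL_DTYPES = {
    "openai/whisper-large-v3": "float16",
    "distil-whisper/distil-large-v3": "float16",
    "openai/whisper-medium": "float32",
}

def recommended_dtype(model_id: str, default: str = "float32") -> str:
    """Return the dtype name recommended for a given model id."""
    return MODEL_DTYPES.get(model_id, default)
```

In the pipeline, the returned name could then be resolved to a real dtype object, e.g. `kwargs["torch_dtype"] = getattr(torch, recommended_dtype(model_id))`.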
Thanks for the tip! I updated the logic to load the recommended float precision for each model, and tested that the models download and load correctly.
This reverts commit f835dd4.
@rickstaa I made several changes since you last reviewed this PR, so I held off on merging. Could you or @ad-astra-video re-review the latest changes?
This change adds support for the new Whisper models `distil-whisper/distil-large-v3` and `openai/whisper-medium`. It also configures those models to use the appropriate bfloat16, float16, or float32 precision.

Credit to @ad-astra-video for initially exploring these models and optimizations.