Drop Your Feature Requests HERE #239

Open
1 of 5 tasks
MahmoudAshraf97 opened this issue Oct 9, 2024 · 3 comments

Comments

@MahmoudAshraf97
Owner

MahmoudAshraf97 commented Oct 9, 2024

Hi everyone,
I'm planning a refactor of this repo to make it more modular and easier to maintain, especially with respect to dependency management.
Also, since it's hard to find one solution that fits every use case, I'm thinking about making the repo modular so that we can integrate multiple options for each stage (transcription, diarization, etc.).
To sum it up, this is the roadmap:

QoL Goals:

Refactor Goals:

  • Split everything into modules (Preprocessing, Transcription, Diarization, Postprocessing). This will allow integrating more options for each stage, such as better vocal separation, different transcription providers, and custom postprocessing.
  • Convert this repo to a PyPI package
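
As a rough illustration of the module split above, here is a minimal sketch in which each stage is a swappable callable. All names (`Pipeline`, `preprocess`, `transcribe`, `diarize`, `postprocess`) and the dictionary-based context are assumptions made for this example, not the repo's actual API, and the stage bodies are placeholders rather than real models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical convention: a stage takes the shared context dict and
# returns an updated copy of it.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Runs the configured stages in order, passing the context along."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        for stage in self.stages:
            context = stage(context)
        return context


# Placeholder stages; a real backend (vocal separation, a transcription
# provider, a diarizer) would replace each body.
def preprocess(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {**ctx, "vocals": ctx["audio"]}


def transcribe(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {**ctx, "words": [{"word": "hello", "start": 0.0, "end": 0.4}]}


def diarize(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {**ctx, "turns": [{"speaker": "Speaker 0", "start": 0.0, "end": 1.0}]}


def postprocess(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Label every word with the first turn that contains its start time.
    lines = []
    for w in ctx["words"]:
        turn = next(t for t in ctx["turns"] if t["start"] <= w["start"] < t["end"])
        lines.append(f"{turn['speaker']}: {w['word']}")
    return {**ctx, "transcript": "\n".join(lines)}


pipeline = Pipeline([preprocess, transcribe, diarize, postprocess])
result = pipeline.run({"audio": "sample.wav"})
```

Swapping a transcription or diarization option would then mean replacing one entry in the stage list, without touching the others.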

Happy to hear everyone's thoughts and ideas

@transcriptionstream
Contributor

Fantastic idea! I'd love to assist in the docker/compose aspect any way I can.

As for a feature request, I'd love to see an option to use models other than ctc-forced-aligner, if needed, based on license/use case.

@MahmoudAshraf97
Owner Author

Can you check #237? I'm not familiar with Docker Compose, so I don't know what functionality is needed here.

As for ctc-forced-aligner, the code itself can be used commercially after changing the default model, which can be done by passing the model name to the loading function. Alternatives would be NeMo Forced Aligner or Montreal Forced Aligner.

@SIlver--

SIlver-- commented Dec 3, 2024

Hello. I'd like to request word_timestamps as a command-line option, with the word timestamps included in the output results. I realize that just passing the option to whisper_model.transcribe isn't enough to get the results, because they get filtered out by the diarization merging methods that run afterwards.

In my local version of the project, I added helper functions to line up each word of the formatted_transcript with the corresponding word element of the Iterable[Words] prop that faster_whisper returns, but I'm not sure this is the best way to do it. Also, the way I have it so far, there's no way to enable or disable it from the command-line options, so I feel embarrassed 😞 to share my work so far.
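
For discussion, here is a minimal sketch of one way the request could be wired up. It is not the helper code described above: it swaps in a simpler midpoint lookup against speaker turns. The `--word-timestamps` flag name, the `SpeakerTurn` type, and `assign_speakers` are assumptions made for this example, and `Word` is a local stand-in that only mirrors the `start`, `end`, and `word` fields of faster_whisper's word objects.

```python
import argparse
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Word:
    """Local stand-in for a faster_whisper word (start, end, word)."""
    start: float
    end: float
    word: str


@dataclass
class SpeakerTurn:
    """Hypothetical diarization output: one speaker over a time range."""
    speaker: str
    start: float
    end: float


def assign_speakers(
    words: List[Word], turns: List[SpeakerTurn]
) -> List[Dict[str, Union[str, float, None]]]:
    """Keep every word's timestamps and attach the speaker whose turn
    contains the word's midpoint (None if no turn matches)."""
    out = []
    for w in words:
        mid = (w.start + w.end) / 2
        speaker: Optional[str] = next(
            (t.speaker for t in turns if t.start <= mid < t.end), None
        )
        out.append(
            {"word": w.word.strip(), "start": w.start, "end": w.end, "speaker": speaker}
        )
    return out


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Diarization CLI sketch")
    # Hypothetical flag: off by default, so existing output is unchanged.
    parser.add_argument(
        "--word-timestamps",
        action="store_true",
        help="include per-word start/end times in the output",
    )
    return parser


args = build_parser().parse_args(["--word-timestamps"])
words = [Word(0.0, 0.4, " hello"), Word(1.2, 1.6, " world")]
turns = [SpeakerTurn("Speaker 0", 0.0, 1.0), SpeakerTurn("Speaker 1", 1.0, 2.0)]
labeled = assign_speakers(words, turns) if args.word_timestamps else []
```

Because the flag uses `store_true`, leaving it off keeps the current behavior, and the per-word records are only built when it is passed.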
