Drop Your Feature Requests HERE #239

MahmoudAshraf97 · 2024-10-09T12:01:05Z

Hi Everyone,
I'm planning a refactor of this repo to make it more modular and easier to maintain, especially with respect to dependency management.
Also, since it's hard to find one solution that fits all cases, I'm thinking about making the repo modular so that we can integrate multiple options for each stage (transcription, diarization, etc.).
To sum up, this is the roadmap:

QoL Goals:

Write a GitHub action to frequently verify that all requirements are compatible across different OSes (linked PR: Add a simple action to test requirements weekly #241)
Write Unit Tests
Create Docker Image and Docker Compose Interface

Refactor Goals:

Split everything into modules (Preprocessing, Transcription, Diarization, Postprocessing). This will allow integrating more options for each stage, such as better vocal separation, different transcription providers, and custom postprocessing.
Convert this repo to a PyPI package
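
To make the modular split above concrete, here is a minimal sketch of one way the four stages could be chained. Everything in it is hypothetical: the `Pipeline` class, the stage functions, and the context keys are illustrative names only, not the repo's actual or planned API, and the stage bodies are placeholders that return canned data instead of running any model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical types: a stage receives a shared context dict and returns an updated one.
Context = Dict[str, Any]
Stage = Callable[[Context], Context]


@dataclass
class Pipeline:
    """Runs stages in order, passing one shared context dict along."""

    stages: List[Stage] = field(default_factory=list)

    def run(self, context: Context) -> Context:
        for stage in self.stages:
            context = stage(context)
        return context


# Placeholder stages. Real ones would wrap vocal separation, a transcription
# provider, a diarization model, and so on; these only return canned data.
def preprocess(ctx: Context) -> Context:
    return {**ctx, "vocals": ctx["audio_path"]}


def transcribe(ctx: Context) -> Context:
    return {**ctx, "words": [("hello", 0.0, 0.4), ("there", 0.5, 0.9)]}


def diarize(ctx: Context) -> Context:
    return {**ctx, "turns": [("Speaker 0", 0.0, 1.0)]}


def postprocess(ctx: Context) -> Context:
    text = " ".join(word for word, _, _ in ctx["words"])
    speaker = ctx["turns"][0][0]
    return {**ctx, "transcript": f"{speaker}: {text}"}


pipeline = Pipeline([preprocess, transcribe, diarize, postprocess])
result = pipeline.run({"audio_path": "example.wav"})
print(result["transcript"])
```

With a layout like this, swapping a transcription provider would mean replacing one function in the list, which is the kind of per-stage choice the roadmap describes.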

Happy to hear everyone's thoughts and ideas

transcriptionstream · 2024-10-09T21:56:52Z

Fantastic idea! I'd love to assist in the docker/compose aspect any way I can.

As for a feature request: I'd love to see an option to choose and use models other than ctc-forced-aligner, if needed, based on license/use.

MahmoudAshraf97 · 2024-10-12T15:06:26Z

Can you check #237? I'm not familiar with Docker Compose, so I don't know what functionality is needed here.

As for ctc-forced-aligner, the code itself can be used commercially after changing the default model, which can be done by passing the model name to the loading function. Alternatives would be NeMo Forced Aligner or Montreal Forced Aligner.

SIlver-- · 2024-12-03T20:39:20Z

Hello. I'd like to request word_timestamps as a command-line option, with the word timestamps included in the output results. I realize that just passing the option to whisper_model.transcribe isn't enough, because the word timestamps get filtered out by the diarization merging methods that run afterwards.

In my local version of the project, I added helper functions to line up each word of the formatted_transcript with the matching word element from the Iterable[Words] prop that faster_whisper returns, but I'm not sure this is the best way to do it. Also, as I have it so far, there's no way to enable or disable this from the command-line options, so I feel embarrassed 😞 to share my work.
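
For illustration, a rough sketch of the idea described above: keep per-word timings through the speaker merge, and gate them behind a command-line flag. This is not the project's merging code and not the helper functions mentioned in the comment. The `Word` dataclass, `assign_speakers`, `format_transcript`, and the `--word-timestamps` flag are all hypothetical names, and the midpoint rule for picking a speaker is just one simple assumption.

```python
import argparse
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical stand-in for a transcribed word with timings in seconds."""

    text: str
    start: float
    end: float


def assign_speakers(
    words: List[Word], turns: List[Tuple[str, float, float]]
) -> List[Tuple[str, Word]]:
    """Label each word with the speaker whose turn contains the word's midpoint."""
    labelled = []
    for word in words:
        midpoint = (word.start + word.end) / 2
        speaker = "Unknown"
        for name, turn_start, turn_end in turns:
            if turn_start <= midpoint < turn_end:
                speaker = name
                break
        labelled.append((speaker, word))
    return labelled


def format_transcript(labelled: List[Tuple[str, Word]], word_timestamps: bool) -> str:
    """Group consecutive words by speaker, optionally keeping per-word timings."""
    lines: List[str] = []
    current = None
    for speaker, word in labelled:
        token = (
            f"{word.text}[{word.start:.2f}-{word.end:.2f}]"
            if word_timestamps
            else word.text
        )
        if speaker != current:
            lines.append(f"{speaker}: {token}")
            current = speaker
        else:
            lines[-1] += " " + token
    return "\n".join(lines)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; off by default so existing output is unchanged.
    parser.add_argument(
        "--word-timestamps",
        action="store_true",
        help="include per-word start/end times in the output",
    )
    return parser


words = [Word("hi", 0.0, 0.3), Word("there", 0.4, 0.8), Word("hello", 1.1, 1.5)]
turns = [("Speaker 0", 0.0, 1.0), ("Speaker 1", 1.0, 2.0)]
args = build_parser().parse_args(["--word-timestamps"])
output = format_transcript(assign_speakers(words, turns), args.word_timestamps)
print(output)
```

Because the flag defaults to off, the plain transcript stays the default, and the timings only appear when explicitly requested.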

MahmoudAshraf97 pinned this issue Oct 9, 2024

MahmoudAshraf97 mentioned this issue Oct 9, 2024

Feature Request: Docker compose install #237

Closed

MahmoudAshraf97 mentioned this issue Oct 24, 2024

Getting word-level timestamps #258

Closed

jrhe mentioned this issue Dec 15, 2024

Setup UV python package manager (excluding explicit pytorch) #285

Open
