This repository has been archived by the owner on Mar 7, 2024. It is now read-only.
Add Support for Google Cloud Speech-To-Text v2 in mod_google_transcribe #164
This PR addresses #149 and adds support for v2 of the Speech-To-Text library while still supporting v1. The default behaviour is to use v1, where everything works exactly as it did in the previous version. To use v2, the FreeSWITCH variable GOOGLE_SPEECH_CLOUD_SERVICES_VERSION must be set to "v2". Setting it to "v1", or not setting it at all, results in the default behaviour.

If v2 is used, it is essential to provide a so-called recognizer parent path in the
GOOGLE_SPEECH_RECOGNIZER_PARENT FreeSWITCH variable. Failure to do so will result in a failure to construct the GStreamer class. Recognizers allow commonly used streaming recognition parameters to be stored in the cloud. These stored values can be overridden with parameters passed at runtime, but a recognizer must always be provided to v2 streaming recognition invocations. If you have already created a recognizer in your Google Cloud account, its id can be passed using the GOOGLE_SPEECH_RECOGNIZER_ID variable. If this is not set, mod_google_transcribe will use the so-called wildcard recognizer id (the "_" character), and a recognizer will be created on the fly and not stored for future use. Note that even if a persistent recognizer is not required, it is always necessary to provide at least the parent id of the recognizer in GOOGLE_SPEECH_RECOGNIZER_PARENT; otherwise even the wildcard recognizer cannot be created. This parent id is a path string consisting of the Google Cloud project id that was used to create the Google credentials file, and a geographical location. For more details about recognizers, see https://cloud.google.com/speech-to-text/v2/docs/recognizers

As long as
GOOGLE_SPEECH_CLOUD_SERVICES_VERSION is set to "v2" and GOOGLE_SPEECH_RECOGNIZER_PARENT is also set to a valid recognizer parent id, the v2 library will be used and calls to uuid_google_transcribe should function as they did previously. Any configuration parameters provided at runtime will override anything already defined in a predefined recognizer.

Differences between v1 and v2
v2. That is to say, it is no longer required to specify this as a parameter. Instead it is taken to be implicit from the model selected. If single utterance behaviour is required, this is supported by the short model, for example. For more details on models, see https://cloud.google.com/speech-to-text/v2/docs/streaming-recognize

mod_google_transcribe for v2, but I didn't manage to stumble across a combination of model, language and location which supports this. See https://stackoverflow.com/questions/76779418/speaker-diarization-is-disabled-even-for-supported-languages-in-google-speech-to

There are sure to be many more differences, but these are the main things I have found so far.
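To make the variable handling described in this PR more concrete, here is a minimal C++17 sketch of how the version choice and the recognizer name could be derived from the three FreeSWITCH variables. It is not the code from this PR: the function and enum names are hypothetical, treating any value other than "v2" as v1 is an assumption, and the `<parent>/recognizers/<id>` layout follows the resource name format documented for Speech-To-Text v2.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical names for illustration; only the variable names and the "_"
// wildcard come from the PR description.
enum class SpeechApiVersion { V1, V2 };

// GOOGLE_SPEECH_CLOUD_SERVICES_VERSION: "v2" selects v2; "v1" or unset
// selects v1. Assumption: any other value also falls back to v1.
SpeechApiVersion selectVersion(const std::optional<std::string>& value) {
  if (value && *value == "v2") return SpeechApiVersion::V2;
  return SpeechApiVersion::V1;
}

// Builds the full recognizer name for v2 from GOOGLE_SPEECH_RECOGNIZER_PARENT
// (for example "projects/<project-id>/locations/<location>") and the optional
// GOOGLE_SPEECH_RECOGNIZER_ID. A missing or empty id falls back to the
// wildcard recognizer "_"; a missing parent is an error.
std::string buildRecognizerName(const std::string& parent,
                                const std::optional<std::string>& recognizerId) {
  if (parent.empty()) {
    throw std::invalid_argument(
        "GOOGLE_SPEECH_RECOGNIZER_PARENT must be set when using v2");
  }
  const std::string id =
      (recognizerId && !recognizerId->empty()) ? *recognizerId : "_";
  return parent + "/recognizers/" + id;
}
```

With a parent of "projects/my-project/locations/global" (a made-up project id) and no recognizer id, the sketch yields "projects/my-project/locations/global/recognizers/_", which corresponds to the on-the-fly wildcard recognizer described above.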
Some Notes on the Code and Building
To avoid code duplication we placed v1-specific code in google_glue_v1.cpp and v2-specific code in google_glue_v2.cpp. Generic code used by both libraries now resides in generic_google_glue.h. We use our own docker image to build the drachtio modules, but our make file is based on this one: https://github.com/drachtio/docker-drachtio-freeswitch-base/blob/main/files/Makefile.am.extra
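For readers unfamiliar with this kind of split, the sketch below shows one common way to keep shared logic in a header while version-specific types live in separate translation units. It is only an illustration of the general idea and not necessarily how generic_google_glue.h is written: the request structs, the function name and the "en-US" fallback are all hypothetical.

```cpp
#include <string>

// Hypothetical stand-ins for the version-specific request types that would
// live in the v1 and v2 source files respectively.
struct V1Request { std::string language; };
struct V2Request { std::string language; };

// Shared logic written once as a template (as it might be in a common
// header) and instantiated separately for each API version.
template <typename Request>
Request makeRequest(const std::string& language) {
  Request req;
  // Illustrative default only; the real module's defaults are not shown here.
  req.language = language.empty() ? "en-US" : language;
  return req;
}
```

A v1 caller would use makeRequest<V1Request>(...) and a v2 caller makeRequest<V2Request>(...), so the common behaviour is maintained in one place.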
In order to compile and link the v2 code, we had to add the following lines to the nodist_libfreeswitch_libgoogleapis_la_SOURCES assignment:

If you don't do this, you will most likely run into problems when linking.
That's all I can think of for now. It would be really great if you also find this useful and we manage to get it merged. I am of course available for questions.