Google Cloud Speech-to-Text is not entirely accurate, so we have to correct for those errors in our AI software. It uses neural networks, and that stochastic processing is only 70% to 75% accurate. It gets things wrong too often, and since I personally work with this, I don't appreciate that. However, it still seems to be the best option currently. We have to write our own improvements because Google's tools for improving transcription accuracy in our domain aren't very powerful. The timestamp technology for recognized words is inadequate, so we don't use it. We understand words based on their meaning, and we have a whole AI engine that does that, which is one of our differentiators from a product standpoint. We didn't use the custom voice creation feature; we just use Google's voices, which are fine for our purposes.
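For reference, the per-word timestamps this reviewer mentions are requested through the enable_word_time_offsets flag on RecognitionConfig in the v1 Python client. A minimal sketch, where the bucket URI, encoding, and sample rate are placeholder assumptions:

    from google.cloud import speech

    client = speech.SpeechClient()

    # Placeholder URI; any mono LINEAR16 recording in Cloud Storage works.
    audio = speech.RecognitionAudio(uri="gs://example-bucket/call.wav")
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_word_time_offsets=True,  # request per-word start/end times
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        for word in result.alternatives[0].words:
            # start_time/end_time are datetime.timedelta objects
            print(f"{word.word}: {word.start_time.total_seconds():.2f}s "
                  f"-> {word.end_time.total_seconds():.2f}s")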
The major challenge with Google Cloud Speech-to-Text is that not every call is clear. Our representative may be in a quiet environment, but the client can be anywhere. We need to manage background noise on all calls, so handling audio clarity is a challenge. Sometimes, speaker diarization is affected, leading to incorrect speaker identification. For instance, when one person is speaking, the tool might attribute their words to another person. These challenges require an AI to analyze the complete output from Google Cloud Speech-to-Text and correct it properly. The output might have incorrect grammar, especially since many of our clients speak Spanish. Sometimes the transcription comes back in English when it should be in Spanish, and the AI can help correct that as long as we provide the right context. A crucial update would be autocorrecting the output of Google Cloud Speech-to-Text, because irrelevant words occasionally appear. In our integration, we post-process the output with an AI to correct grammar mistakes and other issues. If that capability were built into Google Cloud Speech-to-Text, it would remove the need for a third-party process; we could simply call the API without having to format or correct the output separately.
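Both issues described here map to RecognitionConfig fields in the v1 API: speaker diarization is controlled by a SpeakerDiarizationConfig, and mixed English/Spanish audio can be hinted with alternative_language_codes. A sketch under those assumptions (the storage URI and speaker counts are illustrative):

    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri="gs://example-bucket/support-call.wav")

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code="en-US",
        # Let the service switch to Spanish when it fits the audio better.
        alternative_language_codes=["es-US"],
        diarization_config=speech.SpeakerDiarizationConfig(
            enable_speaker_diarization=True,
            min_speaker_count=2,  # rep + client; adjust for conference calls
            max_speaker_count=2,
        ),
    )

    response = client.recognize(config=config, audio=audio)
    # With diarization enabled, the final result carries speaker-tagged words.
    for word in response.results[-1].alternatives[0].words:
        print(f"speaker {word.speaker_tag}: {word.word}")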
The tool's telephony model does not produce accurate results. With the telephony model, whenever a phone call occurs, we want to transcribe it live, but this is not possible with the Google module since it does not support the Cloud Translation V2 API. With prerecorded calls, it gives you the transcript, but in some cases, if the call is on hold for five, ten, or fifteen minutes, whatever we spoke about afterward is not captured, which is an issue.
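For context, the telephony use case is typically targeted with the phone_call model (optionally paired with use_enhanced) in the v1 API. A minimal sketch for a prerecorded call, with a placeholder bucket path and telephony-rate audio assumed:

    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri="gs://example-bucket/recorded-call.wav")

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,   # typical telephony sample rate
        language_code="en-US",
        model="phone_call",       # model tuned for phone audio
        use_enhanced=True,        # enhanced variant of the model
    )

    # Audio longer than about a minute must go through the async API.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=300)

    for result in response.results:
        print(result.alternatives[0].transcript)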
Google Cloud Speech-to-Text's price could be improved. The trial experience could also be improved by adding extra minutes to the trial version.
Director of Research and Regulatory Affairs at SafetySpect Inc
Real User
May 30, 2023
The one thing I find is that I often use specialized terms, and the solution doesn't know them. I'm not sure there's an easy way around this, so I usually have to delete the text to correct it. It would be good if I could just tell it to change the spelling or something like that, and it would be smart enough to figure that out.
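The service does expose a mechanism aimed at exactly this: speech adaptation via phrase hints, which biases recognition toward supplied vocabulary. A hedged sketch, where the domain terms, boost value, and audio URI are placeholders:

    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri="gs://example-bucket/dictation.wav")

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        # Bias recognition toward domain vocabulary it would otherwise miss.
        speech_contexts=[
            speech.SpeechContext(
                phrases=["fluorescence", "hyperspectral", "SafetySpect"],
                boost=10.0,  # strength of the bias; tune per term list
            )
        ],
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)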
The multilanguage support for the chatbot needs to be better. If Google incorporated translation natively into the chatbot, you could build a chatbot in English and automatically have the same chatbot in all other languages, since Google's translation services are good. That translation is not bad for technical people. If they can deliver a multilanguage chatbot, it would be ideal.
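Until something like that exists natively, one workaround is to pass chatbot replies through the Cloud Translation v2 client. A minimal sketch, assuming the reply text and target language are supplied by a hypothetical chatbot layer:

    from google.cloud import translate_v2 as translate

    client = translate.Client()

    def localize_reply(reply: str, target_language: str) -> str:
        """Translate an English chatbot reply into the user's language."""
        result = client.translate(reply, target_language=target_language)
        return result["translatedText"]

    # Example: serve the same English bot to a Spanish-speaking user.
    print(localize_reply("Your appointment is confirmed.", "es"))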
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google’s machine learning technology.
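Of the capabilities listed above, real-time streaming is the one not shown in the earlier sketches. A minimal illustration using the v1 streaming helper, where the audio chunk source is a placeholder assumption:

    from google.cloud import speech

    client = speech.SpeechClient()

    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        ),
        interim_results=True,  # partial hypotheses while audio still streams
    )

    def request_generator(chunks):
        # Each chunk is raw LINEAR16 audio bytes, e.g. from a microphone.
        for chunk in chunks:
            yield speech.StreamingRecognizeRequest(audio_content=chunk)

    audio_chunks = []  # placeholder; feed real audio byte strings here
    responses = client.streaming_recognize(
        config=streaming_config,
        requests=request_generator(audio_chunks),
    )
    for response in responses:
        for result in response.results:
            print(result.alternatives[0].transcript)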