Live transcription could be improved. Sometimes Deepgram's WebSocket connection is dropped due to redundancy issues; enhanced stability in live transcription would be beneficial.
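Dropped streaming connections like the one described above can often be softened on the client side with a reconnect-and-backoff wrapper. The sketch below is not Deepgram-specific and makes no assumptions about their SDK: `open_stream` is a hypothetical zero-argument callable standing in for whatever opens the live WebSocket in your code.

```python
import random
import time


def with_reconnect(open_stream, max_retries=5, base_delay=0.5):
    """Call open_stream(); on a ConnectionError, retry with exponential backoff.

    open_stream is any zero-argument callable that raises ConnectionError
    when the connection drops (here it stands in for opening a live
    transcription WebSocket). Returns whatever open_stream returns.
    """
    for attempt in range(max_retries):
        try:
            return open_stream()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # Give up after the final attempt.
            # Exponential backoff with a little jitter so many clients
            # reconnecting at once don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

This doesn't fix instability on the server side, but it keeps a transient disconnect from taking down the whole session.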
Two things come to mind for improvement; they may have fixed these already, or there may be something new we haven't implemented yet. One is dual-channel audio. We've had issues in the past where the generated transcript contained a lot of duplicated text. I understand why it happens: the audio file has the same speaker on more than one channel, which is likely what causes the duplication. It would be great if Deepgram could either recognize that it's essentially the same audio on two channels and transcribe only one of them, or at least warn us that it's happening. We've found workarounds; however, a better solution from Deepgram's side would be great. The other issue comes up when changes are made on their end and we want to test them. We've had one or two instances where they told us we had access, we tried to test something out, and it turned out we didn't. When that happens, they have to fix something on their end. It's not a big deal: we have a Slack channel with them where we can quickly touch base, and when we let them know, they get back to us and fix the access. It's not something we do very often.
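The workaround alluded to above can be done before the audio ever reaches the API: if both channels carry effectively the same signal, send only one of them. The sketch below is a minimal, stdlib-only illustration of that check; `downmix_if_duplicate` is a hypothetical helper, and real code would decode the audio file into per-channel sample lists first.

```python
def downmix_if_duplicate(left, right, tolerance=1e-3):
    """Collapse two channels to one when they carry the same audio.

    `left` and `right` are equal-length lists of float samples in [-1, 1].
    Returns a list of channels: one channel if the two are effectively
    identical, otherwise both unchanged.
    """
    identical = all(abs(l - r) < tolerance for l, r in zip(left, right))
    if identical:
        # Same speaker duplicated across both channels: transcribing
        # only one channel avoids duplicated transcript text.
        return [list(left)]
    # Genuinely distinct channels (e.g. one speaker per channel):
    # keep both so each can be transcribed separately.
    return [list(left), list(right)]
```

When the channels really are distinct, keeping both is what you want, since per-channel transcription is the usual way to separate speakers recorded on separate channels.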
I need to transcribe my videos to text, but there are some issues when I run Deepgram. The solution does not properly identify the number of speakers; for example, in some videos Deepgram identifies only two speakers out of three or four. The solution also makes some spelling and English grammar mistakes, and it does not correctly recognize certain specific words in a sentence.
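Undercounting of speakers like this can at least be detected automatically by comparing the diarized output against the expected speaker count. The sketch below assumes the general JSON shape of a diarized pre-recorded response, where each word object carries an integer `speaker` field; treat that shape as an assumption to verify against the current API docs rather than a guaranteed contract.

```python
def count_speakers(response):
    """Count distinct speaker indices in a diarized transcription response.

    Assumes a response dict shaped like a pre-recorded result with
    diarization enabled: each word object in the first alternative
    carries an integer "speaker" field.
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    return len({w["speaker"] for w in words})
```

Comparing `count_speakers(response)` against the number of people you know are in the video lets you flag recordings where diarization merged speakers, so they can be reviewed or re-run instead of silently shipping a wrong transcript.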
Deepgram is currently restricted to English variants only; it should support other languages, such as German or French.