Speech-to-Text

Speech-to-Text (STT), also called Automatic Speech Recognition (ASR), is the task of transcribing speech, that is, turning audio into text. Wav2Vec 2.0 is a widely used model for this task and reported state-of-the-art transcription results when it was introduced. It learns speech representations from raw audio in a self-supervised manner, meaning that the representations are trained without transcribed labels.
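
To make the transcription step concrete, here is a minimal sketch of running a fine-tuned Wav2Vec 2.0 model on a waveform. It assumes the Hugging Face `transformers` library and PyTorch are installed, and it uses the public `facebook/wav2vec2-base-960h` checkpoint, which expects 16 kHz mono audio. The library, the checkpoint, and the helper name `transcribe` are not mentioned in the text above; they are illustrative choices, not the only way to use the model.

```python
from typing import Sequence

# Assumed checkpoint (not named in the text): a Wav2Vec 2.0 model
# fine-tuned for English transcription on 16 kHz audio.
MODEL_NAME = "facebook/wav2vec2-base-960h"
EXPECTED_SAMPLING_RATE = 16_000


def transcribe(waveform: Sequence[float], sampling_rate: int = EXPECTED_SAMPLING_RATE) -> str:
    """Return the transcription of a mono waveform (hypothetical helper)."""
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the audio first"
        )

    # Imported lazily so that the heavy dependencies are only needed
    # when a transcription is actually requested.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Downloads the checkpoint on first use.
    processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)
    model.eval()

    # Normalize the raw samples and wrap them in a batch of size 1.
    inputs = processor(list(waveform), sampling_rate=sampling_rate, return_tensors="pt")

    # One vector of character scores per audio frame.
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: pick the best-scoring token per frame, then let
    # the processor collapse repeats and blanks into text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

A call such as `transcribe(samples)` would take a list or array of floating-point samples loaded from an audio file. Audio recorded at another rate needs to be resampled to 16 kHz first, which is why the sketch rejects other sampling rates rather than silently producing poor output.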