Top Speech recognition repositories on GitHub
ASR models and pipelines for converting audio to text.
Ranked by stars across 641 repositories tagged speech-recognition. Refreshed daily.
- 1. huggingface/transformers ★ 160,327 · ⑂ 33,126
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- nlp
- natural-language-processing
- pytorch
- pytorch-transformers
- transformer
- model-hub
- 2. ggml-org/whisper.cpp ★ 49,444 · ⑂ 5,506
Port of OpenAI's Whisper model in C/C++
- openai
- speech-to-text
- transformer
- whisper
- inference
- speech-recognition
- 3. mozilla/DeepSpeech ★ 26,753 · ⑂ 4,096
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
- deep-learning
- machine-learning
- neural-networks
- tensorflow
- speech-recognition
- speech-to-text
- 4. SYSTRAN/faster-whisper ★ 22,694 · ⑂ 1,854
Faster Whisper transcription with CTranslate2
- deep-learning
- inference
- quantization
- speech-recognition
- speech-to-text
- transformer
- 5. m-bain/whisperX ★ 21,736 · ⑂ 2,254
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
- asr
- speech
- speech-recognition
- speech-to-text
- whisper
- 6. leon-ai/leon ★ 17,216 · ⑂ 1,443
🧠 Leon is your open-source personal assistant.
- leon
- personal-assistant
- nodejs
- python
- ai
- artificial-intelligence
- 7. modelscope/FunASR ★ 15,972 · ⑂ 1,662
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
- conformer
- pytorch
- speech-recognition
- paraformer
- punctuation
- speaker-diarization
- 8. kaldi-asr/kaldi ★ 15,386 · ⑂ 5,359
kaldi-asr/kaldi is the official location of the Kaldi project.
- kaldi
- c-plus-plus
- cuda
- shell
- speech-recognition
- speech-to-text
- 9. NVIDIA/DeepLearningExamples ★ 14,805 · ⑂ 3,407
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
- computer-vision
- deep-learning
- drug-discovery
- forecasting
- large-language-models
- mxnet
- 10. alphacep/vosk-api ★ 14,669 · ⑂ 1,712
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
- speech-recognition
- asr
- voice-recognition
- speech-to-text
- android
- ios
- 11. kmario23/deep-learning-drizzle ★ 12,807 · ⑂ 2,973
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
- machine-learning
- deep-learning
- deep-neural-networks
- pattern-recognition
- computer-vision
- optimization
- 12. PaddlePaddle/PaddleSpeech ★ 12,596 · ⑂ 1,956
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
- transformer
- conformer
- speech-translation
- streaming-asr
- speech-alignment
- punctuation-restoration
- 13. speechbrain/speechbrain ★ 11,517 · ⑂ 1,686
A PyTorch-based Speech Toolkit
- speech-recognition
- speech-toolkit
- speaker-recognition
- speech-to-text
- speech-enhancement
- speech-separation
- 14. openvinotoolkit/openvino ★ 10,202 · ⑂ 3,200
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
- inference
- deep-learning
- openvino
- ai
- computer-vision
- diffusion-models
- 15. espnet/espnet ★ 9,828 · ⑂ 2,399
End-to-End Speech Processing Toolkit
- deep-learning
- end-to-end
- chainer
- pytorch
- kaldi
- speech-recognition
- 16. abus-aikorea/voice-pro ★ 9,143 · ⑂ 1,226
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
- faster-whisper
- tts
- whisper
- gradio
- subtitles
- transcription
- 17. Uberi/speech_recognition ★ 8,964 · ⑂ 2,428
Speech recognition module for Python, supporting several engines and APIs, online and offline.
- python
- audio
- speech-recognition
- speech-to-text
- 18. nl8590687/ASRT_SpeechRecognition ★ 8,372 · ⑂ 1,899
A Deep-Learning-Based Chinese Speech Recognition System
- tensorflow
- cnn
- ctc
- python
- keras
- speech-recognition
- 19. FunAudioLLM/SenseVoice ★ 8,096 · ⑂ 739
Multilingual Voice Understanding Model
- ai
- asr
- gpt-4o
- speech-recognition
- speech-to-text
- aigc
- 20. Blaizzy/mlx-audio ★ 6,953 · ⑂ 579
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
- apple-silicon
- audio-processing
- mlx
- multimodal
- speech-recognition
- speech-synthesis
- 21. TalAter/annyang ★ 6,666 · ⑂ 1,036
💬 Speech recognition for your site
- speech-recognition
- speech
- speech-to-text
- voice
- 22. flashlight/wav2letter ★ 6,446 · ⑂ 992
Facebook AI Research's Automatic Speech Recognition Toolkit
- wav2letter
- speech-recognition
- end-to-end
- deep-learning
- cpp
- 23. PaddlePaddle/PaddleX ★ 6,128 · ⑂ 1,188
All-in-One Development Tool based on PaddlePaddle
- classification
- segmentation
- deployment
- ocr
- time-series
- pp-chatocr
- 24. argmaxinc/argmax-oss-swift ★ 6,067 · ⑂ 552
On-device Speech AI for Apple Silicon
- inference
- ios
- speech-recognition
- swift
- whisper
- transformers
- 25. modelscope/FunClip ★ 5,578 · ⑂ 688
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
- speech-recognition
- video-clip
- video-subtitles
- subtitles-generator
- speech-to-text
- gradio
Find engineers shipping speech recognition
The list above ranks the most-starred public repositories tagged with the speech-recognition topic, drawn from the public GitHub graph. Across the 641 repositories tagged this way, the maintainers and top contributors form a tight cluster of the people actually building speech recognition.
Looking for engineers who’ve worked on speech recognition for real, not just listed it on LinkedIn? The fastest path is the contributor lists of these repos: their commits, issues, and READMEs are public proof of depth.
Refolk turns this list into a search. Ask for “maintainers of top speech recognition repos who are hiring”, “speech recognition engineers in San Francisco”, or “founders shipping speech recognition”, and Refolk returns a ranked shortlist with sources.
How this list is built
Last refreshed: Thu, 07 May 2026 05:55:18 GMT