Refolk

Top Speech recognition repositories on GitHub

ASR models and pipelines for converting audio to text.

Ranked by stars across 657 repositories tagged speech-recognition. Refreshed daily.

  1. huggingface/transformers · ★ 161,764 · ⑂ 33,564

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    • nlp
    • natural-language-processing
    • pytorch
    • pytorch-transformers
    • transformer
    • model-hub
  2. ggml-org/whisper.cpp · ★ 50,912 · ⑂ 5,683

    Port of OpenAI's Whisper model in C/C++

    • openai
    • speech-to-text
    • transformer
    • whisper
    • inference
    • speech-recognition
  3. mozilla/DeepSpeech · ★ 26,756 · ⑂ 4,086

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

    • deep-learning
    • machine-learning
    • neural-networks
    • tensorflow
    • speech-recognition
    • speech-to-text
  4. SYSTRAN/faster-whisper · ★ 23,760 · ⑂ 1,949

    Faster Whisper transcription with CTranslate2

    • deep-learning
    • inference
    • quantization
    • speech-recognition
    • speech-to-text
    • transformer
  5. m-bain/whisperX · ★ 22,584 · ⑂ 2,310

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    • asr
    • speech
    • speech-recognition
    • speech-to-text
    • whisper
  6. modelscope/FunASR · ★ 18,389 · ⑂ 1,870

    Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

    • pytorch
    • speech-recognition
    • paraformer
    • punctuation
    • speaker-diarization
    • voice-activity-detection
  7. leon-ai/leon · ★ 17,334 · ⑂ 1,446

    🧠 Leon is your open-source personal assistant.

    • leon
    • personal-assistant
    • nodejs
    • python
    • ai
    • artificial-intelligence
  8. kaldi-asr/kaldi · ★ 15,418 · ⑂ 5,357

    kaldi-asr/kaldi is the official location of the Kaldi project.

    • kaldi
    • c-plus-plus
    • cuda
    • shell
    • speech-recognition
    • speech-to-text
  9. alphacep/vosk-api · ★ 14,868 · ⑂ 1,734

    Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

    • speech-recognition
    • asr
    • voice-recognition
    • speech-to-text
    • android
    • ios
  10. NVIDIA/DeepLearningExamples · ★ 14,821 · ⑂ 3,407

    State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

    • computer-vision
    • deep-learning
    • drug-discovery
    • forecasting
    • large-language-models
    • mxnet
  11. kmario23/deep-learning-drizzle · ★ 12,818 · ⑂ 2,973

    Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!

    • machine-learning
    • deep-learning
    • deep-neural-networks
    • pattern-recognition
    • computer-vision
    • optimization
  12. PaddlePaddle/PaddleSpeech · ★ 12,622 · ⑂ 1,958

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

    • transformer
    • conformer
    • speech-translation
    • streaming-asr
    • speech-alignment
    • punctuation-restoration
  13. speechbrain/speechbrain · ★ 11,641 · ⑂ 1,702

    A PyTorch-based Speech Toolkit

    • speech-recognition
    • speech-toolkit
    • speaker-recognition
    • speech-to-text
    • speech-enhancement
    • speech-separation
  14. abus-aikorea/voice-pro · ★ 11,020 · ⑂ 1,604

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    • faster-whisper
    • tts
    • whisper
    • gradio
    • subtitles
    • transcription
  15. openvinotoolkit/openvino · ★ 10,399 · ⑂ 3,249

    OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

    • inference
    • deep-learning
    • openvino
    • ai
    • computer-vision
    • diffusion-models
  16. espnet/espnet · ★ 9,867 · ⑂ 2,412

    End-to-End Speech Processing Toolkit

    • deep-learning
    • end-to-end
    • chainer
    • pytorch
    • kaldi
    • speech-recognition
  17. Uberi/speech_recognition · ★ 8,969 · ⑂ 2,420

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

    • python
    • audio
    • speech-recognition
    • speech-to-text
  18. FunAudioLLM/SenseVoice · ★ 8,625 · ⑂ 784

    Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.

    • asr
    • speech-recognition
    • speech-to-text
    • cross-lingual
    • llm
    • python
  19. nl8590687/ASRT_SpeechRecognition · ★ 8,376 · ⑂ 1,898

    A Deep-Learning-Based Chinese Speech Recognition System

    • tensorflow
    • cnn
    • ctc
    • python
    • keras
    • speech-recognition
  20. Blaizzy/mlx-audio · ★ 7,402 · ⑂ 643

    A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

    • apple-silicon
    • audio-processing
    • mlx
    • multimodal
    • speech-recognition
    • speech-synthesis
  21. debpalash/OmniVoice-Studio · ★ 7,356 · ⑂ 1,129

    The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing and dictation Desktop App

    • tts
    • voice-cloning
    • voice-generation
    • voice-ai
    • asr
    • elevenlabs
  22. TalAter/annyang · ★ 6,814 · ⑂ 1,048

    💬 Speech recognition for your site

    • speech-recognition
    • speech
    • speech-to-text
    • voice
  23. flashlight/wav2letter · ★ 6,444 · ⑂ 992

    Facebook AI Research's Automatic Speech Recognition Toolkit

    • wav2letter
    • speech-recognition
    • end-to-end
    • deep-learning
    • cpp
  24. argmaxinc/argmax-oss-swift · ★ 6,223 · ⑂ 573

    On-device Speech AI for Apple Silicon

    • inference
    • ios
    • speech-recognition
    • swift
    • whisper
    • transformers
  25. PaddlePaddle/PaddleX · ★ 6,158 · ⑂ 1,198

    All-in-One Development Tool based on PaddlePaddle

    • classification
    • segmentation
    • deployment
    • ocr
    • time-series
    • pp-chatocr

Find engineers shipping Speech recognition

The list above ranks the most-starred public repositories tagged with the Speech recognition topic, drawn from the public GitHub graph. Across the 657 repositories tagged this way, the maintainers and top contributors form a tight cluster of the people actually building speech recognition systems.

Looking for engineers who’ve worked on Speech recognition for real, not just listed it on LinkedIn? The fastest path is the contributor list of these repos. Their commits, issues, and READMEs are public proof of depth.

Refolk turns this list into a search. Ask for “maintainers of top Speech recognition repos who are hiring”, “Speech recognition engineers in San Francisco”, or “founders shipping Speech recognition” and Refolk returns a ranked shortlist with sources.

How this list is built

Refolk searched GitHub for public repositories tagged with the Speech recognition topic, ranked them by stargazer count, and kept those with at least 50 stars. The list refreshes once a day.
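
The selection rule described above (topic filter, at least 50 stars, descending star order) can be sketched in Python. This is a minimal illustration, not Refolk's actual pipeline: it assumes the public GitHub REST search endpoint and the topic slug speech-recognition, the helper names build_search_url, rank_repositories, and format_entry are hypothetical, and the third sample repository is invented to show the star threshold at work.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search (assumed data source).
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic, min_stars=50, per_page=25):
    """Build a search URL for repos with a topic, sorted by stars."""
    query = f"topic:{topic} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_repositories(repos, min_stars=50, limit=25):
    """Drop repos below the star threshold, then sort by stars, highest first."""
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    kept.sort(key=lambda r: r["stargazers_count"], reverse=True)
    return kept[:limit]


def format_entry(rank, repo):
    """Render one list line: rank, owner/name, stars, forks."""
    return (
        f"{rank}. {repo['full_name']} "
        f"· ★ {repo['stargazers_count']:,} · ⑂ {repo['forks_count']:,}"
    )


# Offline sample using GitHub's field names. The first two rows copy the
# counts shown in the list above; the last row is a made-up repo that
# falls below the 50-star cutoff.
SAMPLE = [
    {"full_name": "ggml-org/whisper.cpp", "stargazers_count": 50912, "forks_count": 5683},
    {"full_name": "huggingface/transformers", "stargazers_count": 161764, "forks_count": 33564},
    {"full_name": "example/tiny-asr", "stargazers_count": 12, "forks_count": 1},
]

for rank, repo in enumerate(rank_repositories(SAMPLE), start=1):
    print(format_entry(rank, repo))
```

Fetching the URL from build_search_url("speech-recognition") returns JSON whose items array carries the same full_name, stargazers_count, and forks_count fields, so the ranking helper can be applied to live results; unauthenticated requests to the search endpoint are rate-limited.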

Last refreshed: Sun, 21 Jun 2026 08:17:44 GMT

Need a list like this for any search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
