Refolk

Top Python Speech recognition repositories on GitHub

ASR models and pipelines for converting audio to text. Filtered to projects whose primary language is Python.

Ranked by stars across 444 Python repositories tagged speech-recognition. Refreshed daily.

  1. huggingface/transformers · ★ 160,334 · ⑂ 33,126

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    • nlp
    • natural-language-processing
    • pytorch
    • pytorch-transformers
    • transformer
    • model-hub
  2. SYSTRAN/faster-whisper · ★ 22,697 · ⑂ 1,854

    Faster Whisper transcription with CTranslate2

    • deep-learning
    • inference
    • quantization
    • speech-recognition
    • speech-to-text
    • transformer
  3. m-bain/whisperX · ★ 21,737 · ⑂ 2,254

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    • asr
    • speech
    • speech-recognition
    • speech-to-text
    • whisper
  4. modelscope/FunASR · ★ 15,973 · ⑂ 1,662

    A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

    • conformer
    • pytorch
    • speech-recognition
    • paraformer
    • punctuation
    • speaker-diarization
  5. PaddlePaddle/PaddleSpeech · ★ 12,596 · ⑂ 1,956

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

    • transformer
    • conformer
    • speech-translation
    • streaming-asr
    • speech-alignment
    • punctuation-restoration
  6. speechbrain/speechbrain · ★ 11,517 · ⑂ 1,686

    A PyTorch-based Speech Toolkit

    • speech-recognition
    • speech-toolkit
    • speaker-recognition
    • speech-to-text
    • speech-enhancement
    • speech-separation
  7. espnet/espnet · ★ 9,828 · ⑂ 2,399

    End-to-End Speech Processing Toolkit

    • deep-learning
    • end-to-end
    • chainer
    • pytorch
    • kaldi
    • speech-recognition
  8. abus-aikorea/voice-pro · ★ 9,150 · ⑂ 1,230

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    • faster-whisper
    • tts
    • whisper
    • gradio
    • subtitles
    • transcription
  9. Uberi/speech_recognition · ★ 8,964 · ⑂ 2,428

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

    • python
    • audio
    • speech-recognition
    • speech-to-text
  10. nl8590687/ASRT_SpeechRecognition · ★ 8,372 · ⑂ 1,899

    A Deep-Learning-Based Chinese Speech Recognition System

    • tensorflow
    • cnn
    • ctc
    • python
    • keras
    • speech-recognition
  11. FunAudioLLM/SenseVoice · ★ 8,096 · ⑂ 739

    Multilingual Voice Understanding Model

    • ai
    • asr
    • gpt-4o
    • speech-recognition
    • speech-to-text
    • aigc
  12. Blaizzy/mlx-audio · ★ 6,953 · ⑂ 579

    A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

    • apple-silicon
    • audio-processing
    • mlx
    • multimodal
    • speech-recognition
    • speech-synthesis
  13. PaddlePaddle/PaddleX · ★ 6,128 · ⑂ 1,188

    All-in-One Development Tool based on PaddlePaddle

    • classification
    • segmentation
    • deployment
    • ocr
    • time-series
    • pp-chatocr
  14. modelscope/FunClip · ★ 5,578 · ⑂ 688

    Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

    • speech-recognition
    • video-clip
    • video-subtitles
    • subtitles-generator
    • speech-to-text
    • gradio
  15. wenet-e2e/wenet · ★ 5,103 · ⑂ 1,182

    Production First and Production Ready End-to-End Speech Recognition Toolkit

    • e2e-models
    • pytorch
    • asr
    • transformer
    • conformer
    • production-ready
  16. Picovoice/porcupine · ★ 4,808 · ⑂ 574

    On-device wake word detection powered by deep learning

    • wake-word-detection
    • hotword
    • keyword-spotting
    • keyword-spotter
    • wake-word
    • wake-word-engine
  17. yanshengjia/ml-road · ★ 4,754 · ⑂ 1,701

    Machine Learning and Agentic AI Resources, Practice and Research

    • machine-learning
    • deep-learning
    • nlp
    • computer-vision
    • speech-recognition
    • tensorflow
  18. jianchang512/stt · ★ 4,512 · ⑂ 483

    Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output as JSON, SRT subtitles, or plain text

    • speech
    • speech-recognition
    • speech-to-text
    • stt
  19. huggingface/distil-whisper · ★ 4,081 · ⑂ 355

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

    • audio
    • speech-recognition
    • whisper
  20.

    OpenAI Whisper ASR Webservice API

    • automatic-speech-recognition
    • speech-recognition
    • speech-to-text
    • openai-whisper
    • docker
    • asr
  21. chenyme/Chenyme-AAVT · ★ 3,043 · ⑂ 242

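    A fully automatic (audio) video translation project. It uses Whisper to recognize speech, an AI large language model to translate the subtitles, and then merges the subtitles into the video to produce a translated video.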

    • faster-whisper
    • gpt-4
    • speech-recognition
    • video-translation
    • whisper
    • gpt-4o
  22. tensorflow/lingvo · ★ 2,863 · ⑂ 450

    Lingvo

    • speech-recognition
    • translation
    • speech-to-text
    • machine-translation
    • mnist
    • seq2seq
  23.

    End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

    • automatic-speech-recognition
    • tensorflow
    • timit-dataset
    • feature-vector
    • phonemes
    • data-preprocessing
  24. linto-ai/whisper-timestamped · ★ 2,812 · ⑂ 211

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    • deep-learning
    • speech
    • speech-recognition
    • speech-to-text
    • asr
    • machine-learning
  25. mravanelli/pytorch-kaldi · ★ 2,398 · ⑂ 444

    pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

    • speech-recognition
    • gru
    • dnn
    • kaldi
    • rnn-model
    • pytorch

Find Python engineers shipping Speech recognition

The list above ranks the most-starred public Python repositories tagged with the speech-recognition topic, drawn from the public GitHub graph. Across 444 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real speech recognition experience.

That overlap is rare. Most Python engineers haven’t shipped speech recognition systems, and most speech recognition maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.

Refolk turns this list into a search. Ask for “Python Speech recognition maintainers hiring” or “Python engineers shipping Speech recognition in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.

How this list is built

Refolk searched GitHub for public Python repositories tagged with the Speech recognition topic, ranked them by stargazer count, and kept those with at least 25 stars. The list refreshes once a day.
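
The steps described above (search by topic and language, keep repositories with at least 25 stars, rank by stargazer count) can be sketched roughly as follows. This is only an illustration and not Refolk's actual pipeline: it assumes the public GitHub REST search endpoint and its standard `topic:`, `language:`, and `stars:` qualifiers, the topic slug `speech-recognition`, and a cutoff of 25 results to match the length of the list. The function names `build_search_url` and `rank_repositories` are hypothetical.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic="speech-recognition", language="Python",
                     min_stars=25, per_page=25):
    """Build a search URL for repos with a topic, language, and star floor."""
    # Assumed qualifiers: topic:, language:, stars:>=N (GitHub search syntax).
    query = f"topic:{topic} language:{language} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc",
              "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_repositories(repos, min_stars=25, limit=25):
    """Keep repos at or above the star floor, sorted by stars, descending.

    Each repo is a dict shaped like an item in the search API response,
    with at least "full_name" and "stargazers_count" keys.
    """
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    kept.sort(key=lambda r: r["stargazers_count"], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    # Placeholder records for illustration only, not real repositories.
    sample = [
        {"full_name": "example/small", "stargazers_count": 10},
        {"full_name": "example/large", "stargazers_count": 900},
        {"full_name": "example/medium", "stargazers_count": 40},
    ]
    print(build_search_url())
    for rank, repo in enumerate(rank_repositories(sample), start=1):
        print(rank, repo["full_name"], repo["stargazers_count"])
```

Filtering locally in `rank_repositories` repeats the `stars:>=25` qualifier on purpose, so the ranking step also works on cached results without another request.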

Last refreshed: Thu, 07 May 2026 06:52:13 GMT

Need a more specific search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
