Refolk

Top Python natural language processing repositories on GitHub

Tokenizers, classical NLP, and modern language model tooling. Filtered to projects whose primary language is Python.

Ranked by stars across 2,276 Python repositories tagged nlp. Refreshed daily.

  1. huggingface/transformers · ★ 160,327 · ⑂ 33,126

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    • nlp
    • natural-language-processing
    • pytorch
    • pytorch-transformers
    • transformer
    • model-hub
  2. hiyouga/LlamaFactory · ★ 70,990 · ⑂ 8,673

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    • fine-tuning
    • llama
    • llm
    • peft
    • transformers
    • rlhf
  3. apachecn/ailearning · ★ 42,235 · ⑂ 11,571

    AiLearning: data analysis + hands-on machine learning + linear algebra + PyTorch + NLTK + TF2

    • fp-growth
    • apriori
    • mahchine-leaning
    • naivebayes
    • svm
    • adaboost
  4. 666ghj/BettaFish · ★ 40,776 · ⑂ 7,537

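    微舆: a multi-agent public-opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Built from scratch without relying on any framework.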

    • agent-framework
    • data-analysis
    • multi-agent-system
    • nlp
    • public-opinion-analysis
    • python3
  5. google-research/bert · ★ 40,001 · ⑂ 9,718

    TensorFlow code and pre-trained models for BERT

    • nlp
    • google
    • natural-language-processing
    • natural-language-understanding
    • tensorflow
  6. google/langextract · ★ 36,394 · ⑂ 2,503

    A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

    • llm
    • nlp
    • python
    • gemini-ai
    • information-extration
    • large-language-models
  7. hankcs/HanLP · ★ 36,299 · ⑂ 10,903

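    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing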

    • nlp
    • natural-language-processing
    • hanlp
    • pos-tagging
    • dependency-parser
    • text-classification
  8. explosion/spaCy · ★ 33,546 · ⑂ 4,679

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    • natural-language-processing
    • data-science
    • machine-learning
    • python
    • cython
    • nlp
  9. stanford-oval/storm · ★ 28,162 · ⑂ 2,566

    An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

    • large-language-models
    • nlp
    • knowledge-curation
    • naacl
    • report-generation
    • retrieval-augmented-generation
  10. microsoft/unilm · ★ 22,116 · ⑂ 2,698

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    • nlp
    • pre-trained-model
    • unilm
    • minilm
    • layoutlm
    • layoutxlm
  11. huggingface/datasets · ★ 21,492 · ⑂ 3,196

    🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

    • nlp
    • datasets
    • pytorch
    • tensorflow
    • pandas
    • numpy
  12. RasaHQ/rasa · ★ 21,153 · ⑂ 4,910

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    • nlp
    • machine-learning
    • machine-learning-library
    • bot
    • bots
    • botkit
  13. ymcui/Chinese-LLaMA-Alpaca · ★ 18,945 · ⑂ 1,855

    Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

    • llm
    • plm
    • pre-trained-language-models
    • alpaca
    • llama
    • nlp
  14. piskvorky/gensim · ★ 16,408 · ⑂ 4,412

    Topic Modelling for Humans

    • gensim
    • topic-modeling
    • information-retrieval
    • machine-learning
    • natural-language-processing
    • nlp
  15. nltk/nltk · ★ 14,603 · ⑂ 3,002

    NLTK Source

    • nltk
    • python
    • nlp
    • natural-language-processing
    • machine-learning
  16. flairNLP/flair · ★ 14,374 · ⑂ 2,115

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    • pytorch
    • nlp
    • named-entity-recognition
    • sequence-labeling
    • semantic-role-labeling
    • word-embeddings
  17. GeeeekExplorer/nano-vllm · ★ 13,271 · ⑂ 2,046

    Nano vLLM

    • inference
    • llm
    • pytorch
    • transformer
    • deep-learning
    • nlp
  18. PaddlePaddle/PaddleNLP · ★ 12,937 · ⑂ 3,044

    Easy-to-use and powerful LLM and SLM library with awesome model zoo.

    • nlp
    • embedding
    • bert
    • ernie
    • paddlenlp
    • pretrained-models
  19. neuml/txtai · ★ 12,471 · ⑂ 808

    💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

    • python
    • search
    • nlp
    • semantic-search
    • vector-search
    • txtai
  20. allenai/allennlp · ★ 11,893 · ⑂ 2,223

    An open-source NLP research library, built on PyTorch.

    • pytorch
    • nlp
    • natural-language-processing
    • deep-learning
    • data-science
    • python
  21.

    Large Language Model Text Generation Inference

    • bloom
    • nlp
    • pytorch
    • inference
    • gpt
    • deep-learning
  22.

    This repository contains code examples for Stanford's course: TensorFlow for Deep Learning Research.

    • tensorflow
    • deep-learning
    • tutorial
    • nlp
    • natural-language-processing
    • chatbot
  23. ymcui/Chinese-BERT-wwm · ★ 10,204 · ⑂ 1,390

    Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)

    • chinese-bert
    • tensorflow
    • pytorch
    • bert
    • nlp
    • roberta
  24. bigscience-workshop/petals · ★ 10,122 · ⑂ 607

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

    • bloom
    • deep-learning
    • distributed-systems
    • language-models
    • large-language-models
    • machine-learning
  25.

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

    • attention
    • deep-learning
    • attention-is-all-you-need
    • pytorch
    • nlp
    • natural-language-processing

Find Python engineers shipping natural language processing

The list above ranks the most-starred public Python repositories tagged with the natural language processing topic, drawn from the public GitHub graph. Across 2,276 matching repositories, the contributors form a tight cluster of engineers with both Python chops and real natural language processing experience.

That overlap is rare. Most Python engineers haven’t shipped natural language processing work, and most natural language processing maintainers don’t write Python. The people in this list’s contributor graph are the ones who do both.

Refolk turns this list into a search. Ask for “Python natural language processing maintainers hiring” or “Python engineers shipping natural language processing in 2025”, and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.

How this list is built

Refolk searched GitHub for public Python repositories tagged with the natural language processing topic (nlp), ranked them by stargazer count, and kept those with at least 25 stars. The list refreshes once a day.
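The selection rule described here (topic, primary language, a 25-star floor, ordered by stars) maps onto GitHub's repository search qualifiers. The following is a minimal sketch, assuming GitHub's public REST search endpoint and the `topic:`, `language:`, and `stars:` qualifiers; it is not Refolk's actual pipeline, and the helper names `build_search_url`, `fetch_search_page`, and `rank_repositories` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# GitHub REST endpoint for repository search (assumption: the list could be
# reproduced this way; Refolk's own implementation is not documented here).
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(topic: str = "nlp", language: str = "python",
                     min_stars: int = 25, page: int = 1) -> str:
    """Build a search URL for repos with the given topic, language, and star floor."""
    query = f"topic:{topic} language:{language} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc",
              "per_page": 100, "page": page}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def fetch_search_page(url: str) -> list[dict]:
    """Fetch one page of results; each item carries 'full_name' and 'stargazers_count'."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["items"]


def rank_repositories(repos: list[dict], min_stars: int = 25) -> list[dict]:
    """Drop repos under the star floor, then sort by stars (descending), name as tie-break."""
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    return sorted(kept, key=lambda r: (-r["stargazers_count"], r["full_name"]))
```

GitHub's search API returns at most 1,000 results per query, so collecting all 2,276 matching repositories would require splitting the query (for example, into star ranges) and merging the pages before calling `rank_repositories`. Unauthenticated search requests are also rate-limited more strictly than authenticated ones.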

Last refreshed: Thu, 07 May 2026 05:54:07 GMT

Need a more specific search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
