Top Python natural language processing repositories on GitHub
Tokenizers, classical NLP, and modern language model tooling. Filtered to projects whose primary language is Python.
Ranked by stars across 2,298 Python repositories tagged nlp. Refreshed daily.
- 1. huggingface/transformers ★ 161,762 · ⑂ 33,563
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- nlp
- natural-language-processing
- pytorch
- pytorch-transformers
- transformer
- model-hub
- 2. hiyouga/LlamaFactory ★ 72,315 · ⑂ 8,849
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
- fine-tuning
- llama
- llm
- peft
- transformers
- rlhf
- 3. apachecn/ailearning ★ 42,338 · ⑂ 11,543
AiLearning: data analysis + machine learning in practice + linear algebra + PyTorch + NLTK + TF2
- fp-growth
- apriori
- mahchine-leaning
- naivebayes
- svm
- adaboost
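- 4. 666ghj/BettaFish ★ 41,449 · ⑂ 7,594
微舆 (Weiyu): a multi-agent public-opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the original picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.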
- agent-framework
- data-analysis
- multi-agent-system
- nlp
- public-opinion-analysis
- python3
- 5. google-research/bert ★ 40,029 · ⑂ 9,699
TensorFlow code and pre-trained models for BERT
- nlp
- natural-language-processing
- natural-language-understanding
- tensorflow
- 6. google/langextract ★ 36,930 · ⑂ 2,548
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
- llm
- nlp
- python
- gemini-ai
- information-extration
- large-language-models
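- 7. hankcs/HanLP ★ 36,426 · ⑂ 10,931
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing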
- nlp
- natural-language-processing
- hanlp
- pos-tagging
- dependency-parser
- text-classification
- 8. rohitg00/ai-engineering-from-scratch ★ 35,149 · ⑂ 5,734
Learn it. Build it. Ship it for others.
- agents
- ai
- ai-agents
- ai-engineering
- computer-vision
- course
- 9. explosion/spaCy ★ 33,674 · ⑂ 4,688
💫 Industrial-strength Natural Language Processing (NLP) in Python
- natural-language-processing
- data-science
- machine-learning
- python
- cython
- nlp
- 10. stanford-oval/storm ★ 28,978 · ⑂ 2,674
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
- large-language-models
- nlp
- knowledge-curation
- naacl
- report-generation
- retrieval-augmented-generation
- 11. microsoft/unilm ★ 22,147 · ⑂ 2,697
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
- nlp
- pre-trained-model
- unilm
- minilm
- layoutlm
- layoutxlm
- 12. huggingface/datasets ★ 21,641 · ⑂ 3,258
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
- nlp
- datasets
- pytorch
- tensorflow
- pandas
- numpy
- 13. RasaHQ/rasa ★ 21,217 · ⑂ 4,917
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
- nlp
- machine-learning
- machine-learning-library
- bot
- bots
- botkit
- 14. ymcui/Chinese-LLaMA-Alpaca ★ 18,942 · ⑂ 1,851
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
- llm
- plm
- pre-trained-language-models
- alpaca
- llama
- nlp
- 15. piskvorky/gensim ★ 16,443 · ⑂ 4,408
Topic Modelling for Humans
- gensim
- topic-modeling
- information-retrieval
- machine-learning
- natural-language-processing
- nlp
- 16
- 17. flairNLP/flair ★ 14,381 · ⑂ 2,109
A very simple framework for state-of-the-art Natural Language Processing (NLP)
- pytorch
- nlp
- named-entity-recognition
- sequence-labeling
- semantic-role-labeling
- word-embeddings
- 18
- 19. PaddlePaddle/PaddleNLP ★ 12,952 · ⑂ 3,036
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
- nlp
- embedding
- bert
- ernie
- paddlenlp
- pretrained-models
- 20. neuml/txtai ★ 12,673 · ⑂ 835
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
- python
- search
- nlp
- semantic-search
- vector-search
- txtai
- 21. allenai/allennlp ★ 11,889 · ⑂ 2,214
An open-source NLP research library, built on PyTorch.
- pytorch
- nlp
- natural-language-processing
- deep-learning
- data-science
- python
- 22. huggingface/text-generation-inference ★ 10,863 · ⑂ 1,271
Large Language Model Text Generation Inference
- bloom
- nlp
- pytorch
- inference
- gpt
- deep-learning
- 23. chiphuyen/stanford-tensorflow-tutorials ★ 10,371 · ⑂ 4,254
This repository contains code examples for the Stanford course TensorFlow for Deep Learning Research.
- tensorflow
- deep-learning
- tutorial
- nlp
- natural-language-processing
- chatbot
- 24. bigscience-workshop/petals ★ 10,229 · ⑂ 618
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
- bloom
- deep-learning
- distributed-systems
- language-models
- large-language-models
- machine-learning
- 25. ymcui/Chinese-BERT-wwm ★ 10,217 · ⑂ 1,387
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
- chinese-bert
- tensorflow
- pytorch
- bert
- nlp
- roberta
Find Python engineers shipping natural language processing
The list above ranks the most-starred public Python repositories tagged with the natural language processing topic, drawn from the public GitHub graph. Across 2,298 matching repositories, the contributors form a tight cluster of engineers with both Python skills and hands-on natural language processing experience.
That overlap is rare. Most Python engineers haven’t shipped natural language processing, and most natural language processing maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.
Refolk turns this list into a search. Ask for “Python natural language processing maintainers hiring” or “Python engineers shipping natural language processing in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.
How this list is built
Last refreshed: Sun, 21 Jun 2026 07:07:33 GMT
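The ranking described above (Python repositories tagged nlp, ordered by stars) can be sketched roughly as follows. This is only an illustration, not the pipeline behind this page: it assumes the public GitHub REST search endpoint as the data source, and the helper names `build_search_url` and `rank_by_stars` are hypothetical. The sample entries reuse star counts from the list above.

```python
from urllib.parse import urlencode


def build_search_url(topic: str, language: str, per_page: int = 25) -> str:
    """Build a GitHub REST search URL for repos with a topic and language, sorted by stars.

    Assumption: the list is derived from a query like this one; the page does not say so.
    """
    params = {
        "q": f"topic:{topic} language:{language}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


def rank_by_stars(repos):
    """Return (rank, full_name, stars) tuples, highest star count first."""
    ordered = sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)
    return [
        (rank, r["full_name"], r["stargazers_count"])
        for rank, r in enumerate(ordered, start=1)
    ]


# Three entries copied from the list above, deliberately out of order.
SAMPLE = [
    {"full_name": "explosion/spaCy", "stargazers_count": 33674},
    {"full_name": "huggingface/transformers", "stargazers_count": 161762},
    {"full_name": "google-research/bert", "stargazers_count": 40029},
]

if __name__ == "__main__":
    print(build_search_url("nlp", "python"))
    for rank, name, stars in rank_by_stars(SAMPLE):
        print(f"{rank}. {name} ★ {stars:,}")
```

The field names `full_name` and `stargazers_count` match those returned by the GitHub search API, so the same `rank_by_stars` helper could be applied to the `items` array of a real response.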
Need a more specific search?
Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
Related lists
- Python · Machine learning
- Python · Deep learning
- Python · Computer vision
- Python · LLM
- Python · AI agents
- Python · RAG
- Python · Embeddings
- Python · Transformers
See all repository lists.