Top Python RAG repositories on GitHub
Retrieval-augmented generation pipelines, embeddings, and grounding tooling. Filtered to projects whose primary language is Python.
Ranked by stars across 866 Python repositories tagged rag. Refreshed daily.
- 1. open-webui/open-webui ★ 142,457 · ⑂ 20,485
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
- ollama
- ollama-webui
- llm
- webui
- self-hosted
- llm-ui
- 2. langchain-ai/langchain ★ 139,781 · ⑂ 23,183
The agent engineering platform.
- ai
- anthropic
- gemini
- langchain
- llm
- openai
- 3. Shubhamsaboo/awesome-llm-apps ★ 115,182 · ⑂ 17,106
100+ AI Agent & RAG apps you can actually run — clone, customize, ship.
- llms
- rag
- python
- agents
- 4. infiniflow/ragflow ★ 83,263 · ⑂ 9,637
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
- ai
- ai-agents
- context-engine
- llm-apps
- rag
- retrieval-augmented-generation
- 5. PaddlePaddle/PaddleOCR ★ 83,156 · ⑂ 10,829
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
- ocr
- chineseocr
- pdf2markdown
- pp-ocr
- pp-structure
- document-parsing
- 6. safishamsi/graphify ★ 69,989 · ⑂ 7,029
AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.
- claude-code
- graphrag
- knowledge-graph
- codex
- openclaw
- skills
- 7
- 8
- 9. run-llama/llama_index ★ 50,246 · ⑂ 7,597
LlamaIndex is the leading document agent and OCR platform
- agents
- application
- data
- fine-tuning
- framework
- llamaindex
- 10. chopratejas/headroom ★ 42,368 · ⑂ 2,923
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
- agent
- ai
- anthropic
- compression
- context-engineering
- context-window
- 11. QuivrHQ/quivr ★ 39,163 · ⑂ 3,723
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.
- ai
- llm
- api
- chatbot
- chatgpt
- database
- 12. chatchat-space/Langchain-Chatchat ★ 38,200 · ⑂ 6,215
Langchain-Chatchat (formerly Langchain-ChatGLM): a local knowledge-based RAG and Agent app built with Langchain and language models such as ChatGLM, Qwen, and Llama.
- chatglm
- langchain
- llm
- knowledge-base
- llama
- chatbot
- 13. HKUDS/LightRAG ★ 36,816 · ⑂ 5,192
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
- knowledge-graph
- large-language-models
- retrieval-augmented-generation
- genai
- graphrag
- llm
- 14. langchain-ai/langgraph ★ 35,317 · ⑂ 5,924
Build resilient agents.
- agents
- ai
- ai-agents
- chatgpt
- deepagents
- enterprise
- 15. khoj-ai/khoj ★ 35,227 · ⑂ 2,256
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
- semantic-search
- emacs
- obsidian-md
- chat
- chatgpt
- ai
- 16. microsoft/graphrag ★ 33,882 · ⑂ 3,592
A modular graph-based Retrieval-Augmented Generation (RAG) system
- graphrag
- rag
- llm
- llms
- gpt
- gpt-4
- 17. VectifyAI/PageIndex ★ 33,257 · ⑂ 2,894
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
- agentic-ai
- agents
- ai
- ai-agents
- context-engineering
- llm
- 18. onyx-dot-app/onyx ★ 30,458 · ⑂ 4,175
Open Source AI Platform - AI Chat with advanced features that works with every LLM
- enterprise-search
- rag
- ai-chat
- chatgpt
- gen-ai
- nextjs
- 19
- 20. ScrapeGraphAI/Scrapegraph-ai ★ 27,379 · ⑂ 2,588
Python scraper based on AI
- scraping
- scraping-python
- llm
- web-crawler
- web-scraping
- ai-scraping
- 21. volcengine/OpenViking ★ 25,856 · ⑂ 2,001
OpenViking is an open-source context database designed specifically for AI Agents (such as openclaw). OpenViking unifies the management of the context that Agents need (memory, resources, and skills) through a file system paradigm, enabling hierarchical context delivery and self-evolution.
- context-engineering
- filesystem
- rag
- memory
- skill
- agent
- 22. Cinnamon/kotaemon ★ 25,477 · ⑂ 2,123
An open-source RAG-based tool for chatting with your documents.
- chatbot
- llms
- open-source
- rag
- 23. HKUDS/DeepTutor ★ 24,859 · ⑂ 3,360
DeepTutor: Agent-native Personalized Tutoring. https://deeptutor.info/.
- ai-tutor
- deepresearch
- interactive-learning
- large-language-models
- multi-agent-systems
- rag
- 24. vanna-ai/vanna ★ 23,653 · ⑂ 2,429
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.
- agent
- ai
- data-visualization
- database
- llm
- sql
- 25. 1Panel-dev/MaxKB ★ 21,371 · ⑂ 2,903
🔥 MaxKB is an open-source platform for building enterprise-grade agents. A powerful, easy-to-use open-source enterprise agent platform.
- llm
- ollama
- maxkb
- knowledgebase
- chatbot
- langchain
Find Python engineers shipping RAG
The list above ranks the most-starred public Python repositories tagged with the RAG topic, drawn from the public GitHub graph. Across 866 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real RAG experience.
That overlap is rare. Most Python engineers haven’t shipped RAG, and most RAG maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.
Refolk turns this list into a search. Ask for “Python RAG maintainers hiring” or “Python engineers shipping RAG in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.
How this list is built
Last refreshed: Sun, 21 Jun 2026 07:06:56 GMT
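The page does not describe its pipeline beyond "ranked by stars" among Python repositories tagged rag. As a minimal sketch only, assuming the data could come from GitHub's repository search endpoint, the ranking step might look like the following. The function names and the sample records are hypothetical; the star and fork counts are copied from three entries in the list above.

```python
from urllib.parse import urlencode

# Hypothetical sample records; counts copied from the list above.
SAMPLE_REPOS = [
    {"name": "langchain-ai/langchain", "stars": 139_781, "forks": 23_183},
    {"name": "infiniflow/ragflow", "stars": 83_263, "forks": 9_637},
    {"name": "open-webui/open-webui", "stars": 142_457, "forks": 20_485},
]


def build_search_url(topic="rag", language="python", per_page=25):
    """Build a GitHub repository-search URL (assumed data source; no request is sent)."""
    params = {
        "q": f"topic:{topic} language:{language}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


def rank_by_stars(repos):
    """Return (rank, repo) pairs, most-starred first; ties are broken by name."""
    ordered = sorted(repos, key=lambda r: (-r["stars"], r["name"]))
    return list(enumerate(ordered, start=1))


if __name__ == "__main__":
    print(build_search_url())
    for rank, repo in rank_by_stars(SAMPLE_REPOS):
        print(f"{rank}. {repo['name']} ★ {repo['stars']:,} · ⑂ {repo['forks']:,}")
```

Sorting locally, rather than relying only on the API's sort order, keeps the ranking reproducible when records from several result pages or refreshes are merged.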
Need a more specific search?
Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
Related lists
- Python · Machine learning
- Python · Deep learning
- Python · Computer vision
- Python · Natural language processing
- Python · LLM
- Python · AI agents
- Python · Embeddings
- Python · Transformers
See all repository lists.