Top Python RAG repositories on GitHub
Retrieval-augmented generation pipelines, embeddings, and grounding tooling. Filtered to projects whose primary language is Python.
Ranked by stars across 790 Python repositories tagged rag. Refreshed daily.
- 1 langchain-ai/langchain ★ 135,982 · ⑂ 22,480
The agent engineering platform. Available in TypeScript!
- ai
- anthropic
- gemini
- langchain
- llm
- openai
- 2 open-webui/open-webui ★ 135,830 · ⑂ 19,340
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
- ollama
- ollama-webui
- llm
- webui
- self-hosted
- llm-ui
- 3 Shubhamsaboo/awesome-llm-apps ★ 109,087 · ⑂ 16,139
100+ AI Agent & RAG apps you can actually run — clone, customize, ship.
- llms
- rag
- python
- agents
- 4 infiniflow/ragflow ★ 79,858 · ⑂ 9,089
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
- ai
- ai-agents
- context-engine
- llm-apps
- rag
- retrieval-augmented-generation
- 5 PaddlePaddle/PaddleOCR ★ 77,196 · ⑂ 10,373
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
- ocr
- chineseocr
- pdf2markdown
- pp-ocr
- pp-structure
- document-parsing
- 6
- 7 run-llama/llama_index ★ 49,181 · ⑂ 7,365
LlamaIndex is the leading document agent and OCR platform
- agents
- application
- data
- fine-tuning
- framework
- llamaindex
- 8 safishamsi/graphify ★ 44,007 · ⑂ 4,794
AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.
- claude-code
- graphrag
- knowledge-graph
- codex
- openclaw
- skills
- 9
- 10 QuivrHQ/quivr ★ 39,134 · ⑂ 3,752
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.
- ai
- llm
- api
- chatbot
- chatgpt
- database
- 11 mindsdb/mindsdb ★ 39,122 · ⑂ 6,199
AI Data Vault - A query engine for AI Agents to securely query data from any datasource
- ai
- artificial-inteligence
- databases
- llms
- rag
- agents
- 12 chatchat-space/Langchain-Chatchat ★ 37,967 · ⑂ 6,197
Langchain-Chatchat (formerly Langchain-ChatGLM): a local knowledge-based RAG and Agent app built with Langchain and language models such as ChatGLM, Qwen, and Llama
- chatglm
- langchain
- llm
- knowledge-base
- llama
- chatbot
- 13 HKUDS/LightRAG ★ 34,834 · ⑂ 4,934
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
- knowledge-graph
- large-language-models
- retrieval-augmented-generation
- genai
- graphrag
- llm
- 14 khoj-ai/khoj ★ 34,417 · ⑂ 2,185
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
- semantic-search
- emacs
- obsidian-md
- chat
- chatgpt
- ai
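- 15 ZhuLinsen/daily_stock_analysis ★ 34,311 · ⑂ 33,993
LLM-powered stock analysis system for A-share, H-share, and US markets: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications, with zero-cost scheduled runs, entirely free of charge.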
- ai
- aigc
- gemini
- llm
- quant
- stock
- 16 microsoft/graphrag ★ 32,812 · ⑂ 3,476
A modular graph-based Retrieval-Augmented Generation (RAG) system
- graphrag
- rag
- llm
- llms
- gpt
- gpt-4
- 17 langchain-ai/langgraph ★ 31,366 · ⑂ 5,339
Build resilient language agents as graphs. Available in TypeScript!
- agents
- ai
- ai-agents
- chatgpt
- deepagents
- enterprise
- 18 onyx-dot-app/onyx ★ 29,101 · ⑂ 3,915
Open Source AI Platform - AI Chat with advanced features that works with every LLM
- enterprise-search
- rag
- ai-chat
- chatgpt
- gen-ai
- nextjs
- 19 VectifyAI/PageIndex ★ 28,975 · ⑂ 2,462
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
- agentic-ai
- agents
- ai
- ai-agents
- context-engineering
- llm
- 20
- 21 Cinnamon/kotaemon ★ 25,365 · ⑂ 2,121
An open-source RAG-based tool for chatting with your documents.
- chatbot
- llms
- open-source
- rag
- 22 ScrapeGraphAI/Scrapegraph-ai ★ 24,466 · ⑂ 2,184
Python scraper based on AI
- scraping
- scraping-python
- llm
- web-crawler
- web-scraping
- ai-scraping
- 23 volcengine/OpenViking ★ 23,562 · ⑂ 1,743
OpenViking is an open-source context database designed specifically for AI Agents (such as openclaw). OpenViking unifies the management of the context that Agents need (memory, resources, and skills) through a file system paradigm, enabling hierarchical context delivery and self-evolution.
- context-engineering
- filesystem
- rag
- memory
- skill
- agent
- 24 HKUDS/DeepTutor ★ 23,523 · ⑂ 3,123
"DeepTutor: Agent-Native Personalized Learning Assistant"
- ai-tutor
- deepresearch
- interactive-learning
- large-language-models
- multi-agent-systems
- rag
- 25 vanna-ai/vanna ★ 23,390 · ⑂ 2,366
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.
- agent
- ai
- data-visualization
- database
- llm
- sql
Find Python engineers shipping RAG
The list above ranks the most-starred public Python repositories tagged with the RAG topic, drawn from the public GitHub graph. Across 790 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real RAG experience.
That overlap is rare. Most Python engineers haven’t shipped RAG, and most RAG maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.
Refolk turns this list into a search. Ask for “Python RAG maintainers hiring” or “Python engineers shipping RAG in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.
How this list is built
Last refreshed: Thu, 07 May 2026 05:54:20 GMT
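The page does not publish its pipeline, so the following is only a rough sketch of how a star-ranked list like this one could be reproduced. It assumes the public GitHub repository search endpoint with a topic:rag language:python query sorted by stars; the helper names build_search_url and rank_by_stars are hypothetical, and the sample records simply reuse three star counts quoted in the list above. No network request is made.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic: str, language: str, per_page: int = 25) -> str:
    """Build a repository-search URL sorted by stars, most-starred first.

    Hypothetical helper: this is an assumption about how such a list
    might be queried, not the method the page actually uses.
    """
    params = {
        "q": f"topic:{topic} language:{language}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_by_stars(repos: list[dict]) -> list[tuple[int, str, int]]:
    """Return (rank, full_name, stars) tuples ordered by star count."""
    ordered = sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)
    return [
        (rank, repo["full_name"], repo["stargazers_count"])
        for rank, repo in enumerate(ordered, start=1)
    ]


# Sample records shaped like GitHub search results, deliberately unsorted.
# Star counts are the ones quoted in the list above.
sample = [
    {"full_name": "infiniflow/ragflow", "stargazers_count": 79858},
    {"full_name": "langchain-ai/langchain", "stargazers_count": 135982},
    {"full_name": "open-webui/open-webui", "stargazers_count": 135830},
]

if __name__ == "__main__":
    print(build_search_url("rag", "python"))
    for rank, name, stars in rank_by_stars(sample):
        print(f"{rank} {name} ★ {stars:,}")
```

Sorting locally as well as in the query keeps the ranking stable if records from several result pages are merged before display.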
Need a more specific search?
Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
Related lists
- Python · Machine learning
- Python · Deep learning
- Python · Computer vision
- Python · Natural language processing
- Python · LLM
- Python · AI agents
- Python · Embeddings
- Python · Transformers
See all repository lists.