Refolk

Top Python RAG repositories on GitHub

Retrieval-augmented generation pipelines, embeddings, and grounding tooling. Filtered to projects whose primary language is Python.

Ranked by stars across 866 Python repositories tagged rag. Refreshed daily.

  1. open-webui/open-webui · ★ 142,468 · ⑂ 20,486

    User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

    • ollama
    • ollama-webui
    • llm
    • webui
    • self-hosted
    • llm-ui
  2. langchain-ai/langchain · ★ 139,782 · ⑂ 23,182

    The agent engineering platform.

    • ai
    • anthropic
    • gemini
    • langchain
    • llm
    • openai
  3. Shubhamsaboo/awesome-llm-apps · ★ 115,190 · ⑂ 17,107

    100+ AI Agent & RAG apps you can actually run — clone, customize, ship.

    • llms
    • rag
    • python
    • agents
  4. infiniflow/ragflow · ★ 83,265 · ⑂ 9,638

    RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

    • ai
    • ai-agents
    • context-engine
    • llm-apps
    • rag
    • retrieval-augmented-generation
  5. PaddlePaddle/PaddleOCR · ★ 83,159 · ⑂ 10,830

    Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

    • ocr
    • chineseocr
    • pdf2markdown
    • pp-ocr
    • pp-structure
    • document-parsing
  6. safishamsi/graphify · ★ 70,019 · ⑂ 7,033

    AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

    • claude-code
    • graphrag
    • knowledge-graph
    • codex
    • openclaw
    • skills
  7. datawhalechina/hello-agents · ★ 60,604 · ⑂ 7,469

    📚 "Building Agents from Scratch": a from-scratch tutorial on agent principles and practice

    • agent
    • tutorial
    • llm
    • rag
  8. mem0ai/mem0 · ★ 59,015 · ⑂ 6,808

    Universal memory layer for AI Agents

    • ai
    • chatgpt
    • llm
    • python
    • chatbots
    • rag
  9. run-llama/llama_index · ★ 50,249 · ⑂ 7,597

    LlamaIndex is the leading document agent and OCR platform

    • agents
    • application
    • data
    • fine-tuning
    • framework
    • llamaindex
  10. chopratejas/headroom · ★ 42,490 · ⑂ 2,929

    Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

    • agent
    • ai
    • anthropic
    • compression
    • context-engineering
    • context-window
  11. QuivrHQ/quivr · ★ 39,163 · ⑂ 3,723

    Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.

    • ai
    • llm
    • api
    • chatbot
    • chatgpt
    • database
  12. chatchat-space/Langchain-Chatchat · ★ 38,200 · ⑂ 6,215

    Langchain-Chatchat (formerly Langchain-ChatGLM): a local knowledge-base RAG and Agent app built with Langchain on language models such as ChatGLM, Qwen, and Llama

    • chatglm
    • langchain
    • llm
    • knowledge-base
    • llama
    • chatbot
  13. HKUDS/LightRAG · ★ 36,817 · ⑂ 5,192

    [EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

    • knowledge-graph
    • large-language-models
    • retrieval-augmented-generation
    • genai
    • graphrag
    • llm
  14. langchain-ai/langgraph · ★ 35,321 · ⑂ 5,924

    Build resilient agents.

    • agents
    • ai
    • ai-agents
    • chatgpt
    • deepagents
    • enterprise
  15. khoj-ai/khoj · ★ 35,226 · ⑂ 2,256

    Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

    • semantic-search
    • emacs
    • obsidian-md
    • chat
    • chatgpt
    • ai
  16. microsoft/graphrag · ★ 33,885 · ⑂ 3,592

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    • graphrag
    • rag
    • llm
    • llms
    • gpt
    • gpt-4
  17. VectifyAI/PageIndex · ★ 33,260 · ⑂ 2,894

    📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

    • agentic-ai
    • agents
    • ai
    • ai-agents
    • context-engineering
    • llm
  18. onyx-dot-app/onyx · ★ 30,457 · ⑂ 4,175

    Open Source AI Platform - AI Chat with advanced features that works with every LLM

    • enterprise-search
    • rag
    • ai-chat
    • chatgpt
    • gen-ai
    • nextjs
  19. getzep/graphiti · ★ 27,676 · ⑂ 2,773

    Build Real-Time Knowledge Graphs for AI Agents

    • agents
    • graph
    • llms
    • rag
  20. ScrapeGraphAI/Scrapegraph-ai · ★ 27,380 · ⑂ 2,589

    Python scraper based on AI

    • scraping
    • scraping-python
    • llm
    • web-crawler
    • web-scraping
    • ai-scraping
  21. volcengine/OpenViking · ★ 25,856 · ⑂ 2,001

    OpenViking is an open-source context database designed specifically for AI Agents (such as openclaw). OpenViking unifies the management of the context that Agents need (memory, resources, and skills) through a file system paradigm, enabling hierarchical context delivery and self-evolution.

    • context-engineering
    • filesystem
    • rag
    • memory
    • skill
    • agent
  22. Cinnamon/kotaemon · ★ 25,478 · ⑂ 2,123

    An open-source RAG-based tool for chatting with your documents.

    • chatbot
    • llms
    • open-source
    • rag
  23. HKUDS/DeepTutor · ★ 24,860 · ⑂ 3,360

    DeepTutor: Agent-native Personalized Tutoring. https://deeptutor.info/.

    • ai-tutor
    • deepresearch
    • interactive-learning
    • large-language-models
    • multi-agent-systems
    • rag
  24. vanna-ai/vanna · ★ 23,653 · ⑂ 2,429

    🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.

    • agent
    • ai
    • data-visualization
    • database
    • llm
    • sql
  25. 1Panel-dev/MaxKB · ★ 21,372 · ⑂ 2,903

    🔥 MaxKB is an open-source platform for building enterprise-grade agents: powerful, easy to use, and built for enterprise deployments.

    • llm
    • ollama
    • maxkb
    • knowledgebase
    • chatbot
    • langchain

Find Python engineers shipping RAG

The list above ranks the most-starred public Python repositories tagged with the RAG topic, drawn from the public GitHub graph. Across 866 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real RAG experience.

That overlap is rare. Most Python engineers haven’t shipped RAG, and most RAG maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.

Refolk turns this list into a search. Ask for “Python RAG maintainers hiring” or “Python engineers shipping RAG in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.

How this list is built

Refolk searched GitHub for public Python repositories tagged with the RAG topic, ranked them by stargazer count, and kept those with at least 25 stars. The list refreshes once a day.
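The selection steps described above (topic filter, language filter, star floor, ranking by stargazers) can be sketched in Python. This is an illustration under stated assumptions, not Refolk's actual pipeline: it assumes the public GitHub repository search endpoint and its `topic:`, `language:`, and `stars:` qualifiers, and the helper names `build_search_url` and `rank_repos` are hypothetical. The sample entries reuse star counts from the list, plus one made-up repository that falls below the 25-star floor.

```python
from urllib.parse import urlencode

# Public GitHub repository search endpoint (assumed here; the page does not
# say which API Refolk actually queries).
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic="rag", language="python", min_stars=25, per_page=25):
    """Build a search URL for repos with a topic, a language, and a star floor."""
    query = f"topic:{topic} language:{language} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_repos(repos, min_stars=25):
    """Drop repos under the star floor, then sort by stargazers, highest first."""
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    return sorted(kept, key=lambda r: r["stargazers_count"], reverse=True)


# Offline sample shaped like items in a search response.
sample = [
    {"full_name": "infiniflow/ragflow", "stargazers_count": 83265},
    {"full_name": "open-webui/open-webui", "stargazers_count": 142468},
    {"full_name": "example/tiny-rag", "stargazers_count": 12},  # hypothetical repo
    {"full_name": "langchain-ai/langchain", "stargazers_count": 139782},
]

url = build_search_url()
ranked = rank_repos(sample)
for position, repo in enumerate(ranked, start=1):
    print(position, repo["full_name"], repo["stargazers_count"])
```

The ranking is kept as a pure function so it can be checked offline; fetching `url` with an HTTP client would supply live `items` in place of `sample`.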

Last refreshed: Sun, 21 Jun 2026 08:13:57 GMT

Need a more specific search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
