Refolk

Top Python RAG repositories on GitHub

Retrieval-augmented generation pipelines, embeddings, and grounding tooling. Filtered to projects whose primary language is Python.

Ranked by stars across 790 Python repositories tagged rag. Refreshed daily.

  1. langchain-ai/langchain · ★ 135,982 · ⑂ 22,480

    The agent engineering platform. Available in TypeScript!

    • ai
    • anthropic
    • gemini
    • langchain
    • llm
    • openai
  2. open-webui/open-webui · ★ 135,830 · ⑂ 19,340

    User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

    • ollama
    • ollama-webui
    • llm
    • webui
    • self-hosted
    • llm-ui
  3. Shubhamsaboo/awesome-llm-apps · ★ 109,087 · ⑂ 16,139

    100+ AI Agent & RAG apps you can actually run — clone, customize, ship.

    • llms
    • rag
    • python
    • agents
  4. infiniflow/ragflow · ★ 79,858 · ⑂ 9,089

    RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

    • ai
    • ai-agents
    • context-engine
    • llm-apps
    • rag
    • retrieval-augmented-generation
  5. PaddlePaddle/PaddleOCR · ★ 77,196 · ⑂ 10,373

    Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

    • ocr
    • chineseocr
    • pdf2markdown
    • pp-ocr
    • pp-structure
    • document-parsing
  6. mem0ai/mem0 · ★ 54,963 · ⑂ 6,226

    Universal memory layer for AI Agents

    • ai
    • chatgpt
    • llm
    • python
    • chatbots
    • rag
  7. run-llama/llama_index · ★ 49,181 · ⑂ 7,365

    LlamaIndex is the leading document agent and OCR platform

    • agents
    • application
    • data
    • fine-tuning
    • framework
    • llamaindex
  8. safishamsi/graphify · ★ 44,007 · ⑂ 4,794

    AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

    • claude-code
    • graphrag
    • knowledge-graph
    • codex
    • openclaw
    • skills
  9. datawhalechina/hello-agents · ★ 43,201 · ⑂ 5,252

    📚 "Building Agents from Scratch": a from-scratch tutorial on agent principles and practice

    • agent
    • tutorial
    • llm
    • rag
  10. QuivrHQ/quivr · ★ 39,134 · ⑂ 3,752

    Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.

    • ai
    • llm
    • api
    • chatbot
    • chatgpt
    • database
  11. mindsdb/mindsdb · ★ 39,122 · ⑂ 6,199

    AI Data Vault - A query engine for AI Agents to securely query data from any datasource

    • ai
    • artificial-inteligence
    • databases
    • llms
    • rag
    • agents
  12. chatchat-space/Langchain-Chatchat · ★ 37,967 · ⑂ 6,197

    Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent app built with Langchain on LLMs such as ChatGLM, Qwen, and Llama

    • chatglm
    • langchain
    • llm
    • knowledge-base
    • llama
    • chatbot
  13. HKUDS/LightRAG · ★ 34,834 · ⑂ 4,934

    [EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

    • knowledge-graph
    • large-language-models
    • retrieval-augmented-generation
    • genai
    • graphrag
    • llm
  14. khoj-ai/khoj · ★ 34,417 · ⑂ 2,185

    Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

    • semantic-search
    • emacs
    • obsidian-md
    • chat
    • chatgpt
    • ai
  15. ZhuLinsen/daily_stock_analysis · ★ 34,311 · ⑂ 33,993

    LLM-powered stock analysis system for A-share, Hong Kong, and US markets: multi-source market data, real-time news, an LLM decision dashboard, and multi-channel push notifications, with scheduled runs at zero cost.

    • ai
    • aigc
    • gemini
    • llm
    • quant
    • stock
  16. microsoft/graphrag · ★ 32,812 · ⑂ 3,476

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    • graphrag
    • rag
    • llm
    • llms
    • gpt
    • gpt-4
  17. langchain-ai/langgraph · ★ 31,366 · ⑂ 5,339

    Build resilient language agents as graphs. Available in TypeScript!

    • agents
    • ai
    • ai-agents
    • chatgpt
    • deepagents
    • enterprise
  18. onyx-dot-app/onyx · ★ 29,101 · ⑂ 3,915

    Open Source AI Platform - AI Chat with advanced features that works with every LLM

    • enterprise-search
    • rag
    • ai-chat
    • chatgpt
    • gen-ai
    • nextjs
  19. VectifyAI/PageIndex · ★ 28,975 · ⑂ 2,462

    📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

    • agentic-ai
    • agents
    • ai
    • ai-agents
    • context-engineering
    • llm
  20. getzep/graphiti · ★ 25,776 · ⑂ 2,562

    Build Real-Time Knowledge Graphs for AI Agents

    • agents
    • graph
    • llms
    • rag
  21. Cinnamon/kotaemon · ★ 25,365 · ⑂ 2,121

    An open-source RAG-based tool for chatting with your documents.

    • chatbot
    • llms
    • open-source
    • rag
  22. ScrapeGraphAI/Scrapegraph-ai · ★ 24,466 · ⑂ 2,184

    Python scraper based on AI

    • scraping
    • scraping-python
    • llm
    • web-crawler
    • web-scraping
    • ai-scraping
  23. volcengine/OpenViking · ★ 23,562 · ⑂ 1,743

    OpenViking is an open-source context database designed specifically for AI Agents (such as openclaw). OpenViking unifies the management of context (memory, resources, and skills) that Agents need through a file system paradigm, enabling hierarchical context delivery and self-evolution.

    • context-engineering
    • filesystem
    • rag
    • memory
    • skill
    • agent
  24. HKUDS/DeepTutor · ★ 23,523 · ⑂ 3,123

    "DeepTutor: Agent-Native Personalized Learning Assistant"

    • ai-tutor
    • deepresearch
    • interactive-learning
    • large-language-models
    • multi-agent-systems
    • rag
  25. vanna-ai/vanna · ★ 23,390 · ⑂ 2,366

    🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.

    • agent
    • ai
    • data-visualization
    • database
    • llm
    • sql

Find Python engineers shipping RAG

The list above ranks the most-starred public Python repositories tagged with the RAG topic, drawn from the public GitHub graph. Across 790 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real RAG experience.

That overlap is rare. Most Python engineers haven’t shipped RAG, and most RAG maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.

Refolk turns this list into a search. Ask for “Python RAG maintainers hiring” or “Python engineers shipping RAG in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.

How this list is built

Refolk searched GitHub for public Python repositories tagged with the RAG topic, ranked them by stargazer count, and kept those with at least 25 stars. The list refreshes once a day.
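
A comparable result can probably be approximated with GitHub's public repository search API. The sketch below is an assumption about how such a query might look, not Refolk's actual pipeline: the function names are hypothetical, the request is unauthenticated, and trimming each entry to six topics only mirrors what the list above displays.

```python
import json
import urllib.parse
import urllib.request

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_query(topic: str = "rag", language: str = "python", min_stars: int = 25) -> str:
    """Compose search qualifiers: topic tag, primary language, and a star floor."""
    return f"topic:{topic} language:{language} stars:>={min_stars}"


def build_url(query: str, per_page: int = 25) -> str:
    """Ask GitHub to sort matches by stargazer count, highest first."""
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def summarize(payload: dict) -> list[dict]:
    """Reduce a search response to the fields shown in the list above."""
    rows = [
        {
            "repo": item["full_name"],
            "stars": item["stargazers_count"],
            "forks": item["forks_count"],
            "description": item.get("description") or "",
            # Assumption: show at most six topics per entry, as the list does.
            "topics": item.get("topics", [])[:6],
        }
        for item in payload.get("items", [])
    ]
    # Re-sort locally so the ranking does not depend on the server's ordering.
    return sorted(rows, key=lambda row: row["stars"], reverse=True)


def fetch_top_repos(per_page: int = 25) -> list[dict]:
    """Run the search without a token; GitHub rate-limits such requests."""
    request = urllib.request.Request(
        build_url(build_query(), per_page),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "rag-list-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize(json.load(response))
```

Calling `fetch_top_repos()` would return one dictionary per repository in star order. The counts will differ from the snapshot above, since stars and forks change between refreshes.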

Last refreshed: Thu, 07 May 2026 05:54:20 GMT

Need a more specific search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
