Refolk

Top Embeddings repositories on GitHub

Models, libraries, and infrastructure for vector representations of text and media.

Ranked by stars across 466 repositories tagged embeddings. Refreshed daily.

  1. supabase/supabase · ★ 104,579 · ⑂ 12,799

    The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.

    • firebase
    • supabase
    • realtime
    • postgrest
    • postgres
    • postgresql
  2. thedotmack/claude-mem · ★ 83,460 · ⑂ 7,222

    Persistent Context Across Sessions for Every Agent – Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More

    • ai
    • ai-agents
    • ai-memory
    • anthropic
    • artificial-intelligence
    • claude
  3. NirDiamant/RAG_Techniques · ★ 28,080 · ⑂ 3,403

    This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.

    • rag
    • tutorials
    • langchain
    • llama-index
    • llms
    • python
  4. Tencent/WeKnora · ★ 16,505 · ⑂ 2,133

    Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.

    • agent
    • agentic
    • ai
    • golang
    • llm
    • ollama
  5. neuml/txtai · ★ 12,673 · ⑂ 835

    💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

    • python
    • search
    • nlp
    • semantic-search
    • vector-search
    • txtai
  6. langchain4j/langchain4j · ★ 12,378 · ⑂ 2,319

    LangChain4j is an idiomatic, open-source Java library for building LLM-powered applications on the JVM. It offers a unified API over popular LLM providers and vector stores, and makes implementing tool calling (including MCP support), agents and RAG easy. It integrates seamlessly with enterprise Java frameworks like Quarkus and Spring Boot.

    • huggingface
    • java
    • langchain
    • openai
    • chatgpt
    • gpt
  7. Embedding/Chinese-Word-Vectors · ★ 12,228 · ⑂ 2,324

    100+ Chinese Word Vectors (more than a hundred pre-trained Chinese word vectors)

    • chinese
    • chinese-word-segmentation
    • embeddings
    • word-embeddings
    • vectors-trained
    • embedding
  8. RyanCodrai/turbovec · ★ 12,026 · ⑂ 1,059

    A vector index built on TurboQuant, written in Rust with Python bindings

    • ann
    • avx512
    • embeddings
    • faiss
    • nearest-neighbor
    • neon
  9. h2oai/h2ogpt · ★ 11,982 · ⑂ 1,307

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

    • chatgpt
    • llm
    • ai
    • embeddings
    • generative
    • gpt
  10. InsForge/InsForge · ★ 11,914 · ⑂ 1,014

    The all-in-one, open-source backend platform for agentic coding. InsForge gives your coding agent database, auth, storage, compute, hosting, and AI gateway to ship full-stack apps end-to-end.

    • ai
    • ai-agents
    • coding
    • oauth2
    • postgresql
    • deno
  11. FlagOpen/FlagEmbedding · ★ 11,845 · ⑂ 890

    Retrieval and Retrieval-augmented LLMs

    • embeddings
    • information-retrieval
    • llm
    • sentence-embeddings
    • text-semantic-similarity
    • retrieval-augmented-generation
  12. apache/seatunnel · ★ 9,414 · ⑂ 2,282

    SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

    • data-integration
    • high-performance
    • offline
    • real-time
    • apache
    • batch
  13. postgresml/postgresml · ★ 6,802 · ⑂ 362

    Postgres with GPUs for ML/AI apps.

    • ml
    • machine-learning
    • ai
    • ann
    • artificial-intelligence
    • classification
  14. lance-format/lance · ★ 6,693 · ⑂ 729

    Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

    • machine-learning
    • computer-vision
    • data-format
    • deep-learning
    • python
    • apache-arrow
  15.

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

    • metric-learning
    • deep-learning
    • computer-vision
    • machine-learning
    • pytorch
    • deep-metric-learning
  16. Eventual-Inc/Daft · ★ 5,571 · ⑂ 493

    High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

    • machine-learning
    • python
    • data-engineering
    • distributed-computing
    • rust
    • big-data
  17. MinishLab/semble · ★ 5,334 · ⑂ 229

    Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read

    • agents
    • code-search
    • embeddings
    • mcp
    • mcp-server
    • model-context-protocol
  18. plastic-labs/honcho · ★ 5,327 · ⑂ 646

    Memory library for building stateful agents

    • ai
    • llm
    • memory
    • personalization
    • embeddings
    • rag
  19.

    Find related notes and excerpts while writing. Your link building copilot displays relevant content in graph + list view. A local embedding model powers semantic search. Zero setup. No API key.

    • chatgpt
    • embeddings
    • claude
    • gemini
    • obsidian
    • obsidian-plugin
  20. shibing624/text2vec · ★ 4,970 · ⑂ 428

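    text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.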

    • similarity
    • nlp
    • text-similarity
    • text2vec
    • word2vec
    • embeddings
  21.

    A blazing fast inference solution for text embeddings models

    • ai
    • embeddings
    • huggingface
    • llm
    • ml
  22. Marker-Inc-Korea/AutoRAG · ★ 4,835 · ⑂ 402

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    • analysis
    • automl
    • benchmarking
    • document-parser
    • embeddings
    • evaluation
  23. CaviraOSS/OpenMemory · ★ 4,251 · ⑂ 484

    Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot, Codex, Antigravity, etc.

    • ai
    • ai-agents
    • ai-infrastructure
    • ai-memory
    • artificial-intelligence
    • cognitive-architecture
  24. crmne/ruby_llm · ★ 4,033 · ⑂ 459

    One delightful Ruby framework for every major AI provider. Build AI agents, chatbots, RAG apps, and multimodal workflows in beautiful, expressive code.

    • llm
    • ruby
    • ai
    • anthropic
    • chatgpt
    • claude
  25. lightly-ai/lightly · ★ 3,766 · ⑂ 328

    A Python library for self-supervised learning on images.

    • deep-learning
    • self-supervised-learning
    • machine-learning
    • computer-vision
    • pytorch
    • embeddings

Find engineers shipping Embeddings

The list above ranks the most-starred public repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across the 466 repositories tagged this way, the maintainers and top contributors form a tight cluster of the people actually building Embeddings.

Looking for engineers who’ve worked on Embeddings for real, not just listed it on LinkedIn? The fastest path is the contributor list of these repos. Their commits, issues, and READMEs are public proof of depth.

Refolk turns this list into a search. Ask for “maintainers of top Embeddings repos who are hiring”, “Embeddings engineers in San Francisco”, or “founders shipping Embeddings” and Refolk returns a ranked shortlist with sources.

How this list is built

Refolk searched GitHub for public repositories tagged with the Embeddings topic, ranked them by stargazer count, and kept those with at least 50 stars. The list refreshes once a day.
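The filter-and-rank step described above can be sketched roughly as follows. This is a minimal illustration, not Refolk's actual pipeline: the function names `search_url` and `rank_repos` are hypothetical, the use of GitHub's REST repository search endpoint is an assumption about how such a list could be fetched, and the third sample entry is made up to show the 50-star cutoff. The `full_name` and `stargazers_count` fields follow the shape of GitHub's repository objects.

```python
from urllib.parse import urlencode


def search_url(topic: str, per_page: int = 100) -> str:
    """Build a GitHub REST search URL for public repos carrying a topic.

    Assumption: the list could be fetched via /search/repositories;
    the source does not say which API Refolk uses.
    """
    params = {
        "q": f"topic:{topic}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


def rank_repos(repos, min_stars=50, limit=25):
    """Keep repos with at least `min_stars` stars, most-starred first."""
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    kept.sort(key=lambda r: r["stargazers_count"], reverse=True)
    return kept[:limit]


# Two entries taken from the list above, plus one hypothetical repo
# that falls below the 50-star threshold.
sample = [
    {"full_name": "neuml/txtai", "stargazers_count": 12673},
    {"full_name": "example/tiny-embeddings", "stargazers_count": 12},
    {"full_name": "supabase/supabase", "stargazers_count": 104579},
]

for rank, repo in enumerate(rank_repos(sample), start=1):
    print(rank, repo["full_name"], repo["stargazers_count"])
```

Keeping the ranking separate from the fetch means the same function can be rerun on a cached response during the daily refresh.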

Last refreshed: Sun, 21 Jun 2026 08:16:21 GMT

Need a list like this for any search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
