Top Embeddings repositories on GitHub
Models, libraries, and infrastructure for vector representations of text and media.
Ranked by stars across 466 repositories tagged embeddings. Refreshed daily.
- 1. supabase/supabase ★ 104,576 · ⑂ 12,799
The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.
- firebase
- supabase
- realtime
- postgrest
- postgres
- postgresql
- 2. thedotmack/claude-mem ★ 83,453 · ⑂ 7,221
Persistent Context Across Sessions for Every Agent – Captures everything your agent does during sessions, compresses it with AI, and injects relevant context back into future sessions. Works with Claude Code, OpenClaw, Codex, Gemini, Hermes, Copilot, OpenCode + More
- ai
- ai-agents
- ai-memory
- anthropic
- artificial-intelligence
- claude
- 3. NirDiamant/RAG_Techniques ★ 28,079 · ⑂ 3,403
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.
- rag
- tutorials
- langchain
- llama-index
- llms
- python
- 4. Tencent/WeKnora ★ 16,504 · ⑂ 2,132
Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.
- agent
- agentic
- ai
- golang
- llm
- ollama
- 5. neuml/txtai ★ 12,673 · ⑂ 835
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
- python
- search
- nlp
- semantic-search
- vector-search
- txtai
- 6. langchain4j/langchain4j ★ 12,377 · ⑂ 2,319
LangChain4j is an idiomatic, open-source Java library for building LLM-powered applications on the JVM. It offers a unified API over popular LLM providers and vector stores, and makes implementing tool calling (including MCP support), agents and RAG easy. It integrates seamlessly with enterprise Java frameworks like Quarkus and Spring Boot.
- huggingface
- java
- langchain
- openai
- chatgpt
- gpt
- 7. Embedding/Chinese-Word-Vectors ★ 12,228 · ⑂ 2,325
100+ pretrained Chinese word vectors.
- chinese
- chinese-word-segmentation
- embeddings
- word-embeddings
- vectors-trained
- embedding
- 8. RyanCodrai/turbovec ★ 12,023 · ⑂ 1,060
A vector index built on TurboQuant, written in Rust with Python bindings
- ann
- avx512
- embeddings
- faiss
- nearest-neighbor
- neon
- 9. h2oai/h2ogpt ★ 11,982 · ⑂ 1,307
Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
- chatgpt
- llm
- ai
- embeddings
- generative
- gpt
- 10. InsForge/InsForge ★ 11,912 · ⑂ 1,014
The all-in-one, open-source backend platform for agentic coding. InsForge gives your coding agent database, auth, storage, compute, hosting, and AI gateway to ship full-stack apps end-to-end.
- ai
- ai-agents
- coding
- oauth2
- postgresql
- deno
- 11. FlagOpen/FlagEmbedding ★ 11,845 · ⑂ 890
Retrieval and Retrieval-augmented LLMs
- embeddings
- information-retrieval
- llm
- sentence-embeddings
- text-semantic-similarity
- retrieval-augmented-generation
- 12. apache/seatunnel ★ 9,414 · ⑂ 2,282
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
- data-integration
- high-performance
- offline
- real-time
- apache
- batch
- 13. postgresml/postgresml ★ 6,802 · ⑂ 362
Postgres with GPUs for ML/AI apps.
- ml
- machine-learning
- ai
- ann
- artificial-intelligence
- classification
- 14. lance-format/lance ★ 6,693 · ⑂ 729
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
- machine-learning
- computer-vision
- data-format
- deep-learning
- python
- apache-arrow
- 15. KevinMusgrave/pytorch-metric-learning ★ 6,327 · ⑂ 659
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
- metric-learning
- deep-learning
- computer-vision
- machine-learning
- pytorch
- deep-metric-learning
- 16. Eventual-Inc/Daft ★ 5,571 · ⑂ 493
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
- machine-learning
- python
- data-engineering
- distributed-computing
- rust
- big-data
- 17. MinishLab/semble ★ 5,331 · ⑂ 229
Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read
- agents
- code-search
- embeddings
- mcp
- mcp-server
- model-context-protocol
- 18. plastic-labs/honcho ★ 5,324 · ⑂ 646
Memory library for building stateful agents
- ai
- llm
- memory
- personalization
- embeddings
- rag
- 19. brianpetro/obsidian-smart-connections ★ 5,186 · ⑂ 321
Find related notes and excerpts while writing. Your link building copilot displays relevant content in graph + list view. A local embedding model powers semantic search. Zero setup. No API key.
- chatgpt
- embeddings
- claude
- gemini
- obsidian
- obsidian-plugin
- 20. shibing624/text2vec ★ 4,970 · ⑂ 428
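text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.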
- similarity
- nlp
- text-similarity
- text2vec
- word2vec
- embeddings
- 21. huggingface/text-embeddings-inference ★ 4,881 · ⑂ 400
A blazing fast inference solution for text embeddings models
- ai
- embeddings
- huggingface
- llm
- ml
- 22. Marker-Inc-Korea/AutoRAG ★ 4,835 · ⑂ 402
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
- analysis
- automl
- benchmarking
- document-parser
- embeddings
- evaluation
- 23. CaviraOSS/OpenMemory ★ 4,251 · ⑂ 484
Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot, Codex, Antigravity, etc.
- ai
- ai-agents
- ai-infrastructure
- ai-memory
- artificial-intelligence
- cognitive-architecture
- 24. crmne/ruby_llm ★ 4,033 · ⑂ 459
One delightful Ruby framework for every major AI provider. Build AI agents, chatbots, RAG apps, and multimodal workflows in beautiful, expressive code.
- llm
- ruby
- ai
- anthropic
- chatgpt
- claude
- 25. lightly-ai/lightly ★ 3,766 · ⑂ 328
A Python library for self-supervised learning on images.
- deep-learning
- self-supervised-learning
- machine-learning
- computer-vision
- pytorch
- embeddings
Find engineers shipping Embeddings
The list above ranks the most-starred public repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across 466 repositories tagged this way, the maintainers and top contributors are a tight cluster of the people actually building Embeddings.
Looking for engineers who’ve worked on Embeddings for real, not just listed it on LinkedIn? The fastest path is the contributor list of these repos. Their commits, issues, and READMEs are public proof of depth.
Refolk turns this list into a search. Ask for “maintainers of top Embeddings repos who are hiring”, “Embeddings engineers in San Francisco”, or “founders shipping Embeddings” and Refolk returns a ranked shortlist with sources.
How this list is built
Last refreshed: Sun, 21 Jun 2026 07:11:18 GMT
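The page does not describe its pipeline beyond ranking by stars within a topic, so the following is only a minimal sketch of how a comparable ranking could be reproduced. It assumes GitHub's public repository search endpoint with a `topic:` qualifier sorted by stars; the helper names `build_search_url` and `rank_by_stars` are hypothetical, and the sample records reuse star and fork counts from the list above rather than a live response.

```python
from urllib.parse import urlencode


def build_search_url(topic, per_page=25):
    """Build a GitHub repository search URL for one topic, most-starred first.

    Hypothetical helper: this page's actual data source is not documented.
    """
    params = {
        "q": f"topic:{topic}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


def rank_by_stars(repos, limit=25):
    """Return (rank, full_name, stars, forks) tuples, highest star count first."""
    ordered = sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)
    return [
        (rank, r["full_name"], r["stargazers_count"], r["forks_count"])
        for rank, r in enumerate(ordered[:limit], start=1)
    ]


# Sample records shaped like items in a search response (fields trimmed),
# deliberately out of order so the sort has work to do.
sample = [
    {"full_name": "neuml/txtai", "stargazers_count": 12673, "forks_count": 835},
    {"full_name": "supabase/supabase", "stargazers_count": 104576, "forks_count": 12799},
    {"full_name": "NirDiamant/RAG_Techniques", "stargazers_count": 28079, "forks_count": 3403},
]

if __name__ == "__main__":
    print(build_search_url("embeddings"))
    for rank, name, stars, forks in rank_by_stars(sample):
        print(f"{rank}. {name} | stars {stars:,} | forks {forks:,}")
```

Fetching the URL and passing the `items` array of the JSON response to `rank_by_stars` would give a live version of the same ordering. Unauthenticated requests to the search endpoint are rate limited.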