Top Embeddings repositories on GitHub
Models, libraries, and infrastructure for vector representations of text and media.
Ranked by stars across 442 repositories tagged embeddings. Refreshed daily.
- 1. supabase/supabase ★ 101,968 · ⑂ 12,318
  The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.
  - firebase
  - supabase
  - realtime
  - postgrest
  - postgres
  - postgresql
- 2. thedotmack/claude-mem ★ 73,048 · ⑂ 6,271
  A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.
  - ai
  - ai-agents
  - ai-memory
  - anthropic
  - artificial-intelligence
  - claude
- 3. NirDiamant/RAG_Techniques ★ 27,164 · ⑂ 3,267
  This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.
  - rag
  - tutorials
  - langchain
  - llama-index
  - llms
  - python
- 4. Tencent/WeKnora ★ 14,305 · ⑂ 1,741
  Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.
  - agent
  - agentic
  - ai
  - golang
  - llm
  - ollama
- 5. neuml/txtai ★ 12,471 · ⑂ 808
  💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
  - python
  - search
  - nlp
  - semantic-search
  - vector-search
  - txtai
- 6. Embedding/Chinese-Word-Vectors ★ 12,215 · ⑂ 2,327
  100+ Chinese Word Vectors: more than a hundred sets of pre-trained Chinese word vectors
  - chinese
  - chinese-word-segmentation
  - embeddings
  - word-embeddings
  - vectors-trained
  - embedding
- 7. h2oai/h2ogpt ★ 11,988 · ⑂ 1,313
  Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
  - chatgpt
  - llm
  - ai
  - embeddings
  - generative
  - gpt
- 8. langchain4j/langchain4j ★ 11,870 · ⑂ 2,200
  LangChain4j is an idiomatic, open-source Java library for building LLM-powered applications on the JVM. It offers a unified API over popular LLM providers and vector stores, and makes implementing tool calling (including MCP support), agents and RAG easy. It integrates seamlessly with enterprise Java frameworks like Quarkus and Spring Boot.
  - huggingface
  - java
  - langchain
  - openai
  - chatgpt
  - gpt
- 9. FlagOpen/FlagEmbedding ★ 11,647 · ⑂ 870
  Retrieval and Retrieval-augmented LLMs
  - embeddings
  - information-retrieval
  - llm
  - sentence-embeddings
  - text-semantic-similarity
  - retrieval-augmented-generation
- 10. apache/seatunnel ★ 9,312 · ⑂ 2,231
  SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
  - data-integration
  - high-performance
  - offline
  - real-time
  - apache
  - batch
- 11. InsForge/InsForge ★ 8,566 · ⑂ 708
  InsForge is a Postgres-based backend with auth, storage, compute, hosting, and AI gateway. Built for coding agents.
  - ai
  - ai-agents
  - coding
  - oauth2
  - postgresql
  - deno
- 12. postgresml/postgresml ★ 6,782 · ⑂ 361
  Postgres with GPUs for ML/AI apps.
  - ml
  - machine-learning
  - ai
  - ann
  - artificial-intelligence
  - classification
- 13. lance-format/lance ★ 6,390 · ⑂ 655
  Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch, with more integrations coming.
  - machine-learning
  - computer-vision
  - data-format
  - deep-learning
  - python
  - apache-arrow
- 14. KevinMusgrave/pytorch-metric-learning ★ 6,321 · ⑂ 660
  The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
  - metric-learning
  - deep-learning
  - computer-vision
  - machine-learning
  - pytorch
  - deep-metric-learning
- 15. Eventual-Inc/Daft ★ 5,454 · ⑂ 462
  High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.
  - machine-learning
  - python
  - data-engineering
  - distributed-computing
  - rust
  - big-data
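- 16. shibing624/text2vec ★ 4,962 · ⑂ 427
  text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.
  - similarity
  - nlp
  - text-similarity
  - text2vec
  - word2vec
  - embeddings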
- 17. brianpetro/obsidian-smart-connections ★ 4,958 · ⑂ 311
  Chat with your notes & see links to related content with AI embeddings. Use local models or 100+ via APIs like Claude, Gemini, ChatGPT & Llama 3
  - chatgpt
  - embeddings
  - claude
  - gemini
  - llama3
  - obsidian
- 18. huggingface/text-embeddings-inference ★ 4,774 · ⑂ 386
  A blazing fast inference solution for text embeddings models
  - ai
  - embeddings
  - huggingface
  - llm
  - ml
- 19. Marker-Inc-Korea/AutoRAG ★ 4,749 · ⑂ 397
  AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
  - analysis
  - automl
  - benchmarking
  - document-parser
  - embeddings
  - evaluation
- 20. CaviraOSS/OpenMemory ★ 4,067 · ⑂ 465
  Local persistent memory store for LLM applications, including Claude Desktop, GitHub Copilot, Codex, Antigravity, etc.
  - ai
  - ai-agents
  - ai-infrastructure
  - ai-memory
  - artificial-intelligence
  - cognitive-architecture
- 21. crmne/ruby_llm ★ 3,901 · ⑂ 438
  One beautiful Ruby API for OpenAI, Anthropic, Gemini, Bedrock, Azure, OpenRouter, DeepSeek, Ollama, VertexAI, Perplexity, Mistral, xAI, GPUStack & OpenAI compatible APIs. Agents, Chat, Vision, Audio, PDF, Images, Embeddings, Tools, Streaming & Rails integration.
  - llm
  - ruby
  - ai
  - anthropic
  - chatgpt
  - claude
- 22. lightly-ai/lightly ★ 3,734 · ⑂ 326
  A Python library for self-supervised learning on images.
  - deep-learning
  - self-supervised-learning
  - machine-learning
  - computer-vision
  - pytorch
  - embeddings
- 23. tensorflow/hub ★ 3,522 · ⑂ 1,646
  A library for transfer learning by reusing parts of TensorFlow models.
  - tensorflow
  - machine-learning
  - transfer-learning
  - embeddings
  - image-classification
  - python
- 24. towhee-io/towhee ★ 3,446 · ⑂ 260
  Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
  - machine-learning
  - convolutional-networks
  - embedding-vectors
  - embeddings
  - computer-vision
  - image-processing
- 25. filipecalegario/awesome-generative-ai ★ 3,437 · ⑂ 759
  A curated list of Generative AI tools, works, models, and references
  - awesome-list
  - awesome
  - dall-e
  - dalle2
  - midjourney
  - prompt-engineering
Find engineers shipping Embeddings
The list above ranks the most-starred public repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across 442 repositories tagged this way, the maintainers and top contributors are a tight cluster of the people actually building Embeddings.
Looking for engineers who’ve worked on Embeddings for real, not just listed it on LinkedIn? The fastest path is the contributor list of these repos. Their commits, issues, and READMEs are public proof of depth.
Refolk turns this list into a search. Ask for “maintainers of top Embeddings repos who are hiring”, “Embeddings engineers in San Francisco”, or “founders shipping Embeddings” and Refolk returns a ranked shortlist with sources.
How this list is built
Last refreshed: Thu, 07 May 2026 05:55:22 GMT
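For readers who want to reproduce a ranking of this kind, the following is a minimal sketch, not the pipeline behind this page. It assumes the public GitHub REST search endpoint and its standard response fields (`full_name`, `stargazers_count`, `forks_count`); the helper names `build_search_url` and `rank_by_stars` are hypothetical, and the sketch performs no network request itself.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search (assumed data source).
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic: str, per_page: int = 25) -> str:
    """Build a search URL for the most-starred repositories with a given topic."""
    params = {
        "q": f"topic:{topic}",  # restrict results to one topic tag
        "sort": "stars",        # rank by star count
        "order": "desc",        # highest first
        "per_page": per_page,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def rank_by_stars(repos: list[dict]) -> list[str]:
    """Sort repository records by stars and format one line per repository."""
    ordered = sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)
    return [
        f"{rank}. {r['full_name']} ★ {r['stargazers_count']:,} · ⑂ {r['forks_count']:,}"
        for rank, r in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    print(build_search_url("embeddings"))
    # Two records copied from the list on this page, deliberately out of order.
    sample = [
        {"full_name": "neuml/txtai", "stargazers_count": 12471, "forks_count": 808},
        {"full_name": "supabase/supabase", "stargazers_count": 101968, "forks_count": 12318},
    ]
    for line in rank_by_stars(sample):
        print(line)
```

Fetching the URL returns JSON whose `items` array can be passed to `rank_by_stars`; unauthenticated requests to the search endpoint are rate limited, so a daily refresh would likely need a token.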
Need a list like this for any search?
Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.