Refolk

Top Embeddings repositories on GitHub

Models, libraries, and infrastructure for vector representations of text and media.

Ranked by stars across 442 repositories tagged embeddings. Refreshed daily.

  1. supabase/supabase · ★ 101,968 · ⑂ 12,318

    The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.

    • firebase
    • supabase
    • realtime
    • postgrest
    • postgres
    • postgresql
  2. thedotmack/claude-mem · ★ 73,048 · ⑂ 6,271

    A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.

    • ai
    • ai-agents
    • ai-memory
    • anthropic
    • artificial-intelligence
    • claude
  3. NirDiamant/RAG_Techniques · ★ 27,164 · ⑂ 3,267

    This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.

    • rag
    • tutorials
    • langchain
    • llama-index
    • llms
    • python
  4. Tencent/WeKnora · ★ 14,305 · ⑂ 1,741

    Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.

    • agent
    • agentic
    • ai
    • golang
    • llm
    • ollama
  5. neuml/txtai · ★ 12,471 · ⑂ 808

    💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

    • python
    • search
    • nlp
    • semantic-search
    • vector-search
    • txtai
  6. Embedding/Chinese-Word-Vectors · ★ 12,215 · ⑂ 2,327

    100+ Chinese Word Vectors: over a hundred sets of pre-trained Chinese word vectors

    • chinese
    • chinese-word-segmentation
    • embeddings
    • word-embeddings
    • vectors-trained
    • embedding
  7. h2oai/h2ogpt · ★ 11,988 · ⑂ 1,313

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

    • chatgpt
    • llm
    • ai
    • embeddings
    • generative
    • gpt
  8. langchain4j/langchain4j · ★ 11,870 · ⑂ 2,200

    LangChain4j is an idiomatic, open-source Java library for building LLM-powered applications on the JVM. It offers a unified API over popular LLM providers and vector stores, and makes implementing tool calling (including MCP support), agents and RAG easy. It integrates seamlessly with enterprise Java frameworks like Quarkus and Spring Boot.

    • huggingface
    • java
    • langchain
    • openai
    • chatgpt
    • gpt
  9. FlagOpen/FlagEmbedding · ★ 11,647 · ⑂ 870

    Retrieval and Retrieval-augmented LLMs

    • embeddings
    • information-retrieval
    • llm
    • sentence-embeddings
    • text-semantic-similarity
    • retrieval-augmented-generation
  10. apache/seatunnel · ★ 9,312 · ⑂ 2,231

    SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

    • data-integration
    • high-performance
    • offline
    • real-time
    • apache
    • batch
  11. InsForge/InsForge · ★ 8,566 · ⑂ 708

    InsForge is a Postgres-based backend with auth, storage, compute, hosting, and AI gateway. Built for coding agents.

    • ai
    • ai-agents
    • coding
    • oauth2
    • postgresql
    • deno
  12. postgresml/postgresml · ★ 6,782 · ⑂ 361

    Postgres with GPUs for ML/AI apps.

    • ml
    • machine-learning
    • ai
    • ann
    • artificial-intelligence
    • classification
  13. lance-format/lance · ★ 6,390 · ⑂ 655

    Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.

    • machine-learning
    • computer-vision
    • data-format
    • deep-learning
    • python
    • apache-arrow
  14.

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

    • metric-learning
    • deep-learning
    • computer-vision
    • machine-learning
    • pytorch
    • deep-metric-learning
  15. Eventual-Inc/Daft · ★ 5,454 · ⑂ 462

    High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

    • machine-learning
    • python
    • data-engineering
    • distributed-computing
    • rust
    • big-data
  16. shibing624/text2vec · ★ 4,962 · ⑂ 427

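    text2vec, text to vector. A text vector representation toolkit that converts text into vector matrices, implementing text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.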

    • similarity
    • nlp
    • text-similarity
    • text2vec
    • word2vec
    • embeddings
  17.

    Chat with your notes & see links to related content with AI embeddings. Use local models or 100+ via APIs like Claude, Gemini, ChatGPT & Llama 3

    • chatgpt
    • embeddings
    • claude
    • gemini
    • llama3
    • obsidian
  18.

    A blazing fast inference solution for text embeddings models

    • ai
    • embeddings
    • huggingface
    • llm
    • ml
  19. Marker-Inc-Korea/AutoRAG · ★ 4,749 · ⑂ 397

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    • analysis
    • automl
    • benchmarking
    • document-parser
    • embeddings
    • evaluation
  20. CaviraOSS/OpenMemory · ★ 4,067 · ⑂ 465

    Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot, Codex, Antigravity, etc.

    • ai
    • ai-agents
    • ai-infrastructure
    • ai-memory
    • artificial-intelligence
    • cognitive-architecture
  21. crmne/ruby_llm · ★ 3,901 · ⑂ 438

    One beautiful Ruby API for OpenAI, Anthropic, Gemini, Bedrock, Azure, OpenRouter, DeepSeek, Ollama, VertexAI, Perplexity, Mistral, xAI, GPUStack & OpenAI compatible APIs. Agents, Chat, Vision, Audio, PDF, Images, Embeddings, Tools, Streaming & Rails integration.

    • llm
    • ruby
    • ai
    • anthropic
    • chatgpt
    • claude
  22. lightly-ai/lightly · ★ 3,734 · ⑂ 326

    A python library for self-supervised learning on images.

    • deep-learning
    • self-supervised-learning
    • machine-learning
    • computer-vision
    • pytorch
    • embeddings
  23. tensorflow/hub · ★ 3,522 · ⑂ 1,646

    A library for transfer learning by reusing parts of TensorFlow models.

    • tensorflow
    • machine-learning
    • transfer-learning
    • embeddings
    • image-classification
    • python
  24. towhee-io/towhee · ★ 3,446 · ⑂ 260

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    • machine-learning
    • convolutional-networks
    • embedding-vectors
    • embeddings
    • computer-vision
    • image-processing
  25.

    A curated list of Generative AI tools, works, models, and references

    • awesome-list
    • awesome
    • dall-e
    • dalle2
    • midjourney
    • prompt-engineering

Find engineers shipping Embeddings

The list above ranks the most-starred public repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across the 442 repositories tagged this way, the maintainers and top contributors form a tight cluster of the people actually building Embeddings.

Looking for engineers who’ve worked on Embeddings for real, not just listed it on LinkedIn? The fastest path is the contributor list of these repos. Their commits, issues, and READMEs are public proof of depth.

Refolk turns this list into a search. Ask for “maintainers of top Embeddings repos who are hiring”, “Embeddings engineers in San Francisco”, or “founders shipping Embeddings”, and Refolk returns a ranked shortlist with sources.

How this list is built

Refolk searched GitHub for public repositories tagged with the Embeddings topic, ranked them by stargazer count, and kept those with at least 50 stars. The list refreshes once a day.
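The selection rule just described (topic tag, a floor of 50 stars, ranking by stargazer count) can be approximated in a few lines. This is a minimal Python sketch that assumes the public GitHub REST search endpoint (`/search/repositories` with the `topic:` and `stars:` qualifiers); Refolk does not say which API it actually uses, and the helper names `build_search_url` and `rank_repositories` as well as the `example/tiny` sample entry are hypothetical. It only builds the request URL and ranks results that were already fetched, so it runs without network access or a token.

```python
from urllib.parse import urlencode

# Assumption: the GitHub REST search endpoint; Refolk's real pipeline is not published.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(topic: str, min_stars: int = 50, page: int = 1) -> str:
    """Hypothetical helper: URL for repos tagged `topic` with at least `min_stars` stars."""
    params = {
        "q": f"topic:{topic} stars:>={min_stars}",  # GitHub search qualifiers
        "sort": "stars",
        "order": "desc",
        "per_page": 100,  # largest page size the search API allows
        "page": page,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def rank_repositories(repos: list[dict], min_stars: int = 50) -> list[dict]:
    """Hypothetical helper: drop repos under the star floor, then sort by stars, highest first."""
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    return sorted(kept, key=lambda r: r["stargazers_count"], reverse=True)


# Usage with a hand-made sample shaped like GitHub's search results.
sample = [
    {"full_name": "neuml/txtai", "stargazers_count": 12471},
    {"full_name": "example/tiny", "stargazers_count": 12},  # hypothetical, below the floor
    {"full_name": "supabase/supabase", "stargazers_count": 101968},
]

if __name__ == "__main__":
    print(build_search_url("embeddings"))
    for rank, repo in enumerate(rank_repositories(sample), start=1):
        print(rank, repo["full_name"], repo["stargazers_count"])
```

Sorting on the client as well as passing `sort=stars` keeps the ranking correct when results from several pages are merged before display.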

Last refreshed: Thu, 07 May 2026 05:55:22 GMT

Need a list like this for any search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.


Embeddings by language