Refolk

Top Python Embeddings repositories on GitHub

Models, libraries, and infrastructure for vector representations of text and media. Filtered to projects whose primary language is Python.

Ranked by stars across 306 Python repositories tagged embeddings. Refreshed daily.

  1. neuml/txtai · ★ 12,673 · ⑂ 835

    💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

    • python
    • search
    • nlp
    • semantic-search
    • vector-search
    • txtai
  2. Embedding/Chinese-Word-Vectors · ★ 12,228 · ⑂ 2,325

    100+ pre-trained Chinese word vectors

    • chinese
    • chinese-word-segmentation
    • embeddings
    • word-embeddings
    • vectors-trained
    • embedding
  3. RyanCodrai/turbovec · ★ 12,022 · ⑂ 1,060

    A vector index built on TurboQuant, written in Rust with Python bindings

    • ann
    • avx512
    • embeddings
    • faiss
    • nearest-neighbor
    • neon
  4. h2oai/h2ogpt · ★ 11,982 · ⑂ 1,307

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

    • chatgpt
    • llm
    • ai
    • embeddings
    • generative
    • gpt
  5. FlagOpen/FlagEmbedding · ★ 11,845 · ⑂ 890

    Retrieval and Retrieval-augmented LLMs

    • embeddings
    • information-retrieval
    • llm
    • sentence-embeddings
    • text-semantic-similarity
    • retrieval-augmented-generation
  6.

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

    • metric-learning
    • deep-learning
    • computer-vision
    • machine-learning
    • pytorch
    • deep-metric-learning
  7. MinishLab/semble · ★ 5,331 · ⑂ 229

    Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read

    • agents
    • code-search
    • embeddings
    • mcp
    • mcp-server
    • model-context-protocol
  8. plastic-labs/honcho · ★ 5,324 · ⑂ 646

    Memory library for building stateful agents

    • ai
    • llm
    • memory
    • personalization
    • embeddings
    • rag
  9. shibing624/text2vec · ★ 4,970 · ⑂ 428

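    text2vec, text to vector. A text vector representation tool that converts text into vector matrices and implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT. Works out of the box.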

    • similarity
    • nlp
    • text-similarity
    • text2vec
    • word2vec
    • embeddings
  10. Marker-Inc-Korea/AutoRAG · ★ 4,835 · ⑂ 402

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    • analysis
    • automl
    • benchmarking
    • document-parser
    • embeddings
    • evaluation
  11. lightly-ai/lightly · ★ 3,766 · ⑂ 328

    A python library for self-supervised learning on images.

    • deep-learning
    • self-supervised-learning
    • machine-learning
    • computer-vision
    • pytorch
    • embeddings
  12. tensorflow/hub · ★ 3,524 · ⑂ 1,640

    A library for transfer learning by reusing parts of TensorFlow models.

    • tensorflow
    • machine-learning
    • transfer-learning
    • embeddings
    • image-classification
    • python
  13. towhee-io/towhee · ★ 3,448 · ⑂ 261

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    • machine-learning
    • convolutional-networks
    • embedding-vectors
    • embeddings
    • computer-vision
    • image-processing
  14. qdrant/fastembed · ★ 3,044 · ⑂ 208

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

    • embeddings
    • openai
    • rag
    • retrieval
    • retrieval-augmented-generation
    • vector-search
  15. hegelai/prompttools · ★ 3,040 · ⑂ 256

    Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

    • deep-learning
    • large-language-models
    • machine-learning
    • prompt-engineering
    • python
    • embeddings
  16. ailia-ai/ailia-models · ★ 2,348 · ⑂ 361

    The collection of pre-trained, state-of-the-art AI models for ailia SDK

    • deep-learning
    • face-recognition
    • face-detection
    • object-detection
    • object-recognition
    • hand-detection
  17. PetrochukM/PyTorch-NLP · ★ 2,226 · ⑂ 254

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    • pytorch
    • nlp
    • natural-language-processing
    • pytorch-nlp
    • torchnlp
    • data-loader
  18. MinishLab/model2vec · ★ 2,130 · ⑂ 122

    Fast State-of-the-Art Static Embeddings

    • embeddings
    • machine-learning
    • model2vec
    • nlp
    • python
    • sentence-transformers
  19. zilliztech/memsearch · ★ 2,090 · ⑂ 188

    A persistent, unified memory layer for all your AI agents (e.g. Claude Code, Codex), backed by Markdown and Milvus.

    • agent-memory
    • claude-code
    • claude-code-plugin
    • memory
    • openclaw
    • progressive-disclosure
  20. superlinked/sie · ★ 2,061 · ⑂ 183

    Open-source inference server and production cluster for all the models your agent needs.

    • embeddings
    • vector-search
    • data-pipeline
    • deep-learning
    • information-retrieval
    • llm
  21. xlang-ai/instructor-embedding · ★ 2,024 · ⑂ 157

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    • embeddings
    • information-retrieval
    • language-model
    • text-classification
    • text-clustering
    • text-embedding
  22. lilianweng/stock-rnn · ★ 1,975 · ⑂ 672

    Predict stock market prices using RNN model with multilayer LSTM cells + optional multi-stock embeddings.

    • lstm
    • rnn-tensorflow
    • stock-price-prediction
    • embeddings
  23. nomic-ai/nomic · ★ 1,878 · ⑂ 197

    Nomic Developer API SDK

    • python
    • clustering
    • duplicate-detection
    • embeddings
    • text
    • topic-modeling
  24. Kav-K/GPTDiscord · ★ 1,855 · ⑂ 292

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

    • artificial-intelligence
    • asyncio
    • gpt3
    • help-wanted
    • openai
    • openai-api
  25.

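    Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short). Base classes for building character, word, and sentence embedding layers (embeddings) and network layers (graph). Covers FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, Capsule Network (CapsuleNet), Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN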

    • text-classification
    • keras
    • rcnn
    • dcnn
    • charcnn
    • bert

Find Python engineers shipping Embeddings

The list above ranks the most-starred public Python repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across 306 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real Embeddings experience.

That overlap is rare. Most Python engineers haven’t shipped Embeddings, and most Embeddings maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.

Refolk turns this list into a search. Ask for “Python Embeddings maintainers hiring” or “Python engineers shipping Embeddings in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.

How this list is built

Refolk searched GitHub for public Python repositories tagged with the Embeddings topic, ranked them by stargazer count, and kept those with at least 25 stars. The list refreshes once a day.

Last refreshed: Sun, 21 Jun 2026 07:07:35 GMT
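
The selection steps described above (topic filter, language filter, a 25-star floor, ranking by stargazer count) can be sketched in Python. This is an illustrative sketch, not Refolk's actual pipeline: it assumes the public GitHub repository search endpoint and its `topic:`, `language:`, and `stars:` qualifiers, and the helper names `build_search_url` and `rank_repositories` are hypothetical. No network request is made; the ranking step runs on a small in-memory sample.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search (assumed data source).
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic="embeddings", language="python", min_stars=25, per_page=25):
    """Build a search URL for repositories with a topic, language, and star floor."""
    query = f"topic:{topic} language:{language} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_repositories(repos, min_stars=25, limit=25):
    """Drop repositories below the star floor, then sort by stars, highest first."""
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    kept.sort(key=lambda r: r["stargazers_count"], reverse=True)
    return kept[:limit]


# Sample records shaped like GitHub search results. The first two star counts
# come from the list above; "example/tiny-repo" is a made-up entry.
sample = [
    {"full_name": "qdrant/fastembed", "stargazers_count": 3044},
    {"full_name": "example/tiny-repo", "stargazers_count": 12},
    {"full_name": "neuml/txtai", "stargazers_count": 12673},
]

search_url = build_search_url()
ranked_names = [r["full_name"] for r in rank_repositories(sample)]
print(search_url)
print(ranked_names)
```

Filtering locally as well as in the query is redundant when the API already applies the star floor, but it keeps the ranking step correct for cached or merged result sets.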

Need a more specific search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
