Top Python Embeddings repositories on GitHub
Models, libraries, and infrastructure for vector representations of text and media. Filtered to projects whose primary language is Python.
Ranked by stars across 306 Python repositories tagged embeddings. Refreshed daily.
- #1 neuml/txtai ★ 12,673 · ⑂ 835
  💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
  - python
  - search
  - nlp
  - semantic-search
  - vector-search
  - txtai
- #2 Embedding/Chinese-Word-Vectors ★ 12,228 · ⑂ 2,325
  100+ Chinese Word Vectors (hundreds of pre-trained Chinese word vectors)
  - chinese
  - chinese-word-segmentation
  - embeddings
  - word-embeddings
  - vectors-trained
  - embedding
- #3 RyanCodrai/turbovec ★ 12,022 · ⑂ 1,060
  A vector index built on TurboQuant, written in Rust with Python bindings
  - ann
  - avx512
  - embeddings
  - faiss
  - nearest-neighbor
  - neon
- #4 h2oai/h2ogpt ★ 11,982 · ⑂ 1,307
  Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
  - chatgpt
  - llm
  - ai
  - embeddings
  - generative
  - gpt
- #5 FlagOpen/FlagEmbedding ★ 11,845 · ⑂ 890
  Retrieval and Retrieval-augmented LLMs
  - embeddings
  - information-retrieval
  - llm
  - sentence-embeddings
  - text-semantic-similarity
  - retrieval-augmented-generation
- #6 KevinMusgrave/pytorch-metric-learning ★ 6,327 · ⑂ 659
  The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
  - metric-learning
  - deep-learning
  - computer-vision
  - machine-learning
  - pytorch
  - deep-metric-learning
- #7 MinishLab/semble ★ 5,331 · ⑂ 229
  Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read
  - agents
  - code-search
  - embeddings
  - mcp
  - mcp-server
  - model-context-protocol
- #8 plastic-labs/honcho ★ 5,324 · ⑂ 646
  Memory library for building stateful agents
  - ai
  - llm
  - memory
  - personalization
  - embeddings
  - rag
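- #9 shibing624/text2vec ★ 4,970 · ⑂ 428
  text2vec, text to vector. A text-vector representation toolkit that converts text into vector matrices. It implements text representation and text-similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.
  - similarity
  - nlp
  - text-similarity
  - text2vec
  - word2vec
  - embeddings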
- #10 Marker-Inc-Korea/AutoRAG ★ 4,835 · ⑂ 402
  AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
  - analysis
  - automl
  - benchmarking
  - document-parser
  - embeddings
  - evaluation
- #11 lightly-ai/lightly ★ 3,766 · ⑂ 328
  A python library for self-supervised learning on images.
  - deep-learning
  - self-supervised-learning
  - machine-learning
  - computer-vision
  - pytorch
  - embeddings
- #12 tensorflow/hub ★ 3,524 · ⑂ 1,640
  A library for transfer learning by reusing parts of TensorFlow models.
  - tensorflow
  - machine-learning
  - transfer-learning
  - embeddings
  - image-classification
  - python
- #13 towhee-io/towhee ★ 3,448 · ⑂ 261
  Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
  - machine-learning
  - convolutional-networks
  - embedding-vectors
  - embeddings
  - computer-vision
  - image-processing
- #14 qdrant/fastembed ★ 3,044 · ⑂ 208
  Fast, Accurate, Lightweight Python library to make State of the Art Embedding
  - embeddings
  - openai
  - rag
  - retrieval
  - retrieval-augmented-generation
  - vector-search
- #15 hegelai/prompttools ★ 3,040 · ⑂ 256
  Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
  - deep-learning
  - large-language-models
  - machine-learning
  - prompt-engineering
  - python
  - embeddings
- #16 ailia-ai/ailia-models ★ 2,348 · ⑂ 361
  The collection of pre-trained, state-of-the-art AI models for ailia SDK
  - deep-learning
  - face-recognition
  - face-detection
  - object-detection
  - object-recognition
  - hand-detection
- #17 PetrochukM/PyTorch-NLP ★ 2,226 · ⑂ 254
  Basic Utilities for PyTorch Natural Language Processing (NLP)
  - pytorch
  - nlp
  - natural-language-processing
  - pytorch-nlp
  - torchnlp
  - data-loader
- #18 MinishLab/model2vec ★ 2,130 · ⑂ 122
  Fast State-of-the-Art Static Embeddings
  - embeddings
  - machine-learning
  - model2vec
  - nlp
  - python
  - sentence-transformers
- #19 zilliztech/memsearch ★ 2,090 · ⑂ 188
  A persistent, unified memory layer for all your AI agents (e.g. Claude Code, Codex), backed by Markdown and Milvus.
  - agent-memory
  - claude-code
  - claude-code-plugin
  - memory
  - openclaw
  - progressive-disclosure
- #20 superlinked/sie ★ 2,061 · ⑂ 183
  Open-source inference server and production cluster for all the models your agent needs.
  - embeddings
  - vector-search
  - data-pipeline
  - deep-learning
  - information-retrieval
  - llm
- #21 xlang-ai/instructor-embedding ★ 2,024 · ⑂ 157
  [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
  - embeddings
  - information-retrieval
  - language-model
  - text-classification
  - text-clustering
  - text-embedding
- #22 lilianweng/stock-rnn ★ 1,975 · ⑂ 672
  Predict stock market prices using RNN model with multilayer LSTM cells + optional multi-stock embeddings.
  - lstm
  - rnn-tensorflow
  - stock-price-prediction
  - embeddings
- #23 nomic-ai/nomic ★ 1,878 · ⑂ 197
  Nomic Developer API SDK
  - python
  - clustering
  - duplicate-detection
  - embeddings
  - text
  - topic-modeling
- #24 Kav-K/GPTDiscord ★ 1,855 · ⑂ 292
  A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!
  - artificial-intelligence
  - asyncio
  - gpt3
  - help-wanted
  - openai
  - openai-api
- #25 yongzhuo/Keras-TextClassification ★ 1,815 · ⑂ 398
  Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short); base classes for building character/word/sentence embedding layers (embeddings) and network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, CapsuleNet (capsule network), Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN
  - text-classification
  - keras
  - rcnn
  - dcnn
  - charcnn
  - bert
Find Python engineers shipping Embeddings
The list above ranks the most-starred public Python repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across 306 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real Embeddings experience.
That overlap is rare. Most Python engineers haven’t shipped embeddings work, and most embeddings maintainers don’t write Python. The people in this list’s contributor graph are the ones who do both.
Refolk turns this list into a search. Ask for “Python Embeddings maintainers hiring” or “Python engineers shipping Embeddings in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.
How this list is built
Last refreshed: Sun, 21 Jun 2026 07:07:35 GMT
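The page does not describe its data pipeline beyond "ranked by stars", "primary language is Python", "tagged embeddings", and "refreshed daily". As a rough sketch only, assuming GitHub's public repository-search REST endpoint (`/search/repositories` with the `topic:` and `language:` qualifiers) rather than whatever this site actually uses, a similar ranking could be reproduced with the standard library. The helper names `build_search_url`, `fetch_page`, `rank_by_stars`, and `fetch_ranked` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: GitHub's public REST search endpoint, not this site's own pipeline.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic: str = "embeddings", language: str = "python",
                     page: int = 1, per_page: int = 100) -> str:
    """Build a repository-search URL sorted by stars, highest first."""
    params = {
        "q": f"topic:{topic} language:{language}",  # language: matches primary language
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
        "page": page,
    }
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def fetch_page(url: str) -> list[dict]:
    """Fetch one page of search results (unauthenticated, so rate-limited)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]


def rank_by_stars(repos: list[dict]) -> list[tuple[str, int]]:
    """Sort repo records by stargazers_count, highest first."""
    ordered = sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)
    return [(r["full_name"], r["stargazers_count"]) for r in ordered]


def fetch_ranked(topic: str = "embeddings", language: str = "python",
                 max_pages: int = 10) -> list[tuple[str, int]]:
    """Page through the search results, then rank the combined set locally."""
    repos: list[dict] = []
    for page in range(1, max_pages + 1):
        items = fetch_page(build_search_url(topic, language, page=page))
        repos.extend(items)
        if len(items) < 100:  # a short page means there are no more results
            break
    return rank_by_stars(repos)
```

Re-sorting locally is a safeguard: star counts can change between page requests, so the pages returned by the API may not concatenate into a perfectly ordered list. With 306 matching repositories, four pages of 100 would cover the full set.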
Need a more specific search?
Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
Related lists
- Python · Machine learning
- Python · Deep learning
- Python · Computer vision
- Python · Natural language processing
- Python · LLM
- Python · AI agents
- Python · RAG
- Python · Transformers
See all repository lists.