Top Python Embeddings repositories on GitHub
Models, libraries, and infrastructure for vector representations of text and media. Filtered to projects whose primary language is Python.
Ranked by stars across 297 Python repositories tagged embeddings. Refreshed daily.
- 1. neuml/txtai ★ 12,471 · ⑂ 808
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
- python
- search
- nlp
- semantic-search
- vector-search
- txtai
- 2. Embedding/Chinese-Word-Vectors ★ 12,215 · ⑂ 2,327
100+ Chinese Word Vectors (over a hundred pre-trained Chinese word vectors)
- chinese
- chinese-word-segmentation
- embeddings
- word-embeddings
- vectors-trained
- embedding
- 3. h2oai/h2ogpt ★ 11,988 · ⑂ 1,313
Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
- chatgpt
- llm
- ai
- embeddings
- generative
- gpt
- 4. FlagOpen/FlagEmbedding ★ 11,647 · ⑂ 870
Retrieval and Retrieval-augmented LLMs
- embeddings
- information-retrieval
- llm
- sentence-embeddings
- text-semantic-similarity
- retrieval-augmented-generation
- 5. KevinMusgrave/pytorch-metric-learning ★ 6,321 · ⑂ 660
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
- metric-learning
- deep-learning
- computer-vision
- machine-learning
- pytorch
- deep-metric-learning
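- 6. shibing624/text2vec ★ 4,962 · ⑂ 427
text2vec, text to vector. A text vector representation tool that converts text into vector matrices; it implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.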
- similarity
- nlp
- text-similarity
- text2vec
- word2vec
- embeddings
- 7. Marker-Inc-Korea/AutoRAG ★ 4,749 · ⑂ 397
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
- analysis
- automl
- benchmarking
- document-parser
- embeddings
- evaluation
- 8. lightly-ai/lightly ★ 3,734 · ⑂ 326
A python library for self-supervised learning on images.
- deep-learning
- self-supervised-learning
- machine-learning
- computer-vision
- pytorch
- embeddings
- 9. tensorflow/hub ★ 3,522 · ⑂ 1,646
A library for transfer learning by reusing parts of TensorFlow models.
- tensorflow
- machine-learning
- transfer-learning
- embeddings
- image-classification
- python
- 10. towhee-io/towhee ★ 3,446 · ⑂ 260
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
- machine-learning
- convolutional-networks
- embedding-vectors
- embeddings
- computer-vision
- image-processing
- 11. plastic-labs/honcho ★ 3,304 · ⑂ 389
Memory library for building stateful agents
- ai
- llm
- memory
- personalization
- embeddings
- rag
- 12. hegelai/prompttools ★ 3,039 · ⑂ 256
Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
- deep-learning
- large-language-models
- machine-learning
- prompt-engineering
- python
- embeddings
- 13. qdrant/fastembed ★ 2,927 · ⑂ 195
Fast, Accurate, Lightweight Python library to make State of the Art Embedding
- embeddings
- openai
- rag
- retrieval
- retrieval-augmented-generation
- vector-search
- 14. ailia-ai/ailia-models ★ 2,345 · ⑂ 357
The collection of pre-trained, state-of-the-art AI models for ailia SDK
- deep-learning
- face-recognition
- face-detection
- object-detection
- object-recognition
- hand-detection
- 15. PetrochukM/PyTorch-NLP ★ 2,229 · ⑂ 255
Basic Utilities for PyTorch Natural Language Processing (NLP)
- pytorch
- nlp
- natural-language-processing
- pytorch-nlp
- torchnlp
- data-loader
- 16. MinishLab/model2vec ★ 2,055 · ⑂ 121
Fast State-of-the-Art Static Embeddings
- embeddings
- machine-learning
- model2vec
- nlp
- python
- sentence-transformers
- 17. xlang-ai/instructor-embedding ★ 2,022 · ⑂ 157
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
- embeddings
- information-retrieval
- language-model
- text-classification
- text-clustering
- text-embedding
- 18. lilianweng/stock-rnn ★ 1,976 · ⑂ 675
Predict stock market prices using RNN model with multilayer LSTM cells + optional multi-stock embeddings.
- lstm
- rnn-tensorflow
- stock-price-prediction
- embeddings
- 19. nomic-ai/nomic ★ 1,876 · ⑂ 197
Nomic Developer API SDK
- python
- clustering
- duplicate-detection
- embeddings
- text
- topic-modeling
- 20. Kav-K/GPTDiscord ★ 1,854 · ⑂ 294
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!
- artificial-intelligence
- asyncio
- gpt3
- help-wanted
- openai
- openai-api
- 21. yongzhuo/Keras-TextClassification ★ 1,811 · ⑂ 399
Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity with Keras; base classes for building character-, word-, and sentence-level embedding layers (embeddings) and network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN
- text-classification
- keras
- rcnn
- dcnn
- charcnn
- bert
- 22. superlinked/sie ★ 1,705 · ⑂ 134
Superlinked Inference Engine is an Open-source inference server and production cluster for embeddings, reranking, and extraction.
- embeddings
- etl
- vector-search
- data-pipeline
- deep-learning
- information-retrieval
- 23. AnswerDotAI/ModernBERT ★ 1,669 · ⑂ 146
Bringing BERT into modernity via both architecture changes and scaling
- bert
- embeddings
- llm
- nlp
- 24. plasticityai/magnitude ★ 1,658 · ⑂ 122
A fast, efficient universal vector embedding utility package.
- python
- natural-language-processing
- nlp
- machine-learning
- vectors
- embeddings
- 25. jasonwei20/eda_nlp ★ 1,652 · ⑂ 313
Data augmentation for NLP, presented at EMNLP 2019
- nlp
- data-augmentation
- text-classification
- synonyms
- embeddings
- sentence
Find Python engineers shipping Embeddings
The list above ranks the most-starred public Python repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across 297 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real Embeddings experience.
That overlap is rare. Most Python engineers haven’t shipped Embeddings, and most Embeddings maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.
Refolk turns this list into a search. Ask for “Python Embeddings maintainers hiring” or “Python engineers shipping Embeddings in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.
How this list is built
Last refreshed: Thu, 07 May 2026 05:54:12 GMT
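The page does not spell out its pipeline, so the following is only a rough sketch of how a star-ranked list like this one could be assembled. It assumes the public GitHub repository search endpoint with the `topic:` and `language:` qualifiers; the helper names `build_search_url` and `rank_by_stars` are hypothetical, and the sample entries reuse star counts from the list above rather than live data.

```python
from urllib.parse import urlencode


def build_search_url(topic: str, language: str, per_page: int = 25) -> str:
    """Build a GitHub repository-search URL sorted by stars, descending.

    Hypothetical helper: the list's real query is not documented on the page.
    """
    params = {
        "q": f"topic:{topic} language:{language}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


def rank_by_stars(repos: list[dict]) -> list[dict]:
    """Return repositories ordered from most to fewest stars."""
    return sorted(repos, key=lambda repo: repo["stars"], reverse=True)


# Offline sample taken from the list above (no network call is made here).
sample = [
    {"name": "qdrant/fastembed", "stars": 2927},
    {"name": "neuml/txtai", "stars": 12471},
    {"name": "FlagOpen/FlagEmbedding", "stars": 11647},
]

print(build_search_url("embeddings", "python"))
for rank, repo in enumerate(rank_by_stars(sample), start=1):
    print(f"{rank}. {repo['name']} ★ {repo['stars']:,}")
```

Sorting locally keeps the sketch runnable without credentials; a real refresh job would presumably fetch the URL, page through results, and handle rate limits.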