Refolk

Top Python Embeddings repositories on GitHub

Models, libraries, and infrastructure for vector representations of text and media. Filtered to projects whose primary language is Python.

Ranked by stars across 297 Python repositories tagged embeddings. Refreshed daily.

  1. neuml/txtai · ★ 12,471 · ⑂ 808

    All-in-one AI framework for semantic search, LLM orchestration and language model workflows

    • python
    • search
    • nlp
    • semantic-search
    • vector-search
    • txtai
  2. Embedding/Chinese-Word-Vectors · ★ 12,215 · ⑂ 2,327

    100+ pre-trained Chinese word vectors

    • chinese
    • chinese-word-segmentation
    • embeddings
    • word-embeddings
    • vectors-trained
    • embedding
  3. h2oai/h2ogpt · ★ 11,988 · ⑂ 1,313

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

    • chatgpt
    • llm
    • ai
    • embeddings
    • generative
    • gpt
  4. FlagOpen/FlagEmbedding · ★ 11,647 · ⑂ 870

    Retrieval and Retrieval-augmented LLMs

    • embeddings
    • information-retrieval
    • llm
    • sentence-embeddings
    • text-semantic-similarity
    • retrieval-augmented-generation
  5.

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

    • metric-learning
    • deep-learning
    • computer-vision
    • machine-learning
    • pytorch
    • deep-metric-learning
  6. shibing624/text2vec · ★ 4,962 · ⑂ 427

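    text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.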

    • similarity
    • nlp
    • text-similarity
    • text2vec
    • word2vec
    • embeddings
  7. Marker-Inc-Korea/AutoRAG · ★ 4,749 · ⑂ 397

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    • analysis
    • automl
    • benchmarking
    • document-parser
    • embeddings
    • evaluation
  8. lightly-ai/lightly · ★ 3,734 · ⑂ 326

    A python library for self-supervised learning on images.

    • deep-learning
    • self-supervised-learning
    • machine-learning
    • computer-vision
    • pytorch
    • embeddings
  9. tensorflow/hub · ★ 3,522 · ⑂ 1,646

    A library for transfer learning by reusing parts of TensorFlow models.

    • tensorflow
    • machine-learning
    • transfer-learning
    • embeddings
    • image-classification
    • python
  10. towhee-io/towhee · ★ 3,446 · ⑂ 260

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    • machine-learning
    • convolutional-networks
    • embedding-vectors
    • embeddings
    • computer-vision
    • image-processing
  11. plastic-labs/honcho · ★ 3,304 · ⑂ 389

    Memory library for building stateful agents

    • ai
    • llm
    • memory
    • personalization
    • embeddings
    • rag
  12. hegelai/prompttools · ★ 3,039 · ⑂ 256

    Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

    • deep-learning
    • large-language-models
    • machine-learning
    • prompt-engineering
    • python
    • embeddings
  13. qdrant/fastembed · ★ 2,927 · ⑂ 195

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

    • embeddings
    • openai
    • rag
    • retrieval
    • retrieval-augmented-generation
    • vector-search
  14. ailia-ai/ailia-models · ★ 2,345 · ⑂ 357

    The collection of pre-trained, state-of-the-art AI models for ailia SDK

    • deep-learning
    • face-recognition
    • face-detection
    • object-detection
    • object-recognition
    • hand-detection
  15. PetrochukM/PyTorch-NLP · ★ 2,229 · ⑂ 255

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    • pytorch
    • nlp
    • natural-language-processing
    • pytorch-nlp
    • torchnlp
    • data-loader
  16. MinishLab/model2vec · ★ 2,055 · ⑂ 121

    Fast State-of-the-Art Static Embeddings

    • embeddings
    • machine-learning
    • model2vec
    • nlp
    • python
    • sentence-transformers
  17. xlang-ai/instructor-embedding · ★ 2,022 · ⑂ 157

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    • embeddings
    • information-retrieval
    • language-model
    • text-classification
    • text-clustering
    • text-embedding
  18. lilianweng/stock-rnn · ★ 1,976 · ⑂ 675

    Predict stock market prices using RNN model with multilayer LSTM cells + optional multi-stock embeddings.

    • lstm
    • rnn-tensorflow
    • stock-price-prediction
    • embeddings
  19. nomic-ai/nomic · ★ 1,876 · ⑂ 197

    Nomic Developer API SDK

    • python
    • clustering
    • duplicate-detection
    • embeddings
    • text
    • topic-modeling
  20. Kav-K/GPTDiscord · ★ 1,854 · ⑂ 294

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

    • artificial-intelligence
    • asyncio
    • gpt3
    • help-wanted
    • openai
    • openai-api
  21.

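    Chinese long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short); base classes for building character, word, and sentence embedding layers (embeddings) and network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, Capsule Network (CapsuleNet), Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN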

    • text-classification
    • keras
    • rcnn
    • dcnn
    • charcnn
    • bert
  22. superlinked/sie · ★ 1,705 · ⑂ 134

    Superlinked Inference Engine is an Open-source inference server and production cluster for embeddings, reranking, and extraction.

    • embeddings
    • etl
    • vector-search
    • data-pipeline
    • deep-learning
    • information-retrieval
  23. AnswerDotAI/ModernBERT · ★ 1,669 · ⑂ 146

    Bringing BERT into modernity via both architecture changes and scaling

    • bert
    • embeddings
    • llm
    • nlp
  24. plasticityai/magnitude · ★ 1,658 · ⑂ 122

    A fast, efficient universal vector embedding utility package.

    • python
    • natural-language-processing
    • nlp
    • machine-learning
    • vectors
    • embeddings
  25. jasonwei20/eda_nlp · ★ 1,652 · ⑂ 313

    Data augmentation for NLP, presented at EMNLP 2019

    • nlp
    • data-augmentation
    • text-classification
    • synonyms
    • embeddings
    • sentence

Find Python engineers shipping Embeddings

The list above ranks the most-starred public Python repositories tagged with the Embeddings topic, drawn from the public GitHub graph. Across 297 matching repositories, the contributors are a tight cluster of engineers with both Python chops and real Embeddings experience.

That overlap is rare. Most Python engineers haven’t shipped Embeddings, and most Embeddings maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.

Refolk turns this list into a search. Ask for “Python Embeddings maintainers hiring” or “Python engineers shipping Embeddings in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.

How this list is built

Refolk searched GitHub for public Python repositories tagged with the Embeddings topic, ranked them by stargazer count, and kept those with at least 25 stars. The list refreshes once a day.
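The steps in that paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions, not Refolk's actual pipeline: it assumes the public GitHub REST search endpoint is the data source, and the helper names `build_search_url` and `rank_repositories` are hypothetical. The sample data reuses two entries from the list above plus one made-up repository that falls under the 25-star cutoff.

```python
from urllib.parse import urlencode

# Public GitHub repository-search endpoint (assumed data source).
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"
MIN_STARS = 25  # cutoff stated in the text above


def build_search_url(topic="embeddings", language="python",
                     min_stars=MIN_STARS, per_page=25):
    """Build a search URL for repositories with a given topic and language."""
    query = f"topic:{topic} language:{language} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_repositories(repos, min_stars=MIN_STARS):
    """Drop repositories under the cutoff, then sort by stars, highest first."""
    kept = [r for r in repos if r["stargazers_count"] >= min_stars]
    return sorted(kept, key=lambda r: r["stargazers_count"], reverse=True)


if __name__ == "__main__":
    # Records shaped like items in a GitHub search response.
    sample = [
        {"full_name": "qdrant/fastembed", "stargazers_count": 2927},
        {"full_name": "neuml/txtai", "stargazers_count": 12471},
        {"full_name": "example/tiny-repo", "stargazers_count": 3},  # hypothetical
    ]
    for rank, repo in enumerate(rank_repositories(sample), start=1):
        print(rank, repo["full_name"], repo["stargazers_count"])
    print(build_search_url())
```

Filtering and sorting are kept in a pure function so the ranking rule can be checked without network access; fetching the URL and handling pagination or rate limits are left out of the sketch.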

Last refreshed: Thu, 07 May 2026 05:54:12 GMT

Need a more specific search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
