Refolk

Top Python data science repositories on GitHub

Notebooks, analysis libraries, and data tooling. Filtered to projects whose primary language is Python.

Ranked by stars across 1,150 Python repositories tagged data-science. Refreshed daily.

  1. scikit-learn/scikit-learn · ★ 65,990 · ⑂ 26,990

    scikit-learn: machine learning in Python

    • machine-learning
    • python
    • statistics
    • data-science
    • data-analysis
  2. keras-team/keras · ★ 64,061 · ⑂ 19,765

    Deep Learning for humans

    • deep-learning
    • tensorflow
    • neural-networks
    • machine-learning
    • data-science
    • python
  3. Asabeneh/30-Days-Of-Python · ★ 62,150 · ⑂ 11,817

    The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

    • 30-days-of-python
    • python
    • flask
    • github
    • heroku
    • matplotlib
  4. pandas-dev/pandas · ★ 48,682 · ⑂ 19,912

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    • data-analysis
    • pandas
    • flexible
    • alignment
    • python
    • data-science
  5. apache/airflow · ★ 45,307 · ⑂ 17,010

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    • airflow
    • apache
    • apache-airflow
    • python
    • scheduler
    • workflow
  6. streamlit/streamlit · ★ 44,474 · ⑂ 4,229

    Streamlit — A faster way to build and share data apps.

    • python
    • machine-learning
    • data-science
    • deep-learning
    • data-visualization
    • streamlit
  7. gradio-app/gradio · ★ 42,519 · ⑂ 3,434

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    • machine-learning
    • models
    • ui
    • ui-components
    • interface
    • python
  8. ray-project/ray · ★ 42,442 · ⑂ 7,532

    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    • ray
    • distributed
    • parallel
    • machine-learning
    • reinforcement-learning
    • deep-learning
  9. explosion/spaCy · ★ 33,547 · ⑂ 4,679

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    • natural-language-processing
    • data-science
    • machine-learning
    • python
    • cython
    • nlp
  10. eriklindernoren/ML-From-Scratch · ★ 31,421 · ⑂ 5,265

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

    • machine-learning
    • deep-learning
    • deep-reinforcement-learning
    • machine-learning-from-scratch
    • data-science
    • data-mining
  11. Lightning-AI/pytorch-lightning · ★ 31,116 · ⑂ 3,720

    Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

    • python
    • deep-learning
    • artificial-intelligence
    • ai
    • pytorch
    • data-science
  12.

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    • python
    • machine-learning
    • deep-learning
    • data-science
    • big-data
    • aws
  13. d2l-ai/d2l-en · ★ 28,778 · ⑂ 5,055

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

    • deep-learning
    • machine-learning
    • book
    • notebook
    • computer-vision
    • natural-language-processing
  14. reflex-dev/reflex · ★ 28,387 · ⑂ 1,714

    🕸️ Web apps in pure Python 🐍

    • python
    • framework
    • open-source
    • gui
    • dashboard
    • fullstack
  15. plotly/dash · ★ 24,151 · ⑂ 2,277

    Data Apps & Dashboards for Python. No JavaScript Required.

    • dash
    • plotly
    • data-visualization
    • data-science
    • gui-framework
    • flask
  16. sinaptik-ai/pandas-ai · ★ 23,510 · ⑂ 2,309

    Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

    • llm
    • pandas
    • ai
    • data-analysis
    • data-science
    • gpt-4
  17. matplotlib/matplotlib · ★ 22,771 · ⑂ 8,326

    matplotlib: plotting with Python

    • matplotlib
    • data-visualization
    • data-science
    • python
    • qt
    • wx
  18. PrefectHQ/prefect · ★ 22,319 · ⑂ 2,294

    Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

    • python
    • workflow
    • data-engineering
    • data-science
    • workflow-engine
    • prefect
  19. recommenders-team/recommenders · ★ 21,670 · ⑂ 3,319

    Best Practices on Recommendation Systems

    • machine-learning
    • recommender
    • ranking
    • deep-learning
    • python
    • jupyter-notebook
  20. marimo-team/marimo · ★ 20,824 · ⑂ 1,070

    A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.

    • notebooks
    • python
    • data-science
    • machine-learning
    • artificial-intelligence
    • data-visualization
  21. akfamily/akshare · ★ 18,928 · ⑂ 3,137

    AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!

    • futures
    • financial-data
    • data-science
    • quant
    • fundamental
    • akshare
  22. ipython/ipython · ★ 16,698 · ⑂ 4,473

    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

    • ipython
    • jupyter
    • data-science
    • notebook
    • python
    • repl
  23. piskvorky/gensim · ★ 16,408 · ⑂ 4,412

    Topic Modelling for Humans

    • gensim
    • topic-modeling
    • information-retrieval
    • machine-learning
    • natural-language-processing
    • nlp
  24. treeverse/dvc · ★ 15,582 · ⑂ 1,293

    🦉 Data Versioning and ML Experiments

    • data-science
    • machine-learning
    • reproducibility
    • data-version-control
    • developer-tools
    • ai
  25. dagster-io/dagster · ★ 15,441 · ⑂ 2,113

    An orchestration platform for the development, production, and observation of data assets.

    • data-pipelines
    • dagster
    • workflow
    • data-science
    • workflow-automation
    • python

Find Python engineers shipping data science

The list above ranks the most-starred public Python repositories tagged with the data-science topic, drawn from the public GitHub graph. Across the 1,150 matching repositories, the contributors form a tight cluster of engineers with both Python skills and hands-on data science experience.

That overlap is rare. Most Python engineers haven’t shipped data science work, and most data science maintainers don’t write Python. The people on this list’s contributor graph are the ones who do both.

Refolk turns this list into a search. Ask for “Python data science maintainers hiring” or “Python engineers shipping data science in 2025” and Refolk returns a ranked shortlist with the commits, profiles, and projects behind each name.

How this list is built

Refolk searched GitHub for public Python repositories tagged with the data-science topic, ranked them by stargazer count, and kept those with at least 25 stars. The list refreshes once a day.
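The steps described here (search by topic and language, apply a star floor, rank by stargazers) can be approximated with GitHub's public repository search endpoint. The sketch below is illustrative only, not Refolk's actual pipeline: the function names `build_search_url` and `rank_repositories` are hypothetical, and the sample records are made up. It builds the query URL without sending a request, then filters and sorts items shaped like the search API's response.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic="data-science", language="python",
                     min_stars=25, per_page=25):
    """Build a search URL using the topic:, language:, and stars: qualifiers.

    Hypothetical helper: it only constructs the URL and sends nothing.
    """
    query = f"topic:{topic} language:{language} stars:>={min_stars}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_repositories(items, min_stars=25, limit=25):
    """Keep repositories at or above the star floor, most-starred first.

    `items` is a list of dicts carrying the `stargazers_count` field that
    the search API returns for each repository.
    """
    kept = [repo for repo in items if repo.get("stargazers_count", 0) >= min_stars]
    kept.sort(key=lambda repo: repo["stargazers_count"], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"full_name": "example/small", "stargazers_count": 10},
        {"full_name": "example/mid", "stargazers_count": 300},
        {"full_name": "example/big", "stargazers_count": 5000},
    ]
    print(build_search_url())
    for repo in rank_repositories(sample):
        print(repo["full_name"], repo["stargazers_count"])
```

Applying the star floor and the sort locally as well as in the query is a deliberate redundancy: it keeps the ranking rule explicit even if the results come from a cached or paginated source.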

Last refreshed: Thu, 07 May 2026 06:52:14 GMT

Need a more specific search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
