Refolk

Top data science repositories on GitHub

Notebooks, analysis libraries, and data tooling.

Ranked by stars across 2,390 repositories tagged data-science. Refreshed daily.

  1. microsoft/ML-For-Beginners ★ 85,676 · ⑂ 20,743

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    • ml
    • data-science
    • machine-learning
    • machine-learning-algorithms
    • machinelearning
    • python
  2. apache/superset ★ 72,716 · ⑂ 17,207

    Apache Superset is a Data Visualization and Data Exploration Platform

    • superset
    • apache
    • apache-superset
    • data-visualization
    • data-viz
    • analytics
  3. scikit-learn/scikit-learn ★ 65,988 · ⑂ 26,991

    scikit-learn: machine learning in Python

    • machine-learning
    • python
    • statistics
    • data-science
    • data-analysis
  4. keras-team/keras ★ 64,060 · ⑂ 19,765

    Deep Learning for humans

    • deep-learning
    • tensorflow
    • neural-networks
    • machine-learning
    • data-science
    • python
  5. Asabeneh/30-Days-Of-Python ★ 62,146 · ⑂ 11,817

    The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw

    • 30-days-of-python
    • python
    • flask
    • github
    • heroku
    • matplotlib
  6. pandas-dev/pandas ★ 48,681 · ⑂ 19,912

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

    • data-analysis
    • pandas
    • flexible
    • alignment
    • python
    • data-science
  7. GokuMohandas/Made-With-ML ★ 47,510 · ⑂ 7,480

    Learn how to develop, deploy and iterate on production-grade ML applications.

    • machine-learning
    • deep-learning
    • pytorch
    • natural-language-processing
    • data-science
    • python
  8. apache/airflow ★ 45,307 · ⑂ 17,010

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    • airflow
    • apache
    • apache-airflow
    • python
    • scheduler
    • workflow
  9. streamlit/streamlit ★ 44,475 · ⑂ 4,229

    Streamlit — A faster way to build and share data apps.

    • python
    • machine-learning
    • data-science
    • deep-learning
    • data-visualization
    • streamlit
  10. gradio-app/gradio ★ 42,519 · ⑂ 3,434

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    • machine-learning
    • models
    • ui
    • ui-components
    • interface
    • python
  11. ray-project/ray ★ 42,442 · ⑂ 7,531

    Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

    • ray
    • distributed
    • parallel
    • machine-learning
    • reinforcement-learning
    • deep-learning
  12.

    10 Weeks, 20 Lessons, Data Science for All!

    • data-science
    • python
    • data-visualization
    • data-analysis
    • pandas
    • microsoft-for-beginners
  13. explosion/spaCy ★ 33,546 · ⑂ 4,679

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    • natural-language-processing
    • data-science
    • machine-learning
    • python
    • cython
    • nlp
  14.

    500 AI Machine learning Deep learning Computer vision NLP Projects with code

    • awesome
    • machine-learning
    • deep-learning
    • machine-learning-projects
    • deep-learning-project
    • computer-vision-project
  15. eriklindernoren/ML-From-Scratch ★ 31,421 · ⑂ 5,265

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

    • machine-learning
    • deep-learning
    • deep-reinforcement-learning
    • machine-learning-from-scratch
    • data-science
    • data-mining
  16. Lightning-AI/pytorch-lightning ★ 31,116 · ⑂ 3,720

    Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

    • python
    • deep-learning
    • artificial-intelligence
    • ai
    • pytorch
    • data-science
  17. AMAI-GmbH/AI-Expert-Roadmap ★ 30,958 · ⑂ 2,581

    Roadmap to becoming an Artificial Intelligence Expert in 2022

    • deep-learning
    • artificial-intelligence
    • roadmap
    • ai-roadmap
    • machine-learning
    • study-plan
  18. academic/awesome-datascience ★ 29,135 · ⑂ 6,498

    📝 An awesome Data Science repository to learn and apply for real world problems.

    • data-science
    • machine-learning
    • data-visualization
    • science
    • data-mining
    • awesome-list
  19.

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    • python
    • machine-learning
    • deep-learning
    • data-science
    • big-data
    • aws
  20. eugeneyan/applied-ml ★ 28,799 · ⑂ 3,841

    📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

    • applied-machine-learning
    • production
    • applied-data-science
    • machine-learning
    • data-science
    • reinforcement-learning
  21. d2l-ai/d2l-en ★ 28,777 · ⑂ 5,055

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

    • deep-learning
    • machine-learning
    • book
    • notebook
    • computer-vision
    • natural-language-processing
  22. reflex-dev/reflex ★ 28,385 · ⑂ 1,714

    🕸️ Web apps in pure Python 🐍

    • python
    • framework
    • open-source
    • gui
    • dashboard
    • fullstack
  23.

    aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

    • bayesian-methods
    • pymc
    • mathematical-analysis
    • jupyter-notebook
    • data-science
    • statistics
  24. fastai/fastbook ★ 24,934 · ⑂ 9,458

    The fastai book, published as Jupyter Notebooks

    • notebooks
    • fastai
    • deep-learning
    • machine-learning
    • data-science
    • python
  25. plotly/dash ★ 24,151 · ⑂ 2,277

    Data Apps & Dashboards for Python. No JavaScript Required.

    • dash
    • plotly
    • data-visualization
    • data-science
    • gui-framework
    • flask

Find engineers shipping data science

The list above ranks the most-starred public repositories tagged with the data-science topic, drawn from the public GitHub graph. Across the 2,390 repositories tagged this way, the maintainers and top contributors form a tight cluster of the people actually building in data science.

Looking for engineers who have done real data science work, not just listed it on LinkedIn? The fastest path is the contributor lists of these repos: their commits, issues, and READMEs are public proof of depth.
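As a rough illustration of reading a contributor list, the sketch below uses GitHub's public REST endpoint for repository contributors. This is an assumption about one way to do it by hand, not a description of how Refolk works; the helper names (`contributors_url`, `top_contributors`, `fetch_top_contributors`) are hypothetical.

```python
import json
import urllib.request


def contributors_url(full_name, per_page=10):
    """Build the GitHub REST URL listing contributors for 'owner/repo'."""
    return f"https://api.github.com/repos/{full_name}/contributors?per_page={per_page}"


def top_contributors(payload, limit=5):
    """Return (login, contributions) pairs, highest contribution count first."""
    ranked = sorted(payload, key=lambda c: c["contributions"], reverse=True)
    return [(c["login"], c["contributions"]) for c in ranked[:limit]]


def fetch_top_contributors(full_name, limit=5):
    """Fetch contributors over the network (unauthenticated, so rate-limited)."""
    with urllib.request.urlopen(contributors_url(full_name)) as resp:
        return top_contributors(json.load(resp), limit)
```

Calling `fetch_top_contributors("pandas-dev/pandas")` would hit the live API; unauthenticated requests are rate-limited, so anything beyond a quick check would need a token.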

Refolk turns this list into a search. Ask for “maintainers of top data science repos who are hiring”, “data science engineers in San Francisco”, or “founders shipping data science” and Refolk returns a ranked shortlist with sources.

How this list is built

Refolk searched GitHub for public repositories tagged with the data-science topic, ranked them by stargazer count, and kept those with at least 50 stars. The list refreshes once a day.
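The steps above (search by topic, rank by stars, keep repositories with at least 50 stars) can be approximated with GitHub's public repository search API. The sketch below is an assumption about how such a list could be reproduced, not Refolk's actual pipeline; the function names and the `per_page` default are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_API = "https://api.github.com/search/repositories"


def build_search_url(topic, min_stars=50, per_page=25):
    """Build a search URL for repos with a topic, sorted by stars descending."""
    params = {
        "q": f"topic:{topic} stars:>={min_stars}",  # GitHub search qualifiers
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return f"{SEARCH_API}?{urllib.parse.urlencode(params)}"


def rank_repos(items, min_stars=50):
    """Drop repos under the star threshold and sort the rest by star count."""
    kept = [r for r in items if r["stargazers_count"] >= min_stars]
    return sorted(kept, key=lambda r: r["stargazers_count"], reverse=True)


def fetch_top_repos(topic, min_stars=50, per_page=25):
    """Run the search over the network (unauthenticated, so rate-limited)."""
    url = build_search_url(topic, min_stars, per_page)
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    return rank_repos(payload["items"], min_stars)
```

Each returned item carries `full_name`, `stargazers_count`, and `forks_count`, which correspond to the name, star, and fork figures shown in the list. The local filter in `rank_repos` repeats the `stars:>=50` qualifier so the threshold still holds if the query is changed.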

Last refreshed: Thu, 07 May 2026 05:55:10 GMT

Need a list like this for any search?

Refolk runs natural-language searches across GitHub, LinkedIn, and the open web.
