Top Data science repositories on GitHub
Notebooks, analysis libraries, and data tooling.
Ranked by stars across 2,418 repositories tagged data-science. Refreshed daily.
- 1 microsoft/ML-For-Beginners ★ 87,107 · ⑂ 21,148
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
- ml
- data-science
- machine-learning
- machine-learning-algorithms
- machinelearning
- python
- 2 apache/superset ★ 73,410 · ⑂ 17,672
Apache Superset is a Data Visualization and Data Exploration Platform
- superset
- apache
- apache-superset
- data-visualization
- data-viz
- analytics
- 3 scikit-learn/scikit-learn ★ 66,378 · ⑂ 27,086
scikit-learn: machine learning in Python
- machine-learning
- python
- statistics
- data-science
- data-analysis
- 4 Asabeneh/30-Days-Of-Python ★ 65,651 · ⑂ 12,289
The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
- 30-days-of-python
- python
- flask
- github
- heroku
- matplotlib
- 5 keras-team/keras ★ 64,095 · ⑂ 19,735
Deep Learning for humans
- deep-learning
- tensorflow
- neural-networks
- machine-learning
- data-science
- python
- 6 pandas-dev/pandas ★ 49,034 · ⑂ 20,021
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
- data-analysis
- pandas
- flexible
- alignment
- python
- data-science
- 7 GokuMohandas/Made-With-ML ★ 48,291 · ⑂ 7,594
Learn how to develop, deploy and iterate on production-grade ML applications.
- machine-learning
- deep-learning
- pytorch
- natural-language-processing
- data-science
- python
- 8 apache/airflow ★ 45,883 · ⑂ 17,263
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
- airflow
- apache
- apache-airflow
- python
- scheduler
- workflow
- 9 streamlit/streamlit ★ 45,020 · ⑂ 4,291
Streamlit — A faster way to build and share data apps.
- python
- machine-learning
- data-science
- deep-learning
- data-visualization
- streamlit
- 10 SimplifyJobs/Summer2026-Internships ★ 44,979 · ⑂ 3,180
Summer 2026 software engineering, data science, AI, quant, product management, and hardware internship postings. Updated daily by Simplify and Pitt CSC.
- interview-preparation
- internships
- jobs
- university
- fall-2026
- github
- 11 gradio-app/gradio ★ 42,970 · ⑂ 3,505
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
- machine-learning
- models
- ui
- ui-components
- interface
- python
- 12 ray-project/ray ★ 42,947 · ⑂ 7,707
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- ray
- distributed
- parallel
- machine-learning
- reinforcement-learning
- deep-learning
- 13 microsoft/Data-Science-For-Beginners ★ 35,766 · ⑂ 7,272
10 Weeks, 20 Lessons, Data Science for All!
- data-science
- python
- data-visualization
- data-analysis
- pandas
- microsoft-for-beginners
- 14 ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code ★ 34,723 · ⑂ 7,285
500 AI Machine learning Deep learning Computer vision NLP Projects with code
- awesome
- machine-learning
- deep-learning
- machine-learning-projects
- deep-learning-project
- computer-vision-project
- 15 explosion/spaCy ★ 33,674 · ⑂ 4,688
💫 Industrial-strength Natural Language Processing (NLP) in Python
- natural-language-processing
- data-science
- machine-learning
- python
- cython
- nlp
- 16 eriklindernoren/ML-From-Scratch ★ 31,943 · ⑂ 5,347
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
- machine-learning
- deep-learning
- deep-reinforcement-learning
- machine-learning-from-scratch
- data-science
- data-mining
- 17 Lightning-AI/pytorch-lightning ★ 31,198 · ⑂ 3,742
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
- python
- deep-learning
- artificial-intelligence
- ai
- pytorch
- data-science
- 18 AMAI-GmbH/AI-Expert-Roadmap ★ 31,096 · ⑂ 2,579
Roadmap to becoming an Artificial Intelligence Expert in 2022
- deep-learning
- artificial-intelligence
- roadmap
- ai-roadmap
- machine-learning
- study-plan
- 19 eugeneyan/applied-ml ★ 29,809 · ⑂ 3,952
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
- applied-machine-learning
- production
- applied-data-science
- machine-learning
- data-science
- reinforcement-learning
- 20 academic/awesome-datascience ★ 29,440 · ⑂ 6,563
📝 An awesome Data Science repository to learn and apply for real world problems.
- data-science
- machine-learning
- data-visualization
- science
- data-mining
- awesome-list
- 21 donnemartin/data-science-ipython-notebooks ★ 29,177 · ⑂ 8,025
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
- python
- machine-learning
- deep-learning
- data-science
- big-data
- aws
- 22 d2l-ai/d2l-en ★ 29,024 · ⑂ 5,074
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
- deep-learning
- machine-learning
- book
- notebook
- computer-vision
- natural-language-processing
- 23 reflex-dev/reflex ★ 28,579 · ⑂ 1,742
🕸️ Web apps in pure Python 🐍
- python
- framework
- open-source
- gui
- dashboard
- fullstack
- 24
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
- bayesian-methods
- pymc
- mathematical-analysis
- jupyter-notebook
- data-science
- statistics
- 25 fastai/fastbook ★ 25,043 · ⑂ 9,479
The fastai book, published as Jupyter Notebooks
- notebooks
- fastai
- deep-learning
- machine-learning
- data-science
- python
Find engineers shipping data science
The list above ranks the most-starred public repositories tagged with the data-science topic, drawn from the public GitHub graph. Across the 2,418 repositories tagged this way, the maintainers and top contributors form a tight cluster of the people actually building in data science.
Looking for engineers who’ve worked on data science for real, not just listed it on LinkedIn? The fastest path is through the contributor lists of these repos. Their commits, issues, and READMEs are public proof of depth.
Refolk turns this list into a search. Ask for “maintainers of top data science repos who are hiring”, “data science engineers in San Francisco”, or “founders shipping data science” and Refolk returns a ranked shortlist with sources.
How this list is built
Last refreshed: Sun, 21 Jun 2026 07:09:57 GMT
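The page does not spell out how the ranking is produced. One plausible way to reproduce a list like this is GitHub's public repository search endpoint, filtered by topic and sorted by stars. The sketch below assumes that approach and is not a description of Refolk's own pipeline; the helper names `build_search_url` and `rank_repos` are hypothetical, and the sample records reuse two star and fork counts from the list above.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(topic: str, per_page: int = 25) -> str:
    """Build a search URL for the most-starred repositories under a topic."""
    params = {
        "q": f"topic:{topic}",  # topic qualifier, e.g. topic:data-science
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def rank_repos(items: list[dict]) -> list[tuple[int, str, int, int]]:
    """Return (rank, full_name, stars, forks) tuples, most-starred first.

    `items` is expected to look like the "items" array of a search
    response, where each repository carries `full_name`,
    `stargazers_count`, and `forks_count`.
    """
    ordered = sorted(items, key=lambda r: r["stargazers_count"], reverse=True)
    return [
        (rank, r["full_name"], r["stargazers_count"], r["forks_count"])
        for rank, r in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    print(build_search_url("data-science"))
    # Hand-written sample records (no network call is made here).
    sample = [
        {"full_name": "pandas-dev/pandas", "stargazers_count": 49034, "forks_count": 20021},
        {"full_name": "scikit-learn/scikit-learn", "stargazers_count": 66378, "forks_count": 27086},
    ]
    for rank, name, stars, forks in rank_repos(sample):
        print(f"{rank} {name} ★ {stars:,} · ⑂ {forks:,}")
```

Fetching the URL with any HTTP client and passing the response's `items` array to `rank_repos` would yield rows in the same shape as the list above. Unauthenticated search requests are rate limited, so a daily refresh would likely want an access token.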