Top Data science repositories on GitHub
Notebooks, analysis libraries, and data tooling.
Ranked by stars across 2,390 repositories tagged data-science. Refreshed daily.
- 1. microsoft/ML-For-Beginners ★ 85,676 · ⑂ 20,743
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
- ml
- data-science
- machine-learning
- machine-learning-algorithms
- machinelearning
- python
- 2. apache/superset ★ 72,716 · ⑂ 17,207
Apache Superset is a Data Visualization and Data Exploration Platform
- superset
- apache
- apache-superset
- data-visualization
- data-viz
- analytics
- 3. scikit-learn/scikit-learn ★ 65,988 · ⑂ 26,991
scikit-learn: machine learning in Python
- machine-learning
- python
- statistics
- data-science
- data-analysis
- 4. keras-team/keras ★ 64,060 · ⑂ 19,765
Deep Learning for humans
- deep-learning
- tensorflow
- neural-networks
- machine-learning
- data-science
- python
- 5. Asabeneh/30-Days-Of-Python ★ 62,146 · ⑂ 11,817
The 30 Days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than 100 days. Follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
- 30-days-of-python
- python
- flask
- github
- heroku
- matplotlib
- 6. pandas-dev/pandas ★ 48,681 · ⑂ 19,912
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
- data-analysis
- pandas
- flexible
- alignment
- python
- data-science
- 7. GokuMohandas/Made-With-ML ★ 47,510 · ⑂ 7,480
Learn how to develop, deploy and iterate on production-grade ML applications.
- machine-learning
- deep-learning
- pytorch
- natural-language-processing
- data-science
- python
- 8. apache/airflow ★ 45,307 · ⑂ 17,010
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
- airflow
- apache
- apache-airflow
- python
- scheduler
- workflow
- 9. streamlit/streamlit ★ 44,475 · ⑂ 4,229
Streamlit — A faster way to build and share data apps.
- python
- machine-learning
- data-science
- deep-learning
- data-visualization
- streamlit
- 10. gradio-app/gradio ★ 42,519 · ⑂ 3,434
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
- machine-learning
- models
- ui
- ui-components
- interface
- python
- 11. ray-project/ray ★ 42,442 · ⑂ 7,531
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- ray
- distributed
- parallel
- machine-learning
- reinforcement-learning
- deep-learning
- 12. microsoft/Data-Science-For-Beginners ★ 35,267 · ⑂ 7,190
10 Weeks, 20 Lessons, Data Science for All!
- data-science
- python
- data-visualization
- data-analysis
- pandas
- microsoft-for-beginners
- 13. explosion/spaCy ★ 33,546 · ⑂ 4,679
💫 Industrial-strength Natural Language Processing (NLP) in Python
- natural-language-processing
- data-science
- machine-learning
- python
- cython
- nlp
- 14. ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code ★ 33,480 · ⑂ 7,119
500 AI Machine learning Deep learning Computer vision NLP Projects with code
- awesome
- machine-learning
- deep-learning
- machine-learning-projects
- deep-learning-project
- computer-vision-project
- 15. eriklindernoren/ML-From-Scratch ★ 31,421 · ⑂ 5,265
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
- machine-learning
- deep-learning
- deep-reinforcement-learning
- machine-learning-from-scratch
- data-science
- data-mining
- 16. Lightning-AI/pytorch-lightning ★ 31,116 · ⑂ 3,720
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
- python
- deep-learning
- artificial-intelligence
- ai
- pytorch
- data-science
- 17. AMAI-GmbH/AI-Expert-Roadmap ★ 30,958 · ⑂ 2,581
Roadmap to becoming an Artificial Intelligence Expert in 2022
- deep-learning
- artificial-intelligence
- roadmap
- ai-roadmap
- machine-learning
- study-plan
- 18. academic/awesome-datascience ★ 29,135 · ⑂ 6,498
📝 An awesome Data Science repository to learn and apply for real world problems.
- data-science
- machine-learning
- data-visualization
- science
- data-mining
- awesome-list
- 19. donnemartin/data-science-ipython-notebooks ★ 29,065 · ⑂ 8,036
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
- python
- machine-learning
- deep-learning
- data-science
- big-data
- aws
- 20. eugeneyan/applied-ml ★ 28,799 · ⑂ 3,841
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
- applied-machine-learning
- production
- applied-data-science
- machine-learning
- data-science
- reinforcement-learning
- 21. d2l-ai/d2l-en ★ 28,777 · ⑂ 5,055
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
- deep-learning
- machine-learning
- book
- notebook
- computer-vision
- natural-language-processing
- 22. reflex-dev/reflex ★ 28,385 · ⑂ 1,714
🕸️ Web apps in pure Python 🐍
- python
- framework
- open-source
- gui
- dashboard
- fullstack
- 23
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
- bayesian-methods
- pymc
- mathematical-analysis
- jupyter-notebook
- data-science
- statistics
- 24. fastai/fastbook ★ 24,934 · ⑂ 9,458
The fastai book, published as Jupyter Notebooks
- notebooks
- fastai
- deep-learning
- machine-learning
- data-science
- python
- 25. plotly/dash ★ 24,151 · ⑂ 2,277
Data Apps & Dashboards for Python. No JavaScript Required.
- dash
- plotly
- data-visualization
- data-science
- gui-framework
- flask
Find engineers shipping data science
The list above ranks the most-starred public repositories tagged with the data-science topic, drawn from the public GitHub graph. Across the 2,390 repositories tagged this way, the maintainers and top contributors form a tight cluster of the people actually building data science tooling.
Looking for engineers who’ve worked on data science for real, not just listed it on LinkedIn? The fastest path is the contributor lists of these repos: contributors’ commits, issues, and READMEs are public proof of depth.
Refolk turns this list into a search. Ask for “maintainers of top data science repos who are hiring”, “data science engineers in San Francisco”, or “founders shipping data science” and Refolk returns a ranked shortlist with sources.
How this list is built
Last refreshed: Thu, 07 May 2026 05:55:10 GMT
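The page does not spell out its pipeline beyond "ranked by stars" and "drawn from the public GitHub graph", so the following is only a rough sketch of how a comparable ranking could be reproduced, not a description of how this list is actually built. It assumes GitHub's public REST repository search endpoint and its `items`, `full_name`, `stargazers_count`, and `forks_count` response fields; the helper names `build_search_url` and `rank_by_stars` are hypothetical. The sample records reuse the figures shown for the top three entries above.

```python
from urllib.parse import urlencode


def build_search_url(topic: str, per_page: int = 25) -> str:
    """Build a GitHub repository-search URL for one topic, most-starred first.

    Assumption: the list could be approximated with the public search
    endpoint; the page itself does not confirm this.
    """
    params = {
        "q": f"topic:{topic}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


def rank_by_stars(items: list[dict]) -> list[str]:
    """Sort repository records by star count and format them like the list above."""
    ranked = sorted(items, key=lambda repo: repo["stargazers_count"], reverse=True)
    return [
        f"{rank}. {repo['full_name']} ★ {repo['stargazers_count']:,} · ⑂ {repo['forks_count']:,}"
        for rank, repo in enumerate(ranked, start=1)
    ]


# Sample records shaped like the `items` array of a search response,
# deliberately out of order to show the sort.
sample_items = [
    {"full_name": "scikit-learn/scikit-learn", "stargazers_count": 65988, "forks_count": 26991},
    {"full_name": "microsoft/ML-For-Beginners", "stargazers_count": 85676, "forks_count": 20743},
    {"full_name": "apache/superset", "stargazers_count": 72716, "forks_count": 17207},
]

if __name__ == "__main__":
    print(build_search_url("data-science"))
    for line in rank_by_stars(sample_items):
        print(line)
```

Fetching the URL (for example with `urllib.request`) and passing the decoded `items` array to `rank_by_stars` would produce lines in the same shape as the entries above. Unauthenticated search requests are rate-limited, so a daily refresh would likely want a token.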