Useful Resources for Data Science
- Python for Everybody: [Book], [GitHub Repository], [Online Course]
- Book: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney, [GitHub Repository]
- Book: Python Data Science Handbook, [GitHub Repository]
- Free cloud-based Python environment: Google Colab. You can log in directly with your UMBC account, save Jupyter notebooks to your Google Drive, and share them with others.
- Free cloud-based Python environment: Amazon SageMaker Studio Lab. A separate account application is required.
Useful Resources for Big Data
- Apache Spark: Unified engine for large-scale data analytics. Book: Learning Spark, 2nd Edition, [Book GitHub Repository]
- Dask: Scalable analytics in Python. Book: Data Science with Python and Dask, [Book GitHub Repository]
- Apache Flink: Stateful Computations over Data Streams
- Free cloud-based Spark environment: Databricks Community Edition (Account application, Account login)
- Free Spark Environment at UMBC: UMBC Big Data Cluster
Useful Resources for Deep Learning
- Online Book and Tutorial: Dive into Deep Learning
- Online Course: Deep Learning by Yann LeCun
- Book: Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, published by MIT Press, 2016
- Book: Deep Learning with PyTorch, Eli Stevens, Luca Antiga, and Thomas Viehmann, published by Manning Publications, 2020, [GitHub Repository]
- Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
Useful Resources for Causal Discovery and Inference
- Online Tutorial: Introduction to Causal Inference, [YouTube]
- GitHub repository for Awesome Causality Discovery Algorithms
- GitHub repository for Awesome Causality Datasets
Useful Resources for Distributed Computing: High Performance Computing, Cloud Computing, etc.
- Online Tutorial on AWS Cloud: BDAL’s AWS Cloud Resource Usage Notes
- Training Materials on Cloud Computing: CloudBank
- High Performance Computing Environment at UMBC: UMBC High Performance Computing Facility (HPCF), which contains a CPU cluster, a GPU cluster, and a Big Data cluster