NSF CAREER: Big Data Climate Causality Analytics

PI: Jianwu Wang, 2020-2025

Introduction

A fundamental problem in climate science is climate causality analysis, which studies the cause-effect relationships among climate variables such as temperature and humidity. Understanding how the climate system works from a causality perspective can inform many research areas, including climate variability, climate dynamics, climate simulation, and extreme climate prediction. Today, climate causality studies face many computing challenges, such as processing very large, high-dimensional datasets and handling the complexity of modern computing resources. To tackle these challenges, this project targets novel causality discovery algorithms and related scalable computing techniques. The project is expected to greatly aid Earth System scientists and climate scientists in exploring new hypotheses and use cases related to climate causality. It includes an integrated program of research, education, and outreach that aims to help better understand and evaluate climate simulation, foster workforce development for a multidisciplinary research community on “Data + Computing + Climate Science”, and raise interest in both information technology and climate studies among K-12 students and various underrepresented groups. The project thus serves the national interest, as stated in NSF’s mission, by promoting the progress of science and advancing national prosperity and welfare.

The goal of this CAREER project is to study efficient and reproducible causality analytics for large-scale climate data, so that climate scientists can easily test their causal hypotheses, reproduce existing studies, and compare different causality analytics results. To handle the increasing dimensionality and resolution of spatiotemporal climate datasets, the project will study incremental causality discovery algorithms for large-scale climate datasets and parallel causality discovery for spatiotemporal climate data. To address the variety of both causal discovery algorithms and climate simulation/observation datasets, the project will study how to effectively measure climate causality results from different causality algorithms and different climate datasets, and how to integrate causality results through ensemble techniques. To cope with the difficulties of conducting and reproducing causality analytics with large-scale climate datasets, the project will study cloud computing for the construction and execution optimization of big data climate analytics pipelines. The project will be evaluated from two perspectives. From the computing perspective, the research will be evaluated in terms of the computational complexity, accuracy, and scalability of its algorithms. From the climate perspective, the applicability of the research will be evaluated by collaborating with climate scientists in their specific research programs.
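
To make the idea of integrating causality results through ensemble techniques more concrete, here is a minimal illustrative sketch in Python. It is not the project's algorithm: it assumes each causality discovery method returns a set of directed (cause, effect) edges and combines them by simple majority voting. The function name majority_vote_edges, the threshold parameter, and the example variable names and graphs are all hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def majority_vote_edges(graphs: List[Set[Edge]], threshold: float = 0.5) -> Set[Edge]:
    """Keep a directed edge if more than `threshold` of the input graphs contain it.

    This is a simple voting ensemble chosen for illustration; it is an
    assumption, not the ensemble method used in the project.
    """
    if not graphs:
        return set()
    # Count how many graphs contain each directed edge.
    counts = Counter(edge for graph in graphs for edge in graph)
    return {edge for edge, n in counts.items() if n / len(graphs) > threshold}


# Hypothetical outputs of three causality discovery algorithms
# run on the same set of climate variables.
graph_a = {("temperature", "humidity"), ("humidity", "cloud_cover")}
graph_b = {("temperature", "humidity"), ("cloud_cover", "temperature")}
graph_c = {("temperature", "humidity"), ("humidity", "cloud_cover")}

consensus = majority_vote_edges([graph_a, graph_b, graph_c])
print(sorted(consensus))
```

In this toy input, the edge found by only one of the three methods is dropped, while edges supported by at least two methods are kept. A real ensemble would likely also weigh edge strengths, time lags, and the reliability of each method.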

PhD Theses

  1. Xin Huang (PhD’23). Doctoral Thesis: Deep Learning based Cloud Retrieval Techniques using Multiple Satellite Remote Sensing Data.
  2. Xin (Starly) Wang (PhD’22). Doctoral Thesis: Secure, Reproducible And Adaptive Machine Learning In Distributed Systems.
  3. Pei Guo (PhD’21). Doctoral Thesis: Scalable Multivariate Causality Discovery From Large-scale Global Spatiotemporal Climate Data. Finalist of the BenchCouncil 2021 Distinguished Doctoral Dissertation Award.

Publications and GitHub Repositories

  1. Louis Lapp, Sahara Ali, Jianwu Wang. Integrating Fourier Transform and Residual Learning for Arctic Sea Ice Forecasting. Accepted by the REU Symposium 2023 at the 22nd IEEE International Conference on Machine Learning and Applications (ICMLA) 2023.
  2. Sahara Ali, Omar Faruque, Yiyi Huang, Md Osman Gani, Aneesh Subramanian, Nicole-Jeanne Schlegel, Jianwu Wang. Quantifying Causes of Arctic Amplification via Deep Learning based Time-series Causal Inference. Accepted by the 22nd IEEE International Conference on Machine Learning and Applications (ICMLA) 2023 [Paper Pre-Print].
  3. Omar Faruque, Francis Ndikum Nji, Mostafa Cham, Rohan Mandar Salvi, Xue Zheng, Jianwu Wang. Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data. Accepted by the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD 2023). DOI:10.1007/978-3-031-43430-3_6, [Open Access Paper, Open Source Code].
  4. Sahara Ali, Jianwu Wang. MT-IceNet – A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting. Accepted by 2022 IEEE/ACM 9th International Conference on Big Data Computing, Applications and Technologies (BDCAT 2022). [Paper Pre-Print, Open Source Code] (Long paper acceptance rate: 27%) Best Paper Award!
  5. Xingyan Li, Jian Li, Zachary Williams, Xin Huang, Mark Carroll, Jianwu Wang. Enhanced Deep Learning Super-Resolution for Bathymetry Data. Accepted by 2022 IEEE/ACM 9th International Conference on Big Data Computing, Applications and Technologies (BDCAT 2022). [Paper Pre-Print, Open Source Code] (Long paper acceptance rate: 27%)
  6. Jorge López González, Theodore Chapman, Kathryn Chen, Hannah Nguyen, Logan Chambers, Seraj A.M. Mostafa, Jianwu Wang, Sanjay Purushotham, Chenxi Wang, Jia Yue. Atmospheric Gravity Wave Detection Using Transfer Learning Techniques. Accepted by 2022 IEEE/ACM 9th International Conference on Big Data Computing, Applications and Technologies (BDCAT 2022). [Paper Pre-Print, Open Source Code] (Long paper acceptance rate: 27%)
  7. Xin Huang, Chenxi Wang, Sanjay Purushotham, Jianwu Wang. VDAM: VAE based Domain Adaptation for Cloud Property Retrieval from Multi-satellite Data. The 30th International Conference on Advances in Geographic Information Systems 2022 (ACM SIGSPATIAL 2022). Article No.: 107, Pages 1–10, doi.org/10.1145/3557915.3561044 [Paper Pre-Print, Open Source Code] (Long paper acceptance rate: 23.8%)
  8. Sahara Ali, Seraj al Mahmud Mostafa, Xingyan Li, Sara Khanjani, Jianwu Wang, James Foulds, Vandana Janeja. Benchmarking Probabilistic Machine Learning Models for Arctic Sea Ice. In Proceedings of The International Geoscience and Remote Sensing Symposium (IGARSS 2022), pages 4654-4657, DOI:10.1109/IGARSS46834.2022.9883505, IEEE. [Paper Pre-Print, Open Source Code].
  9. Xin Wang, Pei Guo, Xingyan Li, Jianwu Wang, Aryya Gangopadhyay, Carl E. Busart and Jade Freeman. Reproducible and Portable Big Data Analytics in the Cloud. arXiv preprint arXiv:2112.09762, 2021. [Open Access Paper, Open Source Code]
  10. Eliot Kim, Peter Kruse, Skylar Lama, Jamal Bourne, Michael Hu, Sahara Ali, Yiyi Huang, Jianwu Wang. Multi-Task Deep Learning Based Spatiotemporal Arctic Sea Ice Forecasting. In Proceedings of the 2021 IEEE International Conference on Big Data (BigData 2021), pages 1847-1857, IEEE. DOI:10.1109/BigData52589.2021.9671491, [Paper Pre-Print, Open Source Code].
  11. Xin Wang, Pei Guo, Jianwu Wang. Large-Scale Causality Discovery Analytics as a Service. Accepted by the Fifth IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2021), IEEE. [Paper Pre-Print, Open Source Code]
  12. Sahara Ali, Yiyi Huang, Xin Huang, Jianwu Wang. Sea Ice Forecasting using Attention-based Ensemble LSTM. Tackling Climate Change with Machine Learning workshop at International Conference on Machine Learning (ICML), 2021. https://www.climatechange.ai/papers/icml2021/50, [Open Access Paper, Open Source Code]
  13. Pei Guo, Yiyi Huang, Jianwu Wang. Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery. Big Data Research, vol. 26, no. 100252, November 2021. DOI:10.1016/j.bdr.2021.100252, [Open Access Paper, Open Source Code]
  14. Yiyi Huang, Matthäus Kleindessner, Alexey Munishkin, Debvrat Varshney, Pei Guo, Jianwu Wang. Benchmarking of Data-Driven Causality Discovery Approaches in the Interactions of Arctic Sea Ice and Atmosphere. Data-driven Climate Sciences Section, Frontiers in Big Data, Frontiers, August 2021. DOI:10.3389/fdata.2021.642182, [Open Access Paper, Open Source Code]
  15. Pei Guo, Achuna Ofonedu, Jianwu Wang. Scalable and Hybrid Ensemble-Based Causality Discovery. In Proceedings of the 2020 IEEE International Conference on Smart Data Services (SMDS 2020), pages 72-80, IEEE, 2020. DOI: 10.1109/SMDS49396.2020.00016, [Paper Pre-Print, Open Source Code], Best Student Paper Award!
  16. Xin Huang, Sahara Ali, Chenxi Wang, Zeyu Ning, Sanjay Purushotham, Jianwu Wang, Zhibo Zhang. Deep Domain Adaptation based Cloud Type Detection using Active and Passive Satellite Data. In Proceedings of the 2020 IEEE International Conference on Big Data (BigData 2020), pages 1330-1337, IEEE, 2020. DOI: 10.1109/BigData50022.2020.9377756, [Paper Pre-Print, Open Source Code].
  17. Xin Huang, Sahara Ali, Sanjay Purushotham, Jianwu Wang, Chenxi Wang and Zhibo Zhang. Deep Multi-Sensor Domain Adaptation on Active and Passive Satellite Remote Sensing Data. In Proceedings of the 1st KDD Workshop on Deep Learning for Spatiotemporal Data, Applications, and Systems (DeepSpatial 2020).
  18. Ping Hou, Peng Wu, Pei Guo, Jianwu Wang, Aryya Gangopadhyay, Zhibo Zhang. A Deep Learning Model for Detecting Dust in Earth’s Atmosphere from Satellite Remote Sensing Data. 2020 IEEE International Conference on Smart Computing (SMARTCOMP), pages 196-201, 2020. DOI: 10.1109/SMARTCOMP50058.2020.00045, [Paper Pre-Print, Open Source Code]

Acknowledgement

This work is funded by the NSF CAREER award “CAREER: Big Data Climate Causality Analytics”.