NSF CAREER: Big Data Climate Causality Analytics

PI: Jianwu Wang, 2020-2025


A fundamental problem in climate science is climate causality analysis, which studies cause-effect relationships among climate variables such as temperature and humidity. Findings from studying how the climate system works from a causality perspective could be used in many research areas, including climate variability, climate dynamics, climate simulation, and extreme climate prediction. Climate causality studies now face many computing challenges, such as processing very large, high-dimensional datasets and managing the complexity of modern computing resources. To tackle these challenges, this project targets novel causality discovery algorithms and related scalable computing techniques. The project is expected to greatly help Earth system scientists and climate scientists explore new hypotheses and use cases related to climate causality. The project includes an integrated program of research, education and outreach to help better understand and evaluate climate simulation, foster workforce development for a multidisciplinary research community on “Data + Computing + Climate Science”, and raise interest in both information technology and climate studies among K-12 students and various underrepresented groups. The project thus serves the national interest, as stated in NSF’s mission, by promoting the progress of science and advancing national prosperity and welfare.

The goal of this CAREER project is to study efficient and reproducible causality analytics for large-scale climate data, so that climate scientists can easily test their causal hypotheses, reproduce existing studies, and compare results from different causality analytics. To handle the increasing dimensionality and resolution of spatiotemporal climate datasets, the project will study incremental causality discovery algorithms for large-scale climate datasets and parallel causality discovery for spatiotemporal climate data. To address the variety of both causal discovery algorithms and climate simulation/observation datasets, the project will study how to effectively measure climate causality results from different causality algorithms and different climate datasets, and how to integrate causality results through ensemble techniques. To cope with difficulties in conducting and reproducing causality analytics with large-scale climate datasets, the project will study cloud computing for constructing big data climate analytics pipelines and optimizing their execution. The project will be evaluated from two perspectives. From the computing perspective, the research will be evaluated in terms of the computational complexity, accuracy, and scalability of the algorithms. From the climate perspective, the applicability of the research will be evaluated by collaborating with climate scientists in their specific research programs.
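As a rough illustration of what integrating causality results through an ensemble can mean, the sketch below combines the causal graphs from several runs by simple majority voting over directed edges. It is an assumption-laden toy, not the project's algorithm: the function name `majority_vote_edges`, the `threshold` parameter, and the example variable names and edges are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def majority_vote_edges(graphs: Iterable[Set[Edge]], threshold: float = 0.5) -> Set[Edge]:
    """Keep directed edges reported by more than `threshold` of the input graphs.

    Hypothetical helper for illustration; simple voting is an assumption,
    not the ensemble method used in the project.
    """
    graphs = list(graphs)
    if not graphs:
        return set()
    # Count in how many graphs each directed edge appears.
    counts = Counter(edge for g in graphs for edge in g)
    return {edge for edge, n in counts.items() if n / len(graphs) > threshold}


# Made-up outputs of three causal discovery runs (not real results).
run_a = {("temperature", "humidity"), ("sea_ice", "temperature")}
run_b = {("temperature", "humidity"), ("humidity", "cloud_cover")}
run_c = {("temperature", "humidity"), ("sea_ice", "temperature"), ("cloud_cover", "sea_ice")}

consensus = majority_vote_edges([run_a, run_b, run_c])
print(sorted(consensus))
```

With these made-up inputs, only the edges found by at least two of the three runs survive the vote. Raising `threshold` makes the consensus graph more conservative.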


  1. Pei Guo (PhD’21). Doctoral Thesis: Scalable Multivariate Causality Discovery From Large-scale Global Spatiotemporal Climate Data. Finalist of the BenchCouncil 2021 Distinguished Doctoral Dissertation Award.

Publications and GitHub Repositories

  1. Xin Wang, Pei Guo, Xingyan Li, Jianwu Wang, Aryya Gangopadhyay, Carl E. Busart and Jade Freeman. Reproducible and Portable Big Data Analytics in the Cloud. arXiv preprint arXiv:2112.09762, 2021. [Open Access Paper, Open Source Code]
  2. Xin Wang, Pei Guo, Jianwu Wang. Large-Scale Causality Discovery Analytics as a Service. Accepted by the Fifth IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2021), IEEE. [Paper Pre-Print, Open Source Code]
  3. Sahara Ali, Yiyi Huang, Xin Huang, Jianwu Wang. Sea Ice Forecasting using Attention-based Ensemble LSTM. Tackling Climate Change with Machine Learning workshop at International Conference on Machine Learning (ICML), 2021. https://www.climatechange.ai/papers/icml2021/50, [Open Access Paper, Open Source Code]
  4. Pei Guo, Yiyi Huang, Jianwu Wang. Scalable and Flexible Two-Phase Ensemble Algorithms for Causality Discovery. Big Data Research, vol. 26, no. 100252, November 2021. DOI: 10.1016/j.bdr.2021.100252, [Open Access Paper, Open Source Code]
  5. Yiyi Huang, Matthäus Kleindessner, Alexey Munishkin, Debvrat Varshney, Pei Guo, Jianwu Wang. Benchmarking of Data-Driven Causality Discovery Approaches in the Interactions of Arctic Sea Ice and Atmosphere. Data-driven Climate Sciences Section, Frontiers in Big Data, Frontiers, August 2021. DOI: 10.3389/fdata.2021.642182, [Open Access Paper, Open Source Code]
  6. Pei Guo, Achuna Ofonedu, Jianwu Wang. Scalable and Hybrid Ensemble-Based Causality Discovery. In Proceedings of the 2020 IEEE International Conference on Smart Data Services (SMDS 2020), pages 72-80, IEEE, 2020. DOI: 10.1109/SMDS49396.2020.00016, [Paper Pre-Print, Open Source Code], Best Student Paper Award!
  7. Ping Hou, Peng Wu, Pei Guo, Jianwu Wang, Aryya Gangopadhyay, Zhibo Zhang. A Deep Learning Model for Detecting Dust in Earth’s Atmosphere from Satellite Remote Sensing Data. 2020 IEEE International Conference on Smart Computing (SMARTCOMP), pages 196-201, 2020. DOI: 10.1109/SMARTCOMP50058.2020.00045, [Paper Pre-Print, Open Source Code]


This work is funded by the NSF CAREER award “CAREER: Big Data Climate Causality Analytics”.