Research

Our research focuses on interdisciplinary big data analytics for different types of big data (high speed streaming data and large volume batch data) and different analytics application domains (including climate and manufacturing).

As illustrated in the venn diagram on the right, our research lies in the intersection of data science, distributed computing and real-world big data applications.

Please check details of our current research topics via the links below or sub-menu items.

  • Causality Discovery: Causality is a fundamental research topic studying cause-effect relationships among different components of a system and causality study can help explain why the system has certain behaviors. We mainly study novel causality discover methods/systems for the large-scale and complex climate observation and simulation datasets.
  • Earth Informatics: Earth data is an important type of big data with unique features: 1) the large volume of Earth simulation/observation data, 2) geospatial and temporal data, 3) internal physics/dynamics studied in Earth science. We study novel data mining approaches for various Earth applications such as cloud type classification and sea ice forecasting.
  • Distributed Analytics: To handle the exponential growth of available data, traditional ways of conducting data processing and mining are either too time consuming or infeasible. We study novel algorithms/systems on how to analyze large volume data efficiently: 1) how to parallelize existing data processing/mining algorithms, 2) how to maintain good data mining accuracy while improving execution performance.
  • Stream Analytics: Besides volume, another important aspect of big data is velocity, in which data are generated continuously and often in high-speed. We study novel algorithms/systems on: 1) how to handle challenges such as concept drifting that are unique for streaming data analytics, and 2) how to design software architecture/algorithms to work with both incoming streaming data and historical batch data.