Research

Our research focuses on interdisciplinary big data analytics for different types of big data (high speed streaming data and large volume batch data) and different analytics application domains (including climate and manufacturing).

As illustrated in the Venn diagram, our research lies in the intersection of data science, distributed computing and real-world big data applications. Distributed computing is the key enabling computing infrastructure perspective research topic on how to efficiently analyze big data. Data science is the core research topic to learn patterns from big data. Real-world big data application is the research frontier that will identify new research challenges on big data and evaluate the benefits of the proposed work. This strategy conducts research from a holistic and end-to-end perspective, including identifying new and important research challenges, proposing integrative solutions, and evaluating the solutions from many aspects (efficiency, effectiveness, etc.).

Please check details of our current research topics via the links below or sub-menu items.

  • Causality Discovery: Causality is a fundamental research topic studying cause-effect relationships among different components of a system and causality study can help explain why the system has certain behaviors. We mainly study novel causality discover methods/systems for the large-scale and complex climate observation and simulation datasets.
  • Earth Informatics: Earth data is an important type of big data with unique features: 1) the large volume of Earth simulation/observation data, 2) geospatial and temporal data, 3) internal physics/dynamics studied in Earth science. We study novel data mining approaches for various Earth applications such as cloud type classification and sea ice forecasting.
  • Distributed Analytics: To handle the exponential growth of available data, traditional ways of conducting data processing and mining are either too time consuming or infeasible. We study novel algorithms/systems on how to analyze large volume data efficiently: 1) how to parallelize existing data processing/mining algorithms, 2) how to maintain good data mining accuracy while improving execution performance.
  • Stream Analytics: Besides volume, another important aspect of big data is velocity, in which data are generated continuously and often in high-speed. We study novel algorithms/systems on: 1) how to handle challenges such as concept drifting that are unique for streaming data analytics, and 2) how to design software architecture/algorithms to work with both incoming streaming data and historical batch data.