**Ayan Biswas, Soumya Dutta, Han-Wei Shen (OSU), Jonathan Woodring (LANL)**

#### Objective

- Analyze large scale multivariate scientific data sets
- Understand the relationship between the variables and their importance
- Understand the correlation and variability between user-selected variables
- Create an intuitive graphical user interface that allows the scientists to perform step-by-step analysis and data filtering

#### Impact

- Simplify the task of variable selection and browsing via an information-theoretic analysis framework
- Allow run time data reduction by selecting only the most important variables to explore
- Facilitate intuitive and systematic visual analysis for exploring large scale multivariate scientific data

#### Results

- Software
  - ITL: Information Theoretic Data Analysis Library (in progress)

- Publication
  - Ayan Biswas, Soumya Dutta, Han-Wei Shen, Jonathan Woodring, "An Information-Aware Framework for Exploring Multivariate Data Sets," IEEE Scientific Visualization 2013; also in IEEE Transactions on Visualization and Computer Graphics, 19(12): 2683-2692 (2013)

#### Notes:

**Objectives:**

The objective of this research is to provide an exploratory visual analysis environment in which scientists can analyze their large-scale multivariate data sets. Multivariate analysis is among the most challenging analysis problems because the number of variables involved is large and the correlation between variables in different value ranges is often unknown. Scientists frequently perform the analysis in an ad hoc manner, with little guidance as to which variables should be analyzed together or how much variability exists between different pairs of variables.

In this research, we developed a novel information-theoretic framework to quantify the variability among variables in different value ranges. We also propose metrics that measure the information distance between variables; these distances can be viewed in a graphical interface. The interface allows scientists to select both the important variables and the value ranges of interest, and the selection results are displayed in multiple linked visualization windows.
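To make the idea of an information distance between two variables concrete, the following is a minimal sketch using two standard quantities: mutual information (how much one variable tells us about the other) and variation of information (a distance that is zero when the two variables determine each other). This is an illustration under stated assumptions, not the ITL API and not necessarily the metrics used in the published framework. The function names, the equal-width binning, and the default of 8 bins are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value falls into the last bin


def discretize(values, bins):
    """Replace each sample of a variable by its histogram bin index."""
    lo, hi = min(values), max(values)
    return [bin_index(v, lo, hi, bins) for v in values]


def entropy(counts, total):
    """Shannon entropy in bits of a histogram stored as a Counter."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_measures(x, y, bins=8):
    """Return (mutual information, variation of information) in bits.

    x and y are equal-length sequences holding the two variables sampled
    at the same data points. Illustrative helper, not part of any library.
    """
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    h_x = entropy(Counter(bx), n)             # H(X)
    h_y = entropy(Counter(by), n)             # H(Y)
    h_xy = entropy(Counter(zip(bx, by)), n)   # joint entropy H(X, Y)
    mi = h_x + h_y - h_xy                     # I(X; Y)
    vi = h_xy - mi                            # H(X|Y) + H(Y|X)
    return mi, vi


if __name__ == "__main__":
    x = list(range(64))
    y = [2 * v for v in x]  # y is fully determined by x
    mi, vi = information_measures(x, y)
    print(f"mutual information = {mi:.3f} bits, distance = {vi:.3f} bits")
```

A small distance suggests that two variables carry largely redundant information, so only one of them may need to be explored; a large distance suggests they are worth examining together. The result depends on the number of bins, which would need tuning for real data.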

**Impact:**

Our method allows scientists to focus their effort on the most important variables in the value ranges of greatest interest, which effectively reduces the amount of data that must be analyzed. The graphical user interface facilitates intuitive and systematic visual analysis, which is key to understanding large-scale, complex data sets.

**Results:**

SDAV researchers at OSU are currently developing a distributed data analysis library called ITL, and our research results are being integrated into it. We have also published our findings at IEEE Scientific Visualization 2013, a top conference for scientific visualization.