
A key highlight of this unit is the study of probability distributions — including binomial and normal distributions — which model how data behaves in real-world scenarios. You will learn how to work with the three-sigma rule and use z-tables to interpret probabilities with confidence.
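As an illustration of the three-sigma rule and z-table lookups mentioned above, here is a minimal sketch using only the Python standard library. The function names and the exam-score figures (mean 70, standard deviation 10) are hypothetical and chosen only for the example; `NormalDist.cdf` plays the role of a printed z-table.

```python
from math import comb
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    # Three-sigma rule: roughly 68%, 95%, and 99.7% of values.
    for k in (1, 2, 3):
        print(f"within {k} sigma: {prob_within(k):.4f}")

    # Hypothetical exam: mean 70, standard deviation 10, observed score 85.
    z = z_score(85, mean=70, sd=10)
    print(f"z = {z:.2f}, P(score <= 85) = {std_normal.cdf(z):.4f}")

    # Binomial example: exactly 5 heads in 10 fair coin flips.
    print(f"P(5 heads in 10 flips) = {binomial_pmf(10, 5, 0.5):.4f}")
```

The cumulative probability returned by `cdf` is the same quantity a z-table lists for a given z-score, so the two can be cross-checked against each other.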
The unit then advances into multivariate data, where we explore relationships between variables. Concepts such as correlation coefficients, covariance, and comparing multiple correlations are introduced to help you understand how two or more variables interact. You will also study how to represent these relationships visually, making complex statistical ideas accessible at a glance.
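To make covariance and the correlation coefficient concrete, the following sketch computes both from first principles. It assumes the sample (n - 1) form of covariance and the Pearson coefficient; the study-hours and score values are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance: average co-deviation from the means, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance rescaled by both standard deviations."""
    var_x = sample_covariance(xs, xs)  # variance is covariance of a variable with itself
    var_y = sample_covariance(ys, ys)
    return sample_covariance(xs, ys) / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: hours studied and exam score for five students.
    hours = [1, 2, 3, 4, 5]
    scores = [52, 55, 61, 64, 70]
    print(f"covariance  = {sample_covariance(hours, scores):.2f}")
    print(f"correlation = {pearson_r(hours, scores):.4f}")
```

Covariance depends on the units of both variables, while the correlation coefficient is unitless and bounded between -1 and 1, which is why it is the usual choice when comparing several pairs of variables.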
Whether you are analyzing exam scores, financial returns, or sensor readings, understanding data shape is the critical first step. By the end of this unit, you will be able to identify whether your data follows a known distribution, detect outliers, and describe the statistical relationship between variables with clarity and precision. This unit bridges the gap between pure mathematics and practical data storytelling — setting a strong foundation for the visualization techniques you will apply throughout this course.
Data for visualization rarely arrives perfectly clean; it is most often sourced from relational databases, flat files, or web APIs. This unit covers how to source data from SQL, JSON, XML, and other common formats. Students learn to use relational databases as a foundation for visualization pipelines, with coverage of joins, queries, and schema design. A major focus is handling messy data: types of messiness, pairwise deletion, mean substitution, and hot deck imputation are discussed as practical remedies. More advanced topics include complete case analysis, stochastic regression imputation, and multiple imputation for filling gaps. The unit concludes with methods for checking data quality: detecting out-of-bounds values, verifying data types, and identifying unexpected categories, outliers, typographical errors, and unlikely or duplicate values that distort visual output.
Layout is one of the most powerful and most often overlooked tools in data visualization. This unit begins with an exploration of positioning systems: absolute versus relative placement, semantic distance and proximity, and how the physical space of a visualization shapes its meaning. Students learn about logical and physical relationships, how to represent objects and grouped elements, and how organizational patterns such as graphs, trees, and circular layouts communicate hierarchy and connection. Axis styles, circle and line arrangements, and the use of shapes and typography as visual encoding channels are covered in depth. Practical topics include applying common color theory, cognitive interference effects (such as the Stroop test and the role of color), and how size, size comparison, and direct labeling of data points guide a reader's attention. The unit ties layout decisions back to the broader goal of conveying size, meaning, and story with minimal friction.
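Two of the missing-data remedies named above for the data-sourcing unit, complete case analysis and mean substitution, can be sketched as follows. This is a simplified illustration: the helper names and the sensor readings are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def complete_cases(rows: List[Row]) -> List[Row]:
    """Complete case analysis: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mean_substitute(rows: List[Row], column: str) -> List[Row]:
    """Mean substitution: replace missing values in one column with the observed mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    # Build new rows so the input list is left unchanged.
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical sensor readings with one missing temperature.
    readings = [
        {"sensor": "a", "temp": 20.0},
        {"sensor": "b", "temp": None},
        {"sensor": "c", "temp": 24.0},
    ]
    print(complete_cases(readings))           # drops sensor "b"
    print(mean_substitute(readings, "temp"))  # fills sensor "b" with 22.0
```

Complete case analysis discards information, while mean substitution keeps every row but shrinks the variance of the filled column; that trade-off is one reason methods such as stochastic regression imputation and multiple imputation exist.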
Effective visualization is not just about choosing a chart type — it requires deliberate design thinking. This unit examines the classification of visualization by complexity and purpose, distinguishing between infographics and data visualization, and contrasting information versus explanation and exploration versus persuasion. Students study the role of the designer versus the reader and how context of use shapes design decisions. Foundational principles include knowledge before structure, choosing appropriate visual encodings, the importance of natural ordering, and distinguishing distinct from continuous values. The unit also addresses redundant encoding, defaults versus innovative formats, and readers' contextual expectations. Critically, students examine what makes visualizations fail — abused structures, misleading comparisons, clutter from unexpected categories, and typographical errors. Selecting structures based on patterns and consistency is emphasized, along with a focus on simplicity in design as the ultimate goal.
Visit https://dpv.justinsaju.me/ to explore the lab experiments.
data visualization with immersive and three-dimensional environments. Students are introduced to 3D mesh compression techniques — corner-table representation, geometry compression, connectivity compression, and edge compression — as well as retiling and direct manipulation methods. The data analysis pipeline is examined for virtual reality applications: from raw data to geometry compression to the rendering pipeline and system architecture considerations. Students explore how scientific visualization differs from other VR applications, including the role of direct manipulation in virtual environments, advantages and disadvantages of distributed implementation, and time-critical rendering techniques. Topics also include how geometric modelling forms the backbone of VR-based data representation — covering the fundamentals of scene graphs, coordinate systems, and interaction paradigms needed to build meaningful, navigable data
This course introduces students to the principles and practices of data visualization and pattern analysis. It equips learners with the knowledge to identify various data sources, handle messy data, and apply effective visual encoding techniques to communicate insights clearly and accurately.
The course begins with foundational concepts of data shape analysis, covering univariate and multivariate distributions, probability, correlation, and sampling techniques. Students then explore how to retrieve and manage data from relational databases and formats such as SQL, JSON, and XML, including strategies for handling missing or inconsistent data.
A significant portion of the course focuses on the considerations behind effective data visualization — including contextual relevance, appropriate encoding, redundancy avoidance, and compatibility with real-world scenarios. Students study data layout principles such as positioning, sizing, use of color, typography, and the organization of grouped objects to achieve clarity and simplicity in design.
The final unit introduces geometric modelling and virtual environments for visualization, covering 3D mesh compression, data analysis pipelines, and the application of direct manipulation techniques in virtual reality settings.
By the end of this course, students will be able to analyze univariate and multivariate data, implement methods to handle messy datasets, select appropriate visualization techniques, and develop customized layouts using geometric modelling and virtual environments.