2017-08-15 00:43:48

Learning Path: R: Master Data Analysis & Visualization with R

14 students enrolled

Harness the power of R for effective data analysis and visualizations

What Will I Learn?

- Import and export data in various formats in R
- Perform advanced statistical data analysis
- Visualize your data on Google Maps or OpenStreetMap
- Create simple and quick visualizations using the basic graphic tools in R
- Implement interactive visualizations using ggplot2
- Add elements, text, animation, and colors to your plot to make sense of data
- Master network, radial, and coxcomb plots

Requirements

- Basic programming knowledge of R
- Basic knowledge of mathematics and statistics

Description

R is one of the most comprehensive statistical tools for managing and manipulating data. With the ever-increasing amount of data, there is a very high demand for professionals with the skills to analyze it. *If you want to become an expert data analyst, this Learning Path is for you.*

Packt’s Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.

The highlights of this Learning Path are:

- *Manipulate and analyze small and large datasets with R*
- *Practice with real-world examples of data analysis and visualization*

Let’s take a quick look at your learning journey! This Learning Path begins by familiarizing you with the programming and statistics aspects of R. You will learn how CRAN works and why you should use it. Acquire the ability to conduct data analysis in practical contexts with R, using core language packages and tools. You will then generate various plots using the basic R plotting techniques. Learn how to **make plots, charts, and maps in a step-by-step manner**. Use R packages to **add context and meaning to your data**.

Moving ahead, the Learning Path will gradually take you through creating interactive maps using the **googleVis** package. Finally, you will generate choropleth maps, contour maps, bubble plots, and pie charts.

*By the end of this Learning Path, you will be equipped with a broad range of data analysis and visualization techniques and will have built a strong foundation for moving into data science.*

**About the Authors:**

We have combined the best works of the following esteemed authors to ensure that your learning journey is smooth:

**Dr. Samik Sen** is a theoretical physicist who loves thinking about hard problems. After his Ph.D., in which he developed computational methods to solve problems for which no solutions existed, he began thinking about how to tackle math problems while lecturing. He runs a YouTube channel about data science, which also lets him engage with people around the world who look at problems from different perspectives.

**Fabio Veronesi** obtained a Ph.D. in digital soil mapping from Cranfield University and then moved to ETH Zurich, where he has been working for the past three years as a postdoc. In his career, Dr. Veronesi has worked on several topics related to environmental research: digital soil mapping, cartography and shaded relief, renewable energy, and transmission line siting. During this time, he specialized in applying spatial statistical techniques to environmental data.

**Atmajit Singh Gohil** works as a senior consultant at a consultancy firm in New York City. After graduating, he worked in the financial industry as a Fixed Income Analyst. He writes about data manipulation, data exploration, visualization, and basic R plotting functions on his blog. He has a master's degree in financial economics from the State University of New York (SUNY), Buffalo. He also graduated with a Master of Arts degree in economics from the University of Pune, India.

Who is the target audience?

- This Learning Path is aimed at aspiring or professional statisticians, data analysts, or data scientists who want to analyze and visualize data to gain deeper insights into it.

Curriculum For This Course

131 Lectures

11:20:20

Speaking ‘R’ - The Language of Data Science
19 Lectures
02:18:40

This video gives an overview of the entire course.

Preview
03:57

This video introduces the section and gives an overview of the R language.

What Is R?

02:12

We need the core programs before we can begin, and in this video we show where to get them.

Getting and Setting Up R/RStudio

02:01

In this video, we look at where to begin, so that we can get started.

Using RStudio

03:52

In this video, you will learn how packages extend R to avoid common problems, and how we will work with them.

Packages

08:01

In this video, you will learn how similar R is to other languages.

Preview
07:57

In this video, we see more familiar things in R.

Familiar Building Programming Blocks

07:49

In this video, we are now ready to write programs.

Putting It All Together

11:27

In this video, we will look at the data types that are new in R.

Core R Types

10:18

In this video, we will introduce some key commands to study data.

Some Useful Operations

05:12

In this video, we will introduce various commands to help us pick out the elements we are interested in.

More Useful Operations

03:18

In this video, we will investigate the Titanic dataset to see what it says.

Titanic

11:11

In this video, we will add value by processing our data.

Tennis

12:42

In this video, we will download football results from a web page.

It's Mostly Cleaning Up

12:21

In this video, we will use R to do some statistics.

The Most Widely Used Statistical Package

10:15

In this video, we will work with distributions using R.

Distributions

09:27

In this video, we will see some of R's graphical power.

Time to Get Graphical

06:41

In this video, we will use the plotting package, ggplot2.

Plotting to Another Dimension

03:37

In this video, we will see another plotting technique known as Facets.

Facets

06:22

Test Your Knowledge

5 questions


Learning Data Analysis with R
75 Lectures
05:59:07

This video provides an overview of the entire course.

Preview
04:16

Accessing and importing open access environmental data is a crucial skill for data scientists. This section teaches you how to download data from the Web, import it into R, and check it for consistency.

Importing Data from Tables (read.table)

02:30

Datasets are often provided for free on FTP sites, and practitioners need to be able to access them. R is perfectly capable of downloading and importing data from FTP sites.

Downloading Open Data from FTP Sites

04:03

Not all text files can be opened easily with read.table. The fixed-width format is still popular but requires a bit more work in R.

Fixed-Width Format

04:24
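In R, fixed-width files are typically read with the base function read.fwf(), which takes the column widths. As a language-neutral illustration (a minimal Python sketch, not the course's own code), reading such a file comes down to slicing every line at known column widths and stripping the padding. The function name, column layout, and sample records below are hypothetical.

```python
def parse_fixed_width(lines, widths):
    """Split each line into fields at fixed column widths, stripping padding."""
    records = []
    for line in lines:
        fields, start = [], 0
        for width in widths:
            # Slice the next `width` characters and drop surrounding spaces.
            fields.append(line[start:start + width].strip())
            start += width
        records.append(fields)
    return records


# Hypothetical layout: 6-char station ID, 8-char date, 6-char temperature.
sample = [
    "ST001 20170101  12.5",
    "ST002 20170101   9.8",
]
rows = parse_fixed_width(sample, [6, 8, 6])
print(rows)
```

The widths have to be known in advance (usually from the data provider's documentation), which is why this format needs more work than delimited files.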

Some data files are simply too difficult to import with simple functions. Luckily, R provides the readLines function, which can import even the most difficult tables.

Importing with readLines (The Last Resort)

03:20

Most open data is generated automatically and therefore may contain NA or other values that need to be removed. R has various functions to deal with this problem.

Cleaning Your Data

02:36

To follow the exercises, viewers need to install several important packages. This video explains how to install them and where to find information about them.

Loading the Required Packages

04:09

Vector data are very popular and widespread and require some thought before importing. R has dedicated tools to import these data and work with them.

Importing Vector Data (ESRI shp and GeoJSON)

04:02

Spatial data is often provided in tables and needs to be transformed before it can be used for analysis. This can be done simply with the sp package.

Transforming from data.frame to SpatialPointsDataFrame

02:50

Geographical projections are very important and need to be handled carefully. R provides robust functions to do so successfully.

Understanding Projections

03:06

Many datasets have a temporal component and practitioners need to know how to deal with it. R provides functions to do that in a very easy way.

Basic Time/Date Formats

03:50

Raster data is fundamentally different from vector data, since its values refer to specific areas (cells) and not to single locations. This video explains this difference and teaches users how to import this data into R.

Introducing the Raster Format

04:58

The NetCDF format is becoming very popular, since it can store 4D datasets. Accessing it requires some technical skills, and this video teaches viewers how to open and import NetCDF files.

Reading Raster Data in NetCDF

06:10

Many raster datasets we download from the Web are distributed in tiles, meaning a single raster for each subset of the area. To obtain a full raster for the study area we are interested in, we can create a mosaic.

Mosaicking

02:52

Mosaicking involves merging rasters based on location. Spatio-temporal datasets also include multiple rasters for the same location but different times. To merge these, we need to use the stacking function.

Stacking to Include the Temporal Component

04:10

Once we complete our analysis we often need to export our results and share them with colleagues. Popular formats are CSV and TXT files, which we learn how to export in this video.

Exporting Data in Tables

03:12

If we work with vector data and we want to share the same format with our co-workers, we need to learn how to export in vector formats. This will be covered here.

Exporting Vector Data (ESRI shp File)

02:21

Raster results often need to be shared with colleagues in standard formats. This video shows how to export rasters in formats such as GeoTIFF and ASCII grids.

Exporting Rasters in Various Formats (GeoTIFF, ASCII Grids)

02:42

Nowadays WebGIS applications are extremely popular. However, to use our data for WebGIS, we first need to export them in the correct format. This video will show how to do that.

Exporting Data for WebGIS Systems (GeoJSON, KML)

02:40

In the previous volume, we explored the basic R functions and syntax for importing various types of data. In this video, we will put these functions together, and overcome some unexpected challenges, to import a full year of NOAA data.

Preparing the Dataset

07:44

Before we can start analyzing our data, we first need to properly understand what we are dealing with. The first step in this direction is to describe our data with simple statistical indices.

Measuring Spread (Standard Deviation and Standard Distance)

03:23
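The standard distance mentioned above is the spatial analogue of the standard deviation: it measures how spread out points are around their mean centre. A minimal Python sketch of one common definition follows (population form, dividing by n; note that R's sd() uses the sample n − 1 denominator, and some GIS tools use other variants). The function names and sample points are illustrative only.

```python
import math


def standard_deviation(values):
    """Population standard deviation (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def standard_distance(xs, ys):
    """Root mean squared distance of points from their mean centre."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    squared = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / n)


# Four corners of a 2 x 2 square: every point lies sqrt(2) from the centre (1, 1).
print(standard_distance([0, 2, 0, 2], [0, 0, 2, 2]))
```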

Numerical summaries are very useful but certainly not ideal for giving us a direct feel for the dataset at hand. Plots are much more informative, so being able to produce them is a crucial skill for data analysts.

Understanding Your Data with Plots

05:50

For multivariate data we are often interested in assessing correlation between variables. This can be done in R very easily, and ggplot2 can also be used to produce more informative plots.

Plotting for Multivariate Data

03:02

Detecting outliers is another basic skill that every data analyst should have and master. R provides a lot of technical tools to help us in finding outliers.

Finding Outliers

03:50

This section is dedicated entirely to manipulating vector data. However, viewers first need to familiarize themselves with some basic concepts; otherwise, they may not be able to follow the rest of the section.

Introduction

03:37

In volume 1 we learned how to set the projection of our spatial data. However, in many cases we have to change this projection to successfully complete our analysis, and this requires some specific knowledge.

Re-Projecting Your Data

02:54

In many cases we may be interested in understanding the relations between spatial objects. One such relation is the intersection: we first want to know how two objects intersect, and then extract only the part of one object that falls inside or outside the other.

Intersection

03:07

Other important GIS operations that users have to master involve creating buffers and calculating distances between objects.

Buffer and Distance

03:22

The last two GIS operations that everyone should master are union, which merges different geometries and spatial objects, and overlay.

Union and Overlay

03:32

Raster objects are imported into R as rectangular matrices. Users need to be aware of this to work properly with these data; otherwise, it may create issues during the analysis.

Introduction

04:43

In many cases open data are not distributed directly in raster formats and they need to be converted. This can be easily done with the right functions.

Converting Vector/Table Data into Raster

04:00

Working with raster data often means extracting data for particular locations for further analysis, or cropping the data to reduce its size. These are essential skills for any data analyst.

Subsetting and Selection

03:16

Sometimes we may need to filter out some values of our raster. This may seem tricky, but only because it requires some skill.

Filtering

04:58

Creating new rasters by calculating their values is extremely important for spatial data analysis. Doing so is simple but can be difficult to understand at first.

Raster Calculator

04:44

Syntactically, plotting spatial data in R is no different from plotting other types of data. Therefore, users need to know the basics of plotting before they can start making maps.

Plotting Basics

05:15

Creating multilayer plots can be difficult because we need to take care of several different aspects at once. However, learning how to do it is very easy.

Adding Layers

05:44

When plotting spatial data, we are often interested in using colors to show the values of some variables. This can be done manually, but producing the right color scale may be difficult. This issue can be solved by employing automatic methods.

Color Scale

04:51

Creating multivariate plots not only means adding layers, but also using legends so that the viewer understands what the plot is showing. Creating legends in R is tricky because it requires a lot of tweaking, which will be explained here.

Creating Multivariate Plots

09:09

Temporal data need to be treated with specific procedures to highlight this additional component. This may be done in different ways depending on the scope of the analysis and R provides the right platform for this.

Handling the Temporal Component

03:20

Being able to plot spatial data on web maps is certainly helpful and a crucial skill to have, but it can be difficult since it requires knowledge of different technologies. R makes this process very easy with dedicated functions that make plotting on web GIS services a breeze.

Introduction

02:32

Plotting data with the function plotGoogleMaps is not as easy as using the function plot. With a simple step-by-step guide, we can achieve good command of the function, so that users can plot whatever data they choose.

Plotting Vector Data on Google Maps

05:45

An interactive map with just one layer is hardly useful for our purposes. We are often faced with the challenge of plotting several datasets at once. This requires some additional work and understanding, but it is definitely not hard in R.

Adding Layers

04:41

Plotting raster data on Google maps can be tricky. The function plotGoogleMaps does not handle rasters very well and if not done correctly the visualization will fail. This video will show users how to plot rasters successfully.

Plotting Raster Data on Google Maps

04:19

Plotting on Google Maps is easy, but Google Maps is a commercial product, so if we want to use it on our commercial website we need to pay. OpenStreetMap is free to use, so knowing how to use it is certainly an advantage.

Using Leaflet to Plot on Open Street Maps

09:03

Using open data for our analysis requires a deep knowledge of the data provider and the actual data we are using. Without this knowledge we may end up with erroneous results.

Introduction

02:21

Downloading data from the World Bank can be difficult, since it requires users to know the acronyms used to refer to these data. However, with some help this process becomes very easy.

Importing Data from the World Bank

05:08

To create a spatial map of the World Bank data we just downloaded, we need to transform it into spatial data. However, the dataset contains no coordinates or other information that could help us do that. The solution is to use the geocoding information from another dataset for this purpose.

Adding Geocoding Information

05:38

Using the World Bank data just to plot a static spatial map is very limiting. Researchers can use these data in many other ways, and this video provides some guidance on these additional avenues of research.

Concluding Remarks

03:48

Executing a point pattern analysis is technically easy in R. However, it is extremely important that practitioners understand the theory behind a point pattern analysis to ensure the correctness of the results. This video illustrates this theory.

Theoretical Background

07:31

In many cases practitioners start their analysis by applying complex statistics without even looking at their data. This is a problem that may affect the correctness of their results. This video will teach the correct order to start a point pattern analysis.

Introduction

07:37

Calculating the intensity and density of a point pattern can be done in many ways, and finding the best one for the dataset at hand can be challenging. The spatstat package and the literature provide some tips to do it correctly.

Intensity and Density

07:38

By looking at the plot we created in the previous videos, we started understanding the spatial distribution of our data. However, we now need to prove quantitatively that our ideas are correct.

Spatial Distribution

10:02

In many cases we may want to model a point pattern to try and explain its location intensity in a way that would allow us to predict it outside our study area. This requires a general understanding of the modelling process, which will be explained here.

Modelling

06:41

Cluster analysis is commonly used in many fields. The problem is that in order to use it correctly we need to understand the clustering process, which is what this video is about.

Theoretical Background

04:30

As in every data analysis, data preparation plays a crucial role in guaranteeing success. This video prepares the data to be used for clustering.

Data Preparation

05:50

Clustering algorithms are extremely simple to apply. The challenge is to interpret their results and understand what the algorithm is telling us about our data.

K-Means Clustering

05:26

When applying the k-means algorithm, we need to specify the number of clusters into which we want our dataset to be divided. However, since k-means is often used as an exploratory technique, we may not know the optimal number of clusters.

Optimal Number of Clusters

05:17
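One common way to choose the number of clusters is the "elbow" heuristic: run k-means for several values of k and look for the point where the within-cluster sum of squares (WCSS) stops dropping sharply. The minimal Python sketch below illustrates the idea with plain Lloyd iterations on 2D points; it is an illustration, not the course's method. R's kmeans() defaults to the Hartigan-Wong algorithm, and initializing with the first k points is a simplification chosen here to keep the example deterministic. The function names and sample points are hypothetical.

```python
def kmeans(points, k, iters=100):
    """Lloyd's k-means; initial centres are the first k points (a simplification)."""
    centres = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p[0] - centres[c][0]) ** 2
                                                  + (p[1] - centres[c][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its points (keep it if empty).
        new = [[sum(q[0] for q in cl) / len(cl), sum(q[1] for q in cl) / len(cl)]
               if cl else centres[i] for i, cl in enumerate(clusters)]
        if new == centres:
            break
        centres = new
    return centres, clusters


def wcss(centres, clusters):
    """Within-cluster sum of squared distances to each cluster's centre."""
    return sum((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
               for c, cl in zip(centres, clusters) for p in cl)


# Two well-separated groups of three points each (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in range(1, 5):
    centres, clusters = kmeans(points, k)
    print(k, round(wcss(centres, clusters), 3))
```

With two clear groups, WCSS drops steeply from k = 1 to k = 2 and only slightly afterwards, which is the "elbow" that suggests k = 2.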

Hierarchical clustering allows us to see how all of our points are related to each other with a bottom-up approach. However, determining the optimal number of clusters is not so trivial with this method.

Hierarchical Clustering

06:33

Determining the best clustering algorithm for our data is probably the most challenging part of such an analysis. This video will show the sort of reasoning users will need to make that decision.

Concluding

04:32

Time series analysis is another important technique to master. However, it requires some specific knowledge to understand the process and what this technique can actually do.

Theoretical Background

04:34

Time-series can be imported and analyzed using two formats: ts and xts. Both have their pros and cons and users need to be able to master both if they want to perform the best time-series analysis.

Reading Time-Series in R

06:37

Dealing with time-series sometimes means extracting data according to their position along the timeline. This can be done in R but requires some explanation to do it correctly.

Subsetting and Temporal Functions

05:15

Another important aspect of time-series analysis is decomposition and correlation. This allows us to draw important conclusions about our data. Technically this is not difficult to do, but it requires careful consideration if we want to do it right.

Decomposition and Correlation

07:33

The final step of time-series analysis is forecasting, where we try to simulate future events. This is extremely useful but requires adequate knowledge of the available methods and their pros and cons.

Forecasting

04:32

There are numerous geostatistical interpolation techniques that can be used to map environmental data. Kriging is probably the most famous, but it is not the only one available. It is important to know each technique to understand where to use what.

Theoretical Background

04:42

The first challenge of any geostatistical analysis is data preparation. We cannot just download data; we need to clean them and prepare them for analysis.

Data Preparation

06:20

Simple interpolation is easy to use and easy to interpret, so it is still commonly used. The gstat package allows us to use inverse distance weighting, but to do so we need to follow some simple but precise rules.

Mapping with Deterministic Estimators

06:56
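Inverse distance weighting (IDW), one of the deterministic estimators mentioned above, predicts a value at an unsampled location as a weighted average of the observations, with weights 1 / d^p that shrink as distance d grows. Below is a minimal Python sketch of the formula, assuming a power p = 2 (a common default) and no search radius or neighbour limit; it illustrates the arithmetic rather than gstat's implementation. The function name and sample points are hypothetical.

```python
import math


def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (x, y, z) samples."""
    num = den = 0.0
    for x, y, z in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return z  # target coincides with a sample: return the observed value
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(samples, (1.0, 0.0)))  # equidistant from both samples
print(idw(samples, (0.5, 0.0)))  # closer to the first sample, so pulled toward 10
```

Raising the power makes the estimate more local (nearby samples dominate); lowering it smooths the surface.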

Before we can interpolate our data using kriging, we need to take care of some important steps. For example, we need to check whether our data has a trend and then test for normality, because kriging works best with normally distributed data.

Analyzing Trend and Checking Normality

04:57

The variogram is the keystone of kriging interpolation, and users need to know how to compute it and fit a model to it. This requires careful consideration, which we are going to explore here.

Variogram Analysis

05:52
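The empirical (sample) semivariogram that a variogram model is fitted to can be computed with the classical Matheron estimator: for each distance bin h, gamma(h) = 1 / (2 N(h)) × Σ (z_i − z_j)², summed over the N(h) point pairs whose separation falls in that bin. In R this is what gstat's variogram() provides; the Python sketch below only illustrates the arithmetic, assuming isotropy (direction is ignored) and equal-width distance bins. The function name and sample data are hypothetical.

```python
import math
from itertools import combinations


def empirical_variogram(samples, bin_width, n_bins):
    """Classical semivariance per distance bin from (x, y, z) samples.

    Returns (bin_centre, gamma, n_pairs) tuples; gamma is None for empty bins.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        h = math.hypot(x2 - x1, y2 - y1)
        b = int(h // bin_width)
        if b < n_bins:  # ignore pairs beyond the largest lag
            sums[b] += (z1 - z2) ** 2
            counts[b] += 1
    return [((b + 0.5) * bin_width,
             sums[b] / (2 * counts[b]) if counts[b] else None,
             counts[b])
            for b in range(n_bins)]


# Three points on a line (hypothetical values).
data = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
for row in empirical_variogram(data, bin_width=1.0, n_bins=3):
    print(row)
```

A variogram model (for example spherical or exponential) is then fitted to these binned values, and that fitted model supplies the weights used in kriging.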

In this video, all concepts learned previously will be merged to perform a kriging interpolation. The problem in this case is making sure that everything works correctly and the process is smooth.

Mapping with kriging

06:17

There are numerous statistical learning algorithms that can be used to map environmental data. It is important to know every technique to understand where to use what.

Theoretical Background

04:08

Once again, getting to know our data is the most important thing to do when we start an analysis. This can be done by looking at the data provider and using some exploratory techniques.

Dataset

02:36

Many users start a data analysis by testing complex methods. This is a problem though, because many times a simpler method can help us better understand our data. This video shows how to fit these simple models.

Linear Regression

06:06

Regression trees are extremely powerful algorithms, but they are sometimes considered black boxes. This is a problem because only expert users can understand their output. This may change simply by understanding how these algorithms work.

Regression Trees

04:13

Support Vector Machines

05:05

Test Your Knowledge

5 questions


R Data Visualization - Basic Plots, Maps, and Pie Charts
37 Lectures
03:02:33

This video gives an overview of the entire course.

Preview
03:24

R comes loaded with some basic packages, but the R community is rapidly growing and active R users are constantly developing new packages for R.

Installing Packages and Getting Help in R

05:34

Everything in R is in the form of objects. Objects can be manipulated in R.

Data Types and Special Values in R

04:46

We will dive into R's capability with regard to matrices and edit (add, delete, or replace) elements of a matrix.

Matrices and Editing a Matrix in R

04:28

One of the useful and widely used functions in R is the data.frame() function.

Data frames and Editing a data frame in R

03:56

Once we have processed our data, we need to save it to an external device or send it to our colleagues. It is possible to export data in R in many different formats.

Importing and Exporting Data in R

04:35

Most of the tasks in R are performed using functions. A function in R has the same utility as a function in mathematics.

Writing a Function and if else Statement in R

03:12

If we want to perform an action repeatedly in R, we can utilize the loop functionality.

Basic and Nested Loops in R

02:16

R has some very handy functions, such as apply, sapply, tapply, and mapply, that can save us from writing complicated statements.

The apply, lapply, sapply, and tapply Functions

03:32

One quick and easy way to edit a plot is to generate it in R and then use Inkscape or any other software to edit it.

Using and Saving Par to Beautify a Plot in R

03:51

Scatter plots are used primarily to conduct a quick analysis of the relationships among different variables in our data.

Introducing a Scatter Plot with Texts, Labels, and Lines

13:12

We will display multivariate data on a scatter plot and also introduce interactive scatter plots.

Connecting Points and Generating an Interactive Scatter Plot

08:21

The advantage of using the Google Chart API in R is the flexibility it provides in making interactive plots.

A Simple and Interactive Bar Plot

10:49

Line plots are simply lines connecting all the (x, y) points. They are very easy to interpret and are widely used to display an upward or downward trend in data.

Introduction to Line Plot and Its Effective Story

08:09

Gantt charts are used to track the progress of a project displayed against time.

Generating an Interactive Gantt/Timeline Chart in R

02:57

Plot a histogram using the googleVis package and merge more than one histogram on the same page.

Merging Histograms

04:11

The advantages of the Google Chart API are the interactivity of its charts and the ease with which they can be embedded in a web page.

Making an Interactive Bubble Plot

04:30

The waterfall plots or staircase plots are observed mostly in financial reports.

Constructing a Waterfall Plot in R

03:01
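A waterfall plot draws each change as a floating bar that starts where the previous running total ended, so the chart "steps" from the starting value to the final total. The minimal Python sketch below computes only the bar geometry (bottom, top, and whether the change is an increase) that any plotting library would then draw; it is an illustration, not the course's R code, and the function name and figures are hypothetical.

```python
def waterfall_segments(changes):
    """Return (label, bottom, top, is_increase) for each floating bar, plus the final total."""
    segments, total = [], 0
    for label, delta in changes:
        start, end = total, total + delta
        # A bar always spans from the lower to the higher of its two ends.
        segments.append((label, min(start, end), max(start, end), delta >= 0))
        total = end
    return segments, total


# Hypothetical figures for illustration only.
segments, final_total = waterfall_segments([("Revenue", 100), ("Costs", -40), ("Tax", -15)])
for seg in segments:
    print(seg)
print("Final total:", final_total)
```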

This video introduces the concept of dendrograms.

Constructing a Simple Dendrogram

06:47

This video teaches you to create a plot which is easy to study and more informative.

Creating Dendrograms with Colors and Labels

05:13

Heat maps are a visual representation of data, wherein each value in a matrix is represented with a color. This video shows you how to create a heat map.

Creating Heat Maps

04:35

This video dives into plotting a heat map by customizing colors.

Generating a Heat Map with Customized Colors

02:20

This video teaches you to integrate a dendrogram and heat map into a single plot.

Generating an Integrated Dendrogram and a Heat Map

02:34

R allows us to plot three-dimensional interactive heat maps using the heat map package.

Creating a Three-Dimensional Heat Map and Stereo Map

02:35

Tree maps are basically rectangles placed adjacent to each other. The size of each rectangle is directly proportional to the data being used in the visualization.

Constructing a Tree Map in R

05:22

We encounter maps on a daily basis, be it for directions or to infer information regarding the distribution of data. Maps have been widely used to plot various types of data in R.

Introducing Regional Maps

04:28

Choropleth maps can be state level as well as county level. In this video, we will plot well-being data on a state level.

Introducing Choropleth Maps

04:31

Contour maps are used to display data related to temperature or topographic information.

A Guide to Contour Maps

05:16

For each region, a bubble or a pie chart is used to represent a percentage.

Constructing Maps with bubbles

05:07

Overlaying text on maps is not a very common way of displaying information.

Integrating Text with Maps

06:12

The shapefile package in R can be used to read a shapefile, add the processed data to our shapefile, and then save it in the shapefile format.

Introducing Shapefiles

04:44

The idea of a cartogram is to show the gravity of the issue or data being studied.

Creating Cartograms

05:48

Pie charts are a great visualization technique to represent data and help viewers understand statistical data.

Generating a Simple Pie Chart

04:54

Labels are important as they give the information about the sections of the pie chart. We will include labels inside the pie chart in this video.

Constructing Pie Charts with Labels

05:31

Donut charts have advantages over pie charts with respect to the area and efficiency in visualizing information.

Creating Donut Plots and Interactive Plots

05:31

Instead of using multiple pie charts for comparing data, we can use slope charts.

Generating a Slope Chart

03:28

Fan plots are an alternative to pie charts and are useful in differential and comparative analysis.

Constructing a Fan Plot

02:53

Test Your Knowledge

5 questions

About the Instructor