Programming Statistical Applications in R

1,103 students enrolled

An introductory course that teaches the foundations of scientific and statistical programming using R software.

What Will I Learn?

- Understand how to create and manipulate R data structures used in scientific programming applications.
- Understand and use important statistical R programming concepts such as looping and control structures, interactive data input and formatting output, writing functions as programs, writing output to a file and plotting output.
- Understand and be able to use the R apply family of functions efficiently.
- Know how to debug programs and how to make programs run more efficiently.
- Understand and be able to implement various resampling methods effectively, including bootstrapping, jackknifing and N-fold cross validation.

Requirements

- Students will need to install the popular no-cost R Console and RStudio software (instructions provided).

Description

**Programming Statistical Applications in R** is an introductory course teaching the basics of programming mathematical and statistical applications using the R language.

*Programming Statistical Applications in R* is a "hands-on" course that comprehensively teaches fundamental R programming skills, concepts and techniques useful for developing statistical applications with R software. The course also uses dozens of "real-world" scientific function examples. A student does not need to be familiar with R, or knowledgeable about programming in general, to successfully complete this course. The course is self-contained and includes all materials, slides, exercises and solutions; in fact, everything that is seen in the course video lessons is included in zipped, downloadable materials files. The course is a great instructional resource for anyone interested in refining their skills and knowledge of statistical programming using the R language. It would be useful for practicing quantitative analysis professionals, and for undergraduate and graduate students seeking new job-related skills and/or skills applicable to the analysis of research data.

The course begins with basic instruction on installing and using the R console and the RStudio application, and provides the necessary instruction for creating and executing R scripts and R functions. Basic R data structures are explained, followed by instruction on data input and output and on basic R programming techniques and control structures. Detailed examples of creating new statistical R functions, and of using existing statistical R functions, are presented. Bootstrap and jackknife resampling methods are explained in detail, as are methods and techniques for making statistical inferences, for constructing confidence intervals, and for performing N-fold cross validation assessments of competing statistical models. Finally, detailed instructions and examples for debugging R programs and for making them run more efficiently are demonstrated.

Who is the target audience?

- You do NOT need to be experienced with R, nor do you need to have experience with computer programming to successfully complete this course.
- The course would be useful to anyone interested in learning more about statistical programming using the R language.
- The course is good for undergraduate students seeking to acquire programming skills and knowledge of R software.
- The course is useful for graduate students seeking to acquire and refine their skills relating to data analysis and manipulation.

Curriculum For This Course

88 Lectures

10:58:58
Introduction to Course Materials, Installing Packages, and Executing Scripts
16 Lectures
01:35:25

Preview
01:58

Introduction to Course Materials

03:21

RStudio is an Integrated Development Environment (IDE) software tool developed especially to run R software.

Install R and RStudio

00:45

**R** is a programming language and software environment for statistical computing and graphics. The **R** language is widely used among statisticians and data miners for developing statistical software and for data analysis.

General Discussion of R

07:34

A Look at the R Console and RStudio

04:43

Executing Script and Installing Packages in RStudio (part 2)

07:08

R Script Demonstrations using RStudio

06:40

To make the best of the R language, you'll need a strong understanding of the basic data types and data structures and how to operate on those.

It is **very important** to understand these, because they are the objects you will manipulate on a day-to-day basis in R.

Scripting Basic Data Structures (part 1)

07:46

Scripting Basic Data Structures (part 2)

08:28

Functions have named arguments which potentially have default values. The formal arguments are the arguments included in the function definition. The formals function returns a list of all the formal arguments of a function. Not every function call in R makes use of all the formal arguments. Function arguments can be missing or might have default values.
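The points above about formal arguments, defaults and missing arguments can be sketched as follows. This is only an illustration: `scale_values` and its argument names are hypothetical and are not taken from the course materials.

```r
# Hypothetical function used only for illustration.
scale_values <- function(x, center = TRUE, divisor) {
  # 'center' has a default value; 'divisor' has none
  if (center) x <- x - mean(x)
  # missing() reports whether the caller supplied 'divisor'
  if (missing(divisor)) divisor <- 1
  x / divisor
}

# formals() returns the formal arguments of the function
arg_names <- names(formals(scale_values))        # "x" "center" "divisor"
default_center <- formals(scale_values)$center   # TRUE

# This call relies on the default for 'center' and leaves 'divisor' missing
centered <- scale_values(c(1, 2, 3))             # -1 0 1

# This call supplies every formal argument
halved <- scale_values(c(2, 4), center = FALSE, divisor = 2)   # 1 2
```

Calling `args(scale_values)` at the console is another quick way to see the formal arguments and their defaults.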

R Functions (part 1)

07:11

R Functions (part 2)

06:59

R Functions (part 3)

07:11

Creating matrices

The function `matrix` creates matrices.

`matrix(data, nrow, ncol, byrow)`

The `data` argument is usually a vector of the elements that will fill the matrix. The `nrow` and `ncol` arguments specify the dimensions of the matrix. Often only one dimension argument is needed: if, for example, there are 20 elements in `data` and `ncol` is specified to be 4, then R will automatically calculate that there should be 5 rows and 4 columns, since 4*5=20. The `byrow` argument specifies how the matrix is to be filled. The default value for `byrow` is FALSE, which means that by default the matrix will be filled column by column.
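As a small sketch of the behaviour described above, using the same 20-element, 4-column case (the object names `m_col` and `m_row` are illustrative only):

```r
# Only ncol is given; R works out that 5 rows are needed for 20 elements
m_col <- matrix(1:20, ncol = 4)                 # default byrow = FALSE
m_row <- matrix(1:20, ncol = 4, byrow = TRUE)   # fill row by row instead

dims <- dim(m_col)          # 5 4
first_row_col <- m_col[1, ] # 1 6 11 16 (filled down the columns)
first_row_row <- m_row[1, ] # 1 2 3 4   (filled across the rows)
```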

Manipulating Matrices (part 1)

06:15

Manipulating Matrices (part 2)

06:22

Manipulating Matrices (part 3)

05:39

Basic R Programming Concepts and Techniques
19 Lectures
02:04:01

Basic R Programming Concepts and Examples (part 1)

07:15

**R** has the standard control structures you would expect. **expr** can be multiple (compound) statements enclosed in braces { }. Where a vectorized built-in function exists, it is usually more efficient than an explicit loop.

`if (cond) expr`

`if (cond) expr1 else expr2`

`for (var in seq) expr`

`while (cond) expr`

`switch(expr, ...)`

`ifelse(test, yes, no)`
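A minimal sketch of each control structure listed above, with made-up data and illustrative variable names. The last line shows the vectorized alternative to the `for` loop.

```r
x <- c(-2, 0, 3)

# if / else on a single condition
sign_label <- if (x[1] < 0) "negative" else "non-negative"

# for loop accumulating a sum of squares
ss <- 0
for (v in x) ss <- ss + v^2

# while loop: count the halvings needed to bring n below 1
n <- 10
steps <- 0
while (n >= 1) {
  n <- n / 2
  steps <- steps + 1
}

# switch selects one branch by name
stat <- switch("mean", mean = mean(x), median = median(x))

# ifelse works element by element on a whole vector
sign_vec <- ifelse(x > 0, "pos", "non-pos")

# vectorized equivalent of the for loop above
ss_vec <- sum(x^2)
```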

Looping Control Structure Examples (part 1)

07:39

Looping Control Structure Examples (part 2)

08:48

Looping and Control Structure Exercises

00:50

Data Input and Output (part 1)

07:06

Data Input and Output (part 2)

05:56

Formatting Output (part 1)

10:13

Formatting Output (part 2)

07:46

Interactive Input and Output

07:54

Looping and Control Structure Exercises (part 1)

09:15

Looping and Control Structure Exercises (part 2)

07:35

Looping and Control Structure Exercises (part 3)

07:50

Writing Output to a File (part 1)

06:55

Writing Output to a File (part 2)

06:37

Plotting as Output (part 1)

06:12

Plotting as Output (part 2)

07:33

Exercise: Writing Statistical and Scientific Expressions

1 page

Exercise Solution: Writing Statistical and Scientific Functions

8 pages

Writing User-Defined Functions in R
17 Lectures
02:10:04

Writing Functions as Programs (part 1)

10:03

Writing Functions as Programs (part 2)

08:02

User-written Functions

One of the great strengths of **R** is the user's ability to add functions. In fact, many of the functions in **R** are actually functions of functions. The structure of a function is given below.

`myfunction <- function(arg1, arg2, ...) { statements; return(object) }`

Objects in the function are local to the function. The object returned can be any data type.
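To illustrate the structure and the point about local objects, here is a short sketch. The function `cv` (a coefficient of variation) and the name `local_sd_value` are hypothetical examples, not part of the course code.

```r
# Hypothetical user-written function: coefficient of variation
cv <- function(x, na.rm = FALSE) {
  local_sd_value <- sd(x, na.rm = na.rm)   # exists only inside the function
  m <- mean(x, na.rm = na.rm)
  return(local_sd_value / m)
}

result <- cv(c(2, 4, 6))   # sd is 2, mean is 4, so the result is 0.5

# The local object is not visible once the function has returned
leaked <- exists("local_sd_value")
```

Because the returned object can be of any type, a function could equally return a list such as `list(sd = local_sd_value, mean = m)` when several results are needed.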

Writing Functions in R (part 1)

08:15

Writing Functions in R (part 2)

08:04

Writing Functions in R (part 3)

09:35

Writing Functions in R (part 4)

07:31

Apply Family of Functions (part 1)

08:28

Apply Family of Functions (part 2)

08:53

Apply Family of Functions (part 3)

07:40

Apply Family of Functions (part 4)

10:43

Apply Family of Functions (part 5)

06:07

Making Programs Run Efficiently

10:27

Exercise: Writing Functions and Programs

2 pages

Exercise Solutions: Writing Functions and Programs (part 1)

07:58

Exercise Solutions: Writing Functions and Programs (part 2)

04:40

Exercise: Vector Maker Functions

04:38

Data Types and Structures: Factors, Dataframes and Lists
10 Lectures
01:28:04

Exercise Solutions: Vector Maker Functions (part 1)

09:02

Exercise Solutions: Vector Maker Functions (part 2)

07:41

Factors

Tell **R** that a variable is **nominal** by making it a factor. The factor stores the nominal values as a vector of integers in the range [1, ..., k] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers.
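The integer-plus-labels storage can be seen directly in a small sketch (the data are made up for illustration):

```r
colour <- c("red", "blue", "red", "green")
f <- factor(colour)

lev <- levels(f)        # the character labels, sorted: "blue" "green" "red"
codes <- as.integer(f)  # the integer codes: 3 1 3 2
k <- nlevels(f)         # 3 unique values

# The original values are recovered by mapping codes back to labels
recovered <- lev[codes]
```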

Data Types: Factors (part 1)

08:33

Data Types: Factors (part 2)

10:20

Data Frames

A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.
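A brief sketch of a data frame whose columns have different modes. The column names and values are invented for illustration.

```r
df <- data.frame(
  id = 1:3,                              # integer column
  group = factor(c("a", "b", "a")),      # factor column
  score = c(2.5, 3.0, 4.5),              # numeric column
  name = c("x", "y", "z"),               # character column
  stringsAsFactors = FALSE
)

# Each column keeps its own class
col_classes <- sapply(df, class)

# Rows can be selected with a logical condition on one column
mean_a <- mean(df$score[df$group == "a"])   # (2.5 + 4.5) / 2 = 3.5
```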

Data Structures: Dataframes (part 1)

07:33

Data Structures: Dataframes (part 2)

08:14

Data Structures: Dataframes (part 3)

08:22

Data Structures: Dataframes (part 4)

06:47

Lists

A list is an ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.
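For example, a list can hold a string, a numeric vector and a matrix side by side. The names below are illustrative only.

```r
fit_summary <- list(
  label = "trial 1",
  values = c(1.2, 3.4, 5.6),
  stats = matrix(1:4, nrow = 2)
)

lab <- fit_summary$label                # access a component by name
second <- fit_summary[["values"]][2]    # [[ ]] returns the component itself: 3.4
sub <- fit_summary["values"]            # [ ] returns a shorter list

fit_summary$flag <- TRUE                # components can be added later
n_components <- length(fit_summary)     # now 4
```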

Data Structures: Lists (part 1)

09:29

Data Structures: Lists (part 2)

12:03

Bootstrap and Jackknife Resampling Methods
11 Lectures
01:29:26

In statistics, **resampling** is any of a variety of methods for doing one of the following:

- Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (**jackknifing**) or drawing randomly with replacement from a set of data points (**bootstrapping**)
- Exchanging labels on data points when performing significance tests (**permutation tests**, also called exact tests, randomization tests, or re-randomization tests)
- Validating models by using random subsets (bootstrapping, cross validation)

Common resampling techniques include bootstrapping, jackknifing and permutation tests.
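As a rough sketch of the first item in the list above, the following estimates the standard error of a sample mean by bootstrapping and by jackknifing. The data, the number of replicates `B` and all object names are assumptions made for this illustration; the course lectures may set things up differently.

```r
set.seed(1)
x <- c(8.2, 9.1, 7.4, 10.3, 8.8, 9.6, 7.9, 9.0)   # made-up sample
n <- length(x)

# Bootstrap: draw n values with replacement, recompute the statistic, repeat
B <- 2000
boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
boot_se <- sd(boot_means)                  # bootstrap standard error
boot_bias <- mean(boot_means) - mean(x)    # bootstrap estimate of bias

# Jackknife: recompute the statistic leaving out one observation at a time
jack_means <- sapply(seq_len(n), function(i) mean(x[-i]))
jack_se <- sqrt((n - 1) / n * sum((jack_means - mean(jack_means))^2))
```

For the sample mean, the jackknife standard error reduces algebraically to `sd(x) / sqrt(n)`, which makes this a convenient case for checking the code.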

Preview
07:42

Bootstrap Estimate of Standard Error and Bias (part 2)

07:39

Bootstrapping a Ratio Statistic

10:13

Jackknife Estimate of Bias and Standard Error

11:30

Bootstrapping Confidence Intervals (part 1)

08:41

Bootstrapping Confidence Intervals (part 2)

09:13

Bootstrapping Confidence Intervals (part 3)

10:27

In *k*-fold (also called n-fold) cross-validation, the original sample is randomly partitioned into *k* equal sized subsamples. Of the *k* subsamples, a single subsample is retained as the validation data for testing the model, and the remaining *k* − 1 subsamples are used as training data. The cross-validation process is then repeated *k* times (the *folds*), with each of the *k* subsamples used exactly once as the validation data. The *k* results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general *k* remains an unfixed parameter.

When *k*=*n* (the number of observations), the *k*-fold cross-validation is exactly the leave-one-out cross-validation.

In *stratified* *k*-fold cross-validation, the folds are selected so that the mean response value is approximately equal in all the folds. In the case of a dichotomous classification, this means that each fold contains roughly the same proportions of the two types of class labels.
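The plain (unstratified) *k*-fold procedure described above might be sketched like this, comparing two competing linear models on simulated data. The data, the helper `cv_mse` and the choice of mean squared error as the score are assumptions for illustration; the helper also assumes the response column is named `y`.

```r
set.seed(42)
n <- 100
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5)   # simulated response

k <- 10
# Randomly assign each observation to one of k equal sized folds
fold <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: average held-out mean squared error over the folds
cv_mse <- function(formula, data, fold) {
  errs <- sapply(sort(unique(fold)), function(j) {
    train <- data[fold != j, , drop = FALSE]   # k - 1 folds for fitting
    test <- data[fold == j, , drop = FALSE]    # one fold for validation
    fit <- lm(formula, data = train)
    mean((test$y - predict(fit, newdata = test))^2)
  })
  mean(errs)   # combine the k results into a single estimate
}

mse_linear <- cv_mse(y ~ x, dat, fold)   # model with the predictor
mse_null <- cv_mse(y ~ 1, dat, fold)     # intercept-only model
```

The model with the smaller cross-validated error would be preferred; setting `k <- n` in this sketch gives leave-one-out cross-validation.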

N-Fold Cross Validation of Models (part 1)

07:33

N-Fold Cross-Validation of Models (part 2)

04:42

N-Fold Cross-Validation of Models (part 3)

10:42

Bootstrap-Jackknife Resampling Exercise

01:04

Debugging and Program Efficiency
15 Lectures
02:00:58

Bootstrap-Jackknife Resampling Exercise Solution

03:28

Debugging R Programs

15:13

Findruns Program Debugging Example (part 2)

07:29

Another approach makes use of the local environment within a function to access the variables. When methods are later defined with this approach (the *Local Environment Approach*), the results look more like the object oriented approaches seen in other languages.

The approach relies on the local scope created when a function is called. A new environment is created that can be identified using the *environment* command. The environment can be saved in the list created for the class, and the variables within this scope can then be accessed using the identification of the environment.

Additional Programming Considerations

10:54

Program Efficiencies and Scoping Rules

11:45

An **environment** in R can be thought of as a list of variables and their values. I'm not sure if this is how it is achieved in practice, but it helps me to think of it as a look-up table. For example, if a variable `x` appears in an expression, the R interpreter refers to the entry for `x` in the appropriate look-up table. From this it retrieves the value for `x` (basically some kind of R entity) and substitutes it for `x` in the expression. If there is no entry for `x` in the table, or in any of the other possible environments, an error is flagged.

Which environment R uses depends on context. If you are typing into the R command line, the environment used is called the *global environment*. When a function is called, a new environment especially for this function is created automatically, and it is normally discarded on leaving the function (unless something still refers to it). This is the default environment for any variables created during the execution of the function. Finally, it is worth noting that a particular variable name can appear in more than one environment. If R tries to find the value of a variable whose name appears in more than one environment, the rules governing which environment R searches first determine the value that will be found.
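The look-up behaviour described above can be seen in a short sketch. All names here are illustrative, and `no_such_variable_xyz` is deliberately left undefined.

```r
x <- "outer"

f <- function() {
  x <- "local"   # created in the environment of this call to f
  x
}
g <- function() x   # no local x, so R looks in the enclosing environment

f_val <- f()   # "local"
g_val <- g()   # "outer"
# x itself is unchanged by the call to f

# An environment can also be created and used explicitly as a look-up table
e <- new.env()
assign("x", "in e", envir = e)
e_val <- get("x", envir = e)   # "in e"

# Looking up a name that has no entry anywhere signals an error
missing_val <- tryCatch(get("no_such_variable_xyz"),
                        error = function(err) "not found")
```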

Selecting Environment to Debug

04:20

First, everything in R is treated as an object. We have seen this with functions. Many of the objects that are created within an R session have attributes associated with them. One common attribute associated with an object is its **class**.

You can set the **class** attribute using the *class* command. One thing to notice is that the class is a vector which allows an object to inherit from multiple classes, and it allows you to specify the order of inheritance for complex classes. You can also use the *class* command to determine the classes associated with an object.
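A small sketch of setting a class vector and of how its order drives method dispatch. The class names `child` and `parent` and the generic `describe` are hypothetical.

```r
obj <- list(value = 3)
class(obj) <- c("child", "parent")   # order matters: "child" is tried first

cls <- class(obj)                        # query the classes
is_parent <- inherits(obj, "parent")     # TRUE

# A hypothetical generic with one method per class
describe <- function(x, ...) UseMethod("describe")
describe.parent <- function(x, ...) "parent method"
describe.child <- function(x, ...) {
  # NextMethod() moves on to the next class in the class vector
  paste("child method, then", NextMethod())
}

out <- describe(obj)   # "child method, then parent method"
```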

Creating S3 and S4 Classes (part 1)

07:22

Here we look at two different ways to construct an S3 class. The first approach is more commonly used and is more straightforward. It makes use of basic list properties. The second approach makes use of the local environment within a function to define the variables tracked by the class. The advantage to the second approach is that it looks more like the object oriented approach that many are familiar with. The disadvantage is that it is more difficult to read the code, and it is more like working with pointers which is different from the way other objects work in R.
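The two construction styles can be contrasted in a minimal sketch. The `account` example and every name in it are assumptions made for illustration, not the course's own code.

```r
# Approach 1: a plain list with a class attribute
new_account <- function(owner, balance = 0) {
  obj <- list(owner = owner, balance = balance)
  class(obj) <- "account"
  obj
}
deposit <- function(acc, amount) UseMethod("deposit")
deposit.account <- function(acc, amount) {
  acc$balance <- acc$balance + amount
  acc   # a modified copy is returned; the original is untouched
}

a <- new_account("ann", 10)
a2 <- deposit(a, 5)   # a$balance is still 10, a2$balance is 15

# Approach 2: keep the data in the function's local environment
new_account_env <- function(owner, balance = 0) {
  this_env <- environment()   # environment created by this call
  obj <- list(
    env = this_env,
    get_balance = function() get("balance", envir = this_env),
    deposit = function(amount) {
      assign("balance", balance + amount, envir = this_env)
    }
  )
  class(obj) <- "account_env"
  obj
}

b <- new_account_env("bob", 10)
b_alias <- b        # both names share the same environment
b$deposit(5)
shared_balance <- b_alias$get_balance()   # 15: the change is seen via the alias
```

The last two lines show the pointer-like behaviour mentioned above: copying the list does not copy the environment it refers to.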

Creating S3 and S4 Classes (part 2)

06:15

The S4 approach differs from the S3 approach to creating a class in that it is a more rigid definition. The idea is that an object is created using the *setClass* command. The command takes a number of options. Many of the options are not required, but we make use of several of the optional arguments because they represent good practices with respect to object oriented programming.
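A minimal sketch of an S4 class definition with a few of the optional arguments. The `Person` class, its slots and the validity rule are invented for illustration.

```r
library(methods)

setClass(
  "Person",
  slots = c(name = "character", age = "numeric"),        # typed slots
  prototype = list(name = NA_character_, age = NA_real_), # default values
  validity = function(object) {
    # return TRUE if valid, otherwise a message describing the problem
    if (length(object@age) == 1 && !is.na(object@age) && object@age < 0) {
      "age must be non-negative"
    } else {
      TRUE
    }
  }
)

p <- new("Person", name = "Ann", age = 30)
p_age <- p@age               # slots are accessed with @
p_is_s4 <- isVirtualClass("Person") == FALSE && is(p, "Person")

# The validity check rejects an invalid object at creation time
bad <- tryCatch(new("Person", name = "Bob", age = -1),
                error = function(err) "invalid")
```

Unlike the S3 list approach, assigning a value of the wrong type to a slot (for example a character string to `age`) is also rejected.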

Creating S3 and S4 Classes (part 3)

06:26

Numerical Accuracy and Program Efficiency (part 1)

07:42

Numerical Accuracy and Program Efficiency (part 2)

10:51

More on Program Efficiency (part 1)

06:20

More on Program Efficiency (part 2)

06:42

Selection Sort Exercise

03:58

About the Instructor