Programming Statistical Applications in R
3.9 (29 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
1,103 students enrolled

An introductory course that teaches the foundations of scientific and statistical programming using R software.
Last updated 3/2016
30-Day Money-Back Guarantee
  • 11 hours on-demand video
  • 8 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Understand how to create and manipulate R data structures used in scientific programming applications.
  • Understand and use important statistical R programming concepts such as looping and control structures, interactive data input and formatting output, writing functions as programs, writing output to a file and plotting output.
  • Understand and be able to use the R apply family of functions efficiently.
  • Know how to debug programs and how to make programs run more efficiently.
  • Understand and be able to implement various resampling methods effectively, including bootstrapping, jackknifing and N-fold cross validation.
Requirements
  • Students will need to install the popular no-cost R Console and RStudio software (instructions provided).

Programming Statistical Applications in R is an introductory course teaching the basics of programming mathematical and statistical applications using the R language. The course makes extensive use of the Introduction to Scientific Programming and Simulation using R (spuRs) package from the Comprehensive R Archive Network (CRAN). It is a scientific-programming foundations course and a useful complement and precursor to the more simulation-application-oriented Udemy course R Programming for Simulation and Monte-Carlo Methods. The two courses were originally developed as a two-course sequence (although they do share some exercises). Together, they provide a thorough body of instruction on how to create your own mathematical and statistical functions and applications using R software.

Programming Statistical Applications in R is a "hands-on" course that comprehensively teaches fundamental R programming skills, concepts and techniques useful for developing statistical applications with R software. The course also uses dozens of "real-world" scientific function examples. It is not necessary for a student to be familiar with R, nor is it necessary to be knowledgeable about programming in general, to successfully complete this course. This course is 'self-contained' and includes all materials, slides, exercises (and solutions); in fact, everything that is seen in the course video lessons is included in zipped, downloadable materials files. The course is a great instructional resource for anyone interested in refining their skills and knowledge about statistical programming using the R language. It would be useful for practicing quantitative analysis professionals, and for undergraduate and graduate students seeking new job-related skills and/or skills applicable to the analysis of research data.

The course begins with basic instruction about installing and using the R console and the RStudio application and provides the instruction needed to create and execute R scripts and R functions. Basic R data structures are explained, followed by instruction on data input and output and on basic R programming techniques and control structures. Detailed examples of creating new statistical R functions, and of using existing statistical R functions, are presented. Bootstrap and jackknife resampling methods are explained in detail, as are methods and techniques for statistical inference, for constructing confidence intervals, and for performing N-fold cross-validation assessments of competing statistical models. Finally, detailed instructions and examples for debugging R programs and for making them run more efficiently are demonstrated.

Who is the target audience?
  • You do NOT need to be experienced with R, nor do you need to have experience with computer programming to successfully complete this course.
  • The course would be useful to anyone interested in learning more about statistical programming using the R language.
  • Course is good for undergraduate students seeking to acquire programming skills and knowledge of R software.
  • Course is useful for graduate students seeking to acquire and refine their skills relating to data analysis and manipulation.
Curriculum For This Course
88 Lectures
Introduction to Course Materials, Installing Packages, and Executing Scripts
16 Lectures 01:35:25

Introduction to Course Materials

RStudio is an Integrated Development Environment (IDE) software tool developed especially to run R software.

Install R and RStudio

R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

General Discussion of R

A Look at the R Console and RStudio

Executing Script and Installing Packages in RStudio (part 2)

R Script Demonstrations using RStudio

To make the best of the R language, you'll need a strong understanding of the basic data types and data structures and how to operate on those.

It is very important to understand these data types and structures, because they are the objects you will manipulate on a day-to-day basis in R.

Scripting Basic Data Structures (part 1)

Scripting Basic Data Structures (part 2)

Functions have named arguments which potentially have default values. The formal arguments are the arguments included in the function definition. The formals function returns a list of all the formal arguments of a function. Not every function call in R makes use of all the formal arguments. Function arguments can be missing or might have default values.
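
To make the points about formal arguments, defaults and missing arguments concrete, here is a minimal sketch. The function power_sum and its arguments are hypothetical names chosen for illustration; they are not taken from the course materials.

```r
# Hypothetical example function: sums the elements of x raised to the power p.
# p and na.rm have default values; x does not.
power_sum <- function(x, p = 2, na.rm = FALSE) {
  # missing() reports whether the caller supplied a given argument
  if (missing(x)) stop("argument 'x' is required")
  sum(x^p, na.rm = na.rm)
}

# formals() returns the formal arguments as a named list
names(formals(power_sum))   # "x" "p" "na.rm"
formals(power_sum)$p        # 2, the default value of p

# A call need not supply every formal argument
power_sum(1:3)              # 14, uses the default p = 2
power_sum(1:3, p = 1)       # 6
```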

R Functions (part 1)

R Functions (part 2)

R Functions (part 3)

Creating matrices

The function matrix creates matrices.
 matrix(data, nrow, ncol, byrow) 
The data argument is usually a vector of the elements that will fill the matrix. The nrow and ncol arguments specify the dimensions of the matrix. Often only one dimension argument is needed: if, for example, there are 20 elements in data and ncol is set to 4, R will automatically calculate that there should be 5 rows and 4 columns, since 4*5=20. The byrow argument specifies how the matrix is filled. Its default value is FALSE, which means that by default the matrix is filled column by column.
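
The 20-element case described above can be sketched as follows. The variable names m_col and m_row are illustrative only.

```r
# 20 elements with ncol = 4: R infers nrow = 5
m_col <- matrix(1:20, ncol = 4)
dim(m_col)     # 5 4
m_col[1, ]     # 1 6 11 16, because the default fill is column by column

# byrow = TRUE fills the matrix row by row instead
m_row <- matrix(1:20, ncol = 4, byrow = TRUE)
m_row[1, ]     # 1 2 3 4
```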
Manipulating Matrices (part 1)

Manipulating Matrices (part 2)

Manipulating Matrices (part 3)
Basic R Programming Concepts and Techniques
19 Lectures 02:04:01
Basic R Programming Concepts and Examples (part 1)

R has the standard control structures you would expect. expr can be multiple (compound) statements by enclosing them in braces { }. It is more efficient to use built-in functions rather than control structures whenever possible.


 if (cond) expr
 if (cond) expr1 else expr2

 for (var in seq) expr

 while (cond) expr

 switch(expr, ...)
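
As a rough illustration of these control structures, the sketch below uses each one once and then shows a vectorised built-in that replaces the explicit loop. All variable and function names (x, signs, centre, and so on) are made up for this example.

```r
x <- c(3, -1, 4, -5)

# for loop with if/else; braces group several statements into one expr
signs <- character(length(x))
for (i in seq_along(x)) {
  signs[i] <- if (x[i] >= 0) "non-negative" else "negative"
}

# while loop: add 1, 2, 3, ... until the running total reaches 10
total <- 0
n <- 0
while (total < 10) {
  n <- n + 1
  total <- total + n
}
n              # 4, since 1 + 2 + 3 + 4 = 10

# switch selects a branch by name; the unnamed last argument is the fallback
centre <- function(v, type) {
  switch(type,
         mean = mean(v),
         median = median(v),
         stop("unknown type"))
}
centre(c(1, 2, 10), "median")   # 2

# The built-in, vectorised ifelse() gives the same result as the for loop
signs_vec <- ifelse(x >= 0, "non-negative", "negative")
```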



Looping Control Structure Examples (part 1)

Looping Control Structure Examples (part 2)

Looping and Control Structure Exercises

Data Input and Output (part 1)

Data Input and Output (part 2)

Formatting Output (part 1)

Formatting Output (part 2)

Interactive Input and Output

Looping and Control Structure Exercises (part 1)

Looping and Control Structure Exercises (part 2)

Looping and Control Structure Exercises (part 3)

Writing Output to a File (part 1)

Writing Output to a File (part 2)

Plotting as Output (part 1)

Plotting as Output (part 2)

Exercise: Writing Statistical and Scientific Expressions
1 page

Exercise Solution: Writing Statistical and Scientific Functions
8 pages
Writing User-Defined Functions in R
17 Lectures 02:10:04
Writing Functions as Programs (part 1)

Writing Functions as Programs (part 2)

User-written Functions

One of the great strengths of R is the user's ability to add functions. In fact, many of the functions in R are actually functions of functions. The structure of a function is given below.

 myfunction <- function(arg1, arg2, ...) {
   statements
   return(object)
 }

Objects in the function are local to the function. The object returned can be any data type.
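
A small instance of this template might look like the following sketch. The function std_error is a hypothetical example, not one of the course's functions; it computes the standard error of the mean.

```r
# Hypothetical user-written function following the template above
std_error <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)            # n and se are local to the function
  se <- sd(x) / sqrt(n)
  return(se)                # the returned object can be of any type
}

result <- std_error(c(2, 4, 4, 4, 5, 5, 7, 9))
result
```

The names n and se are created inside the function body, so they are not left behind in the calling environment after the call returns.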

Writing Functions in R (part 1)

Writing Functions in R (part 2)

Writing Functions in R (part 3)

Writing Functions in R (part 4)

Apply Family of Functions (part 1)

Apply Family of Functions (part 2)

Apply Family of Functions (part 3)

Apply Family of Functions (part 4)

Apply Family of Functions (part 5)

Making Programs Run Efficiently

Exercise: Writing Functions and Programs
2 pages

Exercise Solutions: Writing Functions and Programs (part 1)

Exercise Solutions: Writing Functions and Programs (part 2)

Exercise: Vector Maker Functions
Data Types and Structures: Factors, Dataframes and Lists
10 Lectures 01:28:04
Exercise Solutions: Vector Maker Functions (part 1)

Exercise Solutions: Vector Maker Functions (part 2)


Tell R that a variable is nominal by making it a factor. The factor stores the nominal values as a vector of integers in the range [1, ..., k] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers.
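
The integer codes and the character levels can be inspected directly. The sketch below uses a made-up vector named colour.

```r
colour <- c("red", "blue", "red", "green", "blue")
f <- factor(colour)

levels(f)       # "blue" "green" "red": the k = 3 unique values, sorted
as.integer(f)   # 3 1 3 2 1: the integer codes that index into levels(f)
nlevels(f)      # 3

# The original values are recovered by mapping the codes back to the levels
levels(f)[as.integer(f)]
```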

Data Types: Factors (part 1)

Data Types: Factors (part 2)

Data Frames

A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.
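
For instance, a data frame can hold an integer, a character and a factor column side by side. The column names below are invented for illustration.

```r
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE   # keep 'name' as character in every R version
)

# Each column keeps its own class
col_classes <- sapply(df, class)
col_classes
```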

Data Structures: Dataframes (part 1)

Data Structures: Dataframes (part 2)

Data Structures: Dataframes (part 3)

Data Structures: Dataframes (part 4)


A list is an ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.
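
A brief sketch of a list gathering unrelated objects under one name; the name run_info and its components are hypothetical.

```r
run_info <- list(
  label  = "trial 1",           # character
  values = c(2.5, 3.1, 4.7),    # numeric vector
  flag   = TRUE                 # logical
)

length(run_info)        # 3 components
run_info$values[2]      # 3.1, access a component by name with $
run_info[["label"]]     # "trial 1", or with [[ ]]
```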

Data Structures: Lists (part 1)

Data Structures: Lists (part 2)
Bootstrap and Jackknife Resampling Methods
11 Lectures 01:29:26

In statistics, resampling is any of a variety of methods for doing one of the following:

  1. Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping)
  2. Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests)
  3. Validating models by using random subsets (bootstrapping, cross validation)

Common resampling techniques include bootstrapping, jackknifing and permutation tests.
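
The first item in the list above can be sketched for the sample mean. This is a minimal illustration on simulated data, with arbitrary choices for the seed, sample size and number of bootstrap replicates B; it is not the course's own code.

```r
set.seed(1)
x <- rnorm(30, mean = 10, sd = 2)   # simulated data, for illustration only
n <- length(x)

# Bootstrap: draw n points with replacement, B times, and recompute the mean
B <- 2000
boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
boot_se <- sd(boot_means)

# Jackknife: recompute the mean leaving out one observation at a time
jack_means <- sapply(seq_len(n), function(i) mean(x[-i]))
jack_se <- sqrt((n - 1) / n * sum((jack_means - mean(jack_means))^2))

c(bootstrap = boot_se, jackknife = jack_se, formula = sd(x) / sqrt(n))
```

For the mean, the jackknife standard error coincides with the usual sd(x)/sqrt(n), which makes it a convenient check; for other statistics no such closed form is generally available.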

Preview 07:42

Bootstrap Estimate of Standard Error and Bias (part 2)

Bootstrapping a Ratio Statistic

Jackknife Estimate of Bias and Standard Error

Bootstrapping Confidence Intervals (part 1)

Bootstrapping Confidence Intervals (part 2)

Bootstrapping Confidence Intervals (part 3)

In k-fold (also called n-fold) cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used,[7] but in general k remains an unfixed parameter.

When k=n (the number of observations), the k-fold cross-validation is exactly the leave-one-out cross-validation.

In stratified k-fold cross-validation, the folds are selected so that the mean response value is approximately equal in all the folds. In the case of a dichotomous classification, this means that each fold contains roughly the same proportions of the two types of class labels.
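
The plain (unstratified) k-fold procedure can be sketched as below. The simulated data, the linear model y ~ x and the choice of mean squared error are assumptions made for this example only.

```r
set.seed(42)
n <- 100
k <- 10

# Simulated data, for illustration only
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 0.5)

# Randomly assign each observation to one of k equal-sized folds
folds <- sample(rep(seq_len(k), length.out = n))

cv_mse <- numeric(k)
for (j in seq_len(k)) {
  train <- dat[folds != j, ]          # k - 1 folds for training
  valid <- dat[folds == j, ]          # the held-out fold for validation
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = valid)
  cv_mse[j] <- mean((valid$y - pred)^2)
}

mean(cv_mse)   # the k fold results averaged into a single estimate
```

Setting k equal to n in this sketch gives leave-one-out cross-validation.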

N-Fold Cross Validation of Models (part 1)

N-Fold Cross-Validation of Models (part 2)

N-Fold Cross-Validation of Models (part 3)

Bootstrap-Jackknife Resampling Exercise
Debugging and Program Efficiency
15 Lectures 02:00:58
Bootstrap-Jackknife Resampling Exercise Solution

Debugging R Programs

Findruns Program Debugging Example (part 2)

Another approach makes use of the local environment within a function to access the variables. When methods are defined with this local environment approach, the results look more like the object-oriented approaches seen in other languages.

The approach relies on the local scope created when a function is called. A new environment is created that can be identified using the environment command. The environment can be saved in the list created for the class, and the variables within this scope can then be accessed using the identification of the environment.
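
One way this might look in practice is sketched below. The constructor make_counter, the class name "counter" and the component names are all hypothetical; only environment(), assign() and get() are standard R functions.

```r
# Hypothetical constructor using the local environment approach
make_counter <- function(start = 0) {
  count <- start                 # variable living in the function's local scope
  env <- environment()           # identify the environment created by this call

  obj <- list(
    env = env,                   # save the environment in the list for the class
    increment = function() {
      assign("count", get("count", envir = env) + 1, envir = env)
      invisible(NULL)
    },
    value = function() get("count", envir = env)
  )
  class(obj) <- "counter"
  obj
}

cnt <- make_counter()
cnt$increment()
cnt$increment()
cnt$value()                      # 2
get("count", envir = cnt$env)    # 2, read directly through the saved environment
```

Because the state lives in an environment rather than in the list itself, the increments persist between calls, which is what makes this approach behave somewhat like working with references.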

Additional Programming Considerations

Program Efficiencies and Scoping Rules

An environment, in R, can be thought of as a list of variables and their values. I'm not sure if this is how it is achieved in practice, but it helps me to think of it as a look-up table - for example, if a variable x appears in an expression, then the R interpreter refers to the entry for x in the appropriate look-up table. From this, it retrieves the value for x - basically some kind of R entity - and substitutes this for x in the expression. If there is no entry for x in the table - or in any of the other possible environments - an error is flagged.

Which environment R uses depends on context - if you are typing into the R command line, the environment used is called the global environment. When a function is called, a new environment especially for this function is created automatically - and is destroyed on leaving the function. This is the default environment for any variables created during the execution of the function. Finally, it is worth noting that a particular variable name can appear in more than one environment - and so if R tries to find the value of a variable, and the variable name appears in more than one environment, the rules governing which environment R will search determine the value that will be found.
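
The look-up behaviour described above can be observed with a short sketch. The names x, show_x, read_x and e are illustrative only.

```r
x <- "outer"                 # defined in the calling (typically global) environment

show_x <- function() {
  x <- "local"               # a separate x in the function's own environment
  x
}
read_x <- function() x       # no local x, so R searches the enclosing environment

show_x()                     # "local"
read_x()                     # "outer"
x                            # "outer": the function's x did not overwrite it

# An environment can also be created and queried explicitly
e <- new.env()
assign("x", "in e", envir = e)
get("x", envir = e)          # "in e"

# Looking up a name with no entry anywhere is an error
res <- try(get("no_such_variable_xyz"), silent = TRUE)
inherits(res, "try-error")   # TRUE
```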

Selecting Environment to Debug

First, everything in R is treated as an object. We have seen this with functions. Many of the objects that are created within an R session have attributes associated with them. One common attribute associated with an object is its class.

You can set the class attribute using the class command. One thing to notice is that the class is a vector which allows an object to inherit from multiple classes, and it allows you to specify the order of inheritance for complex classes. You can also use the class command to determine the classes associated with an object.
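
A short sketch of reading and setting the class attribute, and of how the order of the class vector drives inheritance. The class names "special_record" and "record" and the generic describe are invented for this example.

```r
v <- 1:3
class(v)                                   # "integer"

rec <- list(name = "a")
class(rec) <- c("special_record", "record")  # a vector: most specific class first
class(rec)
inherits(rec, "record")                    # TRUE

# Method dispatch walks the class vector in order
describe <- function(obj) UseMethod("describe")
describe.record <- function(obj) "a record"

# No describe.special_record exists, so dispatch falls through to describe.record
describe(rec)                              # "a record"
```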

Creating S3 and S4 Classes (part 1)

Here we look at two different ways to construct an S3 class. The first approach is more commonly used and is more straightforward. It makes use of basic list properties. The second approach makes use of the local environment within a function to define the variables tracked by the class. The advantage to the second approach is that it looks more like the object oriented approach that many are familiar with. The disadvantage is that it is more difficult to read the code, and it is more like working with pointers which is different from the way other objects work in R.
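
The first, list-based approach might be sketched as follows. The constructor new_sample and the class name "stat_sample" are hypothetical; mean is used as the generic because it already dispatches on class in base R.

```r
# Hypothetical S3 constructor built on a plain list
new_sample <- function(values, label = "sample") {
  stopifnot(is.numeric(values))
  obj <- list(values = values, label = label)
  class(obj) <- "stat_sample"
  obj
}

# A method for the existing generic mean()
mean.stat_sample <- function(x, ...) mean(x$values, ...)

s <- new_sample(c(1, 2, 3, 6), label = "demo")
mean(s)        # 3
s$label        # "demo": components are ordinary list elements
```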

Creating S3 and S4 Classes (part 2)

The S4 approach differs from the S3 approach to creating a class in that it is a more rigid definition. The idea is that an object is created using the setClass command. The command takes a number of options. Many of the options are not required, but we make use of several of the optional arguments because they represent good practices with respect to object oriented programming.
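
A minimal S4 sketch under stated assumptions: the class name "Interval", its slots and the generic width are made up for illustration, and the optional prototype and validity arguments are shown as examples of the kind of optional settings setClass accepts.

```r
library(methods)

# Hypothetical S4 class with typed slots, defaults and a validity check
setClass(
  "Interval",
  slots = c(lower = "numeric", upper = "numeric"),
  prototype = list(lower = 0, upper = 1),
  validity = function(object) {
    if (object@lower > object@upper) "lower must not exceed upper" else TRUE
  }
)

iv <- new("Interval", lower = 2, upper = 5)

# Generic function and a method for the new class
setGeneric("width", function(obj) standardGeneric("width"))
setMethod("width", "Interval", function(obj) obj@upper - obj@lower)

width(iv)      # 3

# The validity function rejects inconsistent objects at creation time
bad <- try(new("Interval", lower = 5, upper = 2), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```

Unlike the S3 list approach, the slot types are declared up front, so assigning a character value to lower would also be rejected.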

Creating S3 and S4 Classes (part 3)

Numerical Accuracy and Program Efficiency (part 1)

Numerical Accuracy and Program Efficiency (part 2)

More on Program Efficiency (part 1)

More on Program Efficiency (part 2)

Selection Sort Exercise
About the Instructor
Geoffrey Hubona, Ph.D.
4.0 Average rating
1,476 Reviews
12,604 Students
28 Courses
Associate Professor of Information Systems

Dr. Geoffrey Hubona held full-time tenure-track, and tenured, assistant and associate professor faculty positions at 3 major state universities in the Eastern United States from 1993-2010. Currently, he is a visiting associate professor of MIS at Texas A&M International University. In these positions, he taught dozens of various statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL; an MA in Economics, also from USF; an MBA in Finance from George Mason University in Fairfax, VA; and a BA in Psychology from the University of Virginia in Charlottesville, VA. He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These research methods techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling.