Programming Statistical Applications in R

An introductory course that teaches the foundations of scientific and statistical programming using R software.
3.3 (21 ratings)
886 students enrolled
  • Lectures 88
  • Length 11 hours
  • Skill Level All Levels
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

About This Course

Published 9/2015 English

Course Description

Programming Statistical Applications in R is an introductory course teaching the basics of programming mathematical and statistical applications using the R language. The course makes extensive use of the Introduction to Scientific Programming and Simulation using R (spuRs) package from the Comprehensive R Archive Network (CRAN). It is a scientific-programming foundations course and a useful complement and precursor to the more simulation-oriented Udemy course R Programming for Simulation and Monte-Carlo Methods. The two courses were originally developed as a two-course sequence (although they do share some exercises). Together, the two courses provide a useful body of instruction on how to create your own mathematical and statistical functions and applications using R software.

Programming Statistical Applications in R is a "hands-on" course that comprehensively teaches fundamental R programming skills, concepts and techniques useful for developing statistical applications with R software. The course also uses dozens of "real-world" scientific function examples. It is not necessary for a student to be familiar with R, nor is it necessary to be knowledgeable about programming in general, to successfully complete this course. This course is 'self-contained' and includes all materials, slides, exercises (and solutions); in fact, everything that is seen in the course video lessons is included in zipped, downloadable materials files. The course is a great instructional resource for anyone interested in refining their skills and knowledge about statistical programming using the R language. It would be useful for practicing quantitative analysis professionals, and for undergraduate and graduate students seeking new job-related skills and/or skills applicable to the analysis of research data.

The course begins with basic instruction about installing and using the R console and the RStudio application and provides necessary instruction for creating and executing R scripts and R functions. Basic R data structures are explained, followed by instruction on data input and output and on basic R programming techniques and control structures. Detailed examples of creating new statistical R functions, and of using existing statistical R functions, are presented. Bootstrap and jackknife resampling methods are explained in detail, as are techniques for statistical inference, for constructing confidence intervals, and for performing N-fold cross-validation assessments of competing statistical models. Finally, detailed instructions and examples for debugging R programs and for making them run more efficiently are presented.

What are the requirements?

  • Students will need to install the popular no-cost R Console and RStudio software (instructions provided).

What am I going to get from this course?

  • Understand how to create and manipulate R data structures used in scientific programming applications.
  • Understand and use important statistical R programming concepts such as looping and control structures, interactive data input and formatting output, writing functions as programs, writing output to a file and plotting output.
  • Understand and be able to use the R apply family of functions efficiently.
  • Know how to debug programs and how to make programs run more efficiently.
  • Understand and be able to implement various resampling methods effectively, including bootstrapping, jackknifing and N-fold cross validation.

What is the target audience?

  • You do NOT need to be experienced with R, nor do you need to have experience with computer programming to successfully complete this course.
  • The course would be useful to anyone interested in learning more about statistical programming using the R language.
  • Course is good for undergraduate students seeking to acquire programming skills and knowledge of R software.
  • Course is useful for graduate students seeking to acquire and refine their skills relating to data analysis and manipulation.

Curriculum

Section 1: Introduction to Course Materials, Installing Packages, and Executing Scripts
Course Introduction
01:58
Introduction to Course Materials
03:21
00:45

RStudio is an Integrated Development Environment (IDE) software tool developed especially to run R software.

07:34

R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

A Look at the R Console and RStudio
04:43
Executing Script and Installing Packages in RStudio (part 1)
07:25
Executing Script and Installing Packages in RStudio (part 2)
07:08
R Script Demonstrations using RStudio
06:40
07:46

To make the best of the R language, you'll need a strong understanding of the basic data types and data structures and how to operate on those.

Understanding them is very important because these are the objects you will manipulate on a day-to-day basis in R.

Scripting Basic Data Structures (part 2)
08:28
07:11

Functions have named arguments which potentially have default values. The formal arguments are the arguments included in the function definition. The formals function returns a list of all the formal arguments of a function. Not every function call in R makes use of all the formal arguments. Function arguments can be missing or might have default values.
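
As a minimal sketch of the points above (the function scale_values and its arguments are hypothetical names invented for this illustration, not part of the course materials), the following shows default values, a missing argument, and the formals function:

```r
# Hypothetical function, used only for illustration.
scale_values <- function(x, center = 0, scale = 1, label) {
  # missing() reports whether the caller supplied the argument
  if (missing(label)) {
    label <- "unlabeled"
  }
  list(label = label, values = (x - center) / scale)
}

# formals() returns the formal arguments as a named list
arg_names <- names(formals(scale_values))
print(arg_names)

# Only x is supplied: center and scale use their defaults, label is missing
result <- scale_values(c(2, 4, 6))
print(result$values)

# Arguments can also be matched by name, in any order
named <- scale_values(scale = 2, x = c(2, 4, 6), label = "halved")
print(named$values)
```

Giving label no default and checking it with missing() is just one possible design; a default such as label = "unlabeled" would behave the same way here.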

R Functions (part 2)
06:59
R Functions (part 3)
07:11
06:15

Creating matrices

The function matrix creates matrices.
 matrix(data, nrow, ncol, byrow) 
The data argument is usually a vector of the elements that will fill the matrix. The nrow and ncol arguments specify the dimensions of the matrix. Often only one dimension argument is needed: if, for example, there are 20 elements in the data vector and ncol is specified to be 4, then R will automatically calculate that there should be 5 rows and 4 columns, since 4*5=20. The byrow argument specifies how the matrix is filled. The default value for byrow is FALSE, which means that by default the matrix is filled column by column.
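
The 20-element case described above can be tried directly. This is a small sketch; the variable names m_col and m_row are chosen only for the illustration:

```r
# 20 elements with ncol = 4: R infers nrow = 5
m_col <- matrix(1:20, ncol = 4)                 # default byrow = FALSE: fill by column
m_row <- matrix(1:20, ncol = 4, byrow = TRUE)   # fill row by row instead

print(dim(m_col))   # 5 4
print(m_col[1, ])   # 1 6 11 16
print(m_row[1, ])   # 1 2 3 4
```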
Manipulating Matrices (part 2)
06:22
Manipulating Matrices (part 3)
05:39
Section 2: Basic R Programming Concepts and Techniques
Basic R Programming Concepts and Examples (part 1)
07:15
Basic R Programming Concepts and Examples (part 2)
08:37
07:39

R has the standard control structures you would expect. In the syntax summaries below, expr can be a compound statement, that is, several statements enclosed in braces { }. Where a built-in (vectorized) function does the job, it is usually more efficient than an explicit control structure.

if-else

if (cond) expr
if (cond) expr1 else expr2

for

for (var in seq) expr

while

while (cond) expr

switch

switch(expr, ...)

ifelse

ifelse(test, yes, no)
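
The five constructs listed above can be sketched together as follows. The data vector and all variable names are made up for the illustration:

```r
x <- c(3, -1, 0, 7)

# if-else on a single condition
sign_of_first <- if (x[1] > 0) "positive" else "not positive"

# for loop accumulating a running total (sum(x) is the built-in equivalent)
total <- 0
for (value in x) {
  total <- total + value
}

# while loop: count the doublings of 1 needed to reach at least 100
n_doublings <- 0
power <- 1
while (power < 100) {
  power <- power * 2
  n_doublings <- n_doublings + 1
}

# switch selects a branch by name; the unnamed last argument is the default
summary_type <- "median"
center <- switch(summary_type,
                 mean = mean(x),
                 median = median(x),
                 stop("unknown summary type"))

# ifelse is vectorized: it tests every element of x at once
sign_label <- ifelse(x > 0, "pos", "non-pos")

print(total)         # 9
print(n_doublings)   # 7
print(center)        # 1.5
print(sign_label)
```

The for loop is shown only to illustrate the syntax; in practice sum(x) gives the same total in a single call.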

Looping Control Structure Examples (part 2)
08:48
Looping and Control Structure Exercises
00:50
Data Input and Output (part 1)
07:06
Data Input and Output (part 2)
05:56
Formatting Output (part 1)
10:13
Formatting Output (part 2)
07:46
Interactive Input and Output
07:54
Looping and Control Structure Exercises (part 1)
09:15
Looping and Control Structure Exercises (part 2)
07:35
Looping and Control Structure Exercises (part 3)
07:50
Writing Output to a File (part 1)
06:55
Writing Output to a File (part 2)
06:37
Plotting as Output (part 1)
06:12
Plotting as Output (part 2)
07:33
Exercise: Writing Statistical and Scientific Expressions
1 page
Exercise Solution: Writing Statistical and Scientific Functions
8 pages
Section 3: Writing User-Defined Functions in R
Writing Functions as Programs (part 1)
10:03
Writing Functions as Programs (part 2)
08:02
Winsorized Means Example
09:00
08:15

User-written Functions

One of the great strengths of R is the user's ability to add functions. In fact, many of the functions in R are actually functions of functions. The structure of a function is given below.

myfunction <- function(arg1, arg2, ...) {
  statements
  return(object)
}

Objects in the function are local to the function. The object returned can be any data type.
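
A minimal sketch of this structure, assuming a coefficient-of-variation function as the example (coef_var and its internal names are hypothetical, chosen for the illustration):

```r
# Hypothetical user-written function: coefficient of variation
coef_var <- function(x, na.rm = FALSE) {
  local_sd <- sd(x, na.rm = na.rm)       # local_sd and local_mean exist only
  local_mean <- mean(x, na.rm = na.rm)   # while the function is running
  return(local_sd / local_mean)
}

cv <- coef_var(c(2, 4, 4, 4, 5, 5, 7, 9))
print(cv)

# The local objects are not visible outside the function
print(exists("local_sd"))   # FALSE
```

Here the returned object is a single number, but it could equally be a vector, list, data frame, or another function.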

Writing Functions in R (part 2)
08:04
Writing Functions in R (part 3)
09:35
Writing Functions in R (part 4)
07:31
Apply Family of Functions (part 1)
08:28
Apply Family of Functions (part 2)
08:53
Apply Family of Functions (part 3)
07:40
Apply Family of Functions (part 4)
10:43
Apply Family of Functions (part 5)
06:07
Making Programs Run Efficiently
10:27
Exercise: Writing Functions and Programs
2 pages
Exercise Solutions: Writing Functions and Programs (part 1)
07:58
Exercise Solutions: Writing Functions and Programs (part 2)
04:40
Exercise: Vector Maker Functions
04:38
Section 4: Data Types and Structures: Factors, Dataframes and Lists
Exercise Solutions: Vector Maker Functions (part 1)
09:02
Exercise Solutions: Vector Maker Functions (part 2)
07:41
08:33

Factors

Tell R that a variable is nominal by making it a factor. The factor stores the nominal values as a vector of integers in the range [ 1... k ] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers.
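
The integer codes and the character labels can both be inspected. This is a small sketch with made-up data:

```r
colour <- c("red", "blue", "red", "green", "blue")
f <- factor(colour)

# The character labels; by default the levels are the sorted unique values
print(levels(f))       # "blue" "green" "red"

# The underlying integer codes, each in the range 1..k (here k = 3)
print(as.integer(f))   # 3 1 3 2 1

print(nlevels(f))      # 3
print(table(f))        # counts per level
```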

Data Types: Factors (part 2)
10:20
07:33

Data Frames

A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.
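
As an illustration (the column names and values are invented), a data frame can hold integer, character, factor and numeric columns side by side:

```r
df <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  group = factor(c("ctl", "trt", "ctl")),
  score = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE   # keep name as character in every R version
)

# Each column keeps its own class
col_classes <- sapply(df, class)
print(col_classes)

# Columns are used together like variables in a dataset
ctl_mean <- mean(df$score[df$group == "ctl"])
print(ctl_mean)   # 2.5
```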

Data Structures: Dataframes (part 2)
08:14
Data Structures: Dataframes (part 3)
08:22
Data Structures: Dataframes (part 4)
06:47
09:29

Lists

An ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.
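
A short sketch, with made-up components, of gathering unrelated objects under one name:

```r
study <- list(
  title = "pilot",                 # a character string
  scores = c(10, 12, 9),           # a numeric vector
  design = matrix(1:4, nrow = 2)   # a matrix
)

# Components are accessed by name or by position
print(study$title)
print(study[["scores"]])
print(study[[3]])

# Components can be added after creation
study$notes <- "added later"
print(names(study))
```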

Data Structures: Lists (part 2)
12:03
Section 5: Bootstrap and Jackknife Resampling Methods
07:42

In statistics, resampling is any of a variety of methods for doing one of the following:

  1. Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping)
  2. Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests)
  3. Validating models by using random subsets (bootstrapping, cross validation)

Common resampling techniques include bootstrapping, jackknifing and permutation tests.
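
The first use listed above, estimating the precision of a sample statistic, can be sketched in base R for the sample mean. The data, the seed and the number of bootstrap replicates B are arbitrary choices for this illustration, not values from the course:

```r
set.seed(1)
x <- c(8.2, 9.1, 7.4, 10.3, 9.8, 8.7, 11.0, 9.5)
n <- length(x)

# Bootstrap: draw n values with replacement, B times, and recompute the mean
B <- 2000
boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
boot_se <- sd(boot_means)                  # bootstrap standard error
boot_bias <- mean(boot_means) - mean(x)    # bootstrap estimate of bias

# Jackknife: leave out one observation at a time and recompute the mean
jack_means <- sapply(seq_len(n), function(i) mean(x[-i]))
jack_se <- sqrt((n - 1) / n * sum((jack_means - mean(jack_means))^2))

print(c(bootstrap = boot_se, jackknife = jack_se))
```

For the sample mean the jackknife standard error coincides with the familiar sd(x) / sqrt(n), which makes it a convenient check; for other statistics (a ratio, for example) the same pattern applies with mean replaced by the statistic of interest.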

Bootstrap Estimate of Standard Error and Bias (part 2)
07:39
Bootstrapping a Ratio Statistic
10:13
Jackknife Estimate of Bias and Standard Error
11:30
Bootstrapping Confidence Intervals (part 1)
08:41
Bootstrapping Confidence Intervals (part 2)
09:13
Bootstrapping Confidence Intervals (part 3)
10:27
07:33

In k-fold (also called n-fold) cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter.

When k=n (the number of observations), the k-fold cross-validation is exactly the leave-one-out cross-validation.

In stratified k-fold cross-validation, the folds are selected so that the mean response value is approximately equal in all the folds. In the case of a dichotomous classification, this means that each fold contains roughly the same proportions of the two types of class labels.
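
A minimal sketch of plain (unstratified) k-fold cross-validation for comparing two competing models follows. The simulated data, the choice k = 5, and the helper cv_mse are all assumptions made for this illustration:

```r
set.seed(42)
n <- 100
x <- runif(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)   # simulated data with a linear trend
dat <- data.frame(x = x, y = y)

k <- 5
# Randomly assign each observation to one of k equal-sized folds
fold <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: mean squared prediction error averaged over the folds.
# It assumes the response column of `data` is named y.
cv_mse <- function(formula, data, fold) {
  errs <- sapply(sort(unique(fold)), function(j) {
    train <- data[fold != j, ]   # k - 1 folds for fitting
    test <- data[fold == j, ]    # the held-out fold for validation
    fit <- lm(formula, data = train)
    mean((test$y - predict(fit, newdata = test))^2)
  })
  mean(errs)
}

mse_linear <- cv_mse(y ~ x, dat, fold)   # model with the predictor
mse_null <- cv_mse(y ~ 1, dat, fold)     # intercept-only model
print(c(linear = mse_linear, null = mse_null))
```

Both models are scored on the same fold assignment, so the comparison is not affected by differences in how the data happened to be split.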

N-Fold Cross-Validation of Models (part 2)
04:42
N-Fold Cross-Validation of Models (part 3)
10:42
Bootstrap-Jackknife Resampling Exercise
01:04
Section 6: Debugging and Program Efficiency
Bootstrap-Jackknife Resampling Exercise Solution
03:28
Debugging R Programs
15:13
Findruns Program Debugging Example (part 1)
12:13
Findruns Program Debugging Example (part 2)
07:29
10:54

Another approach makes use of the local environment within a function to access the variables. When methods are defined with this local environment approach, the results look more like the object-oriented approaches seen in other languages.

The approach relies on the local scope created when a function is called. A new environment is created that can be identified using the environment command. The environment can be saved in the list created for the class, and the variables within this scope can then be accessed using the identification of the environment.
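
One way this could look is sketched below. The Counter constructor and its component names are hypothetical, invented for the illustration; only environment, assign, get and class are standard R:

```r
# Hypothetical constructor using the local environment approach
Counter <- function(start = 0) {
  count <- start
  this_env <- environment()   # the environment created for this call

  increment <- function() {
    # Read and update count through the saved environment
    assign("count", get("count", envir = this_env) + 1, envir = this_env)
    invisible(get("count", envir = this_env))
  }
  value <- function() get("count", envir = this_env)

  # Save the environment in the list that represents the object
  obj <- list(env = this_env, increment = increment, value = value)
  class(obj) <- "Counter"
  obj
}

a <- Counter()
a$increment()
a$increment()

b <- a          # copies the list, but both copies share one environment
b$increment()

print(a$value())   # 3
```

Because a and b refer to the same environment, a change made through one is visible through the other, which is the pointer-like behaviour this approach trades for its more familiar object-oriented look.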

Program Efficiencies and Scoping Rules
11:45
04:20

An environment, in R, can be thought of as a list of variables and their values. I'm not sure if this is how it is achieved in practice, but it helps me to think of it as a look-up table. For example, if a variable x appears in an expression, the R interpreter refers to the entry for x in the appropriate look-up table. From this, it retrieves the value for x (some kind of R object) and substitutes it for x in the expression. If there is no entry for x in the table, or in any of the other environments R searches, an error is flagged.

Which environment R uses depends on context. If you are typing into the R command line, the environment used is called the global environment. When a function is called, a new environment especially for that call is created automatically, and is normally discarded on leaving the function. This is the default environment for any variables created during the execution of the function. Finally, it is worth noting that a particular variable name can appear in more than one environment, so when R looks up the value of such a variable, the rules governing the order in which environments are searched determine the value that is found.
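
The look-up behaviour described above can be observed with a short sketch. All names are made up, and the code is assumed to be run at the top level of a script or console session:

```r
x <- "top level"

show_local <- function() {
  x <- "local"   # created in the environment of this function call
  x              # found locally, so the top-level x is never consulted
}

show_enclosing <- function() {
  x              # no local x, so R searches the enclosing environment
}

print(show_local())       # "local"
print(show_enclosing())   # "top level"
print(x)                  # unchanged: "top level"

# A name with no entry in any searched environment raises an error
lookup <- tryCatch(no_such_variable_for_demo,
                   error = function(e) "error flagged")
print(lookup)             # "error flagged"
```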

07:22

First, everything in R is treated as an object. We have seen this with functions. Many of the objects that are created within an R session have attributes associated with them. One common attribute associated with an object is its class.

You can set the class attribute using the class command. One thing to notice is that the class is a vector which allows an object to inherit from multiple classes, and it allows you to specify the order of inheritance for complex classes. You can also use the class command to determine the classes associated with an object.
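
A brief sketch of setting and querying the class attribute follows. The class names bootEstimate and estimate and the generic describe are hypothetical, chosen for the illustration:

```r
obj <- list(estimate = 1.2, se = 0.3)
original_class <- class(obj)
print(original_class)   # "list"

# The class is a vector: the order gives the order of inheritance
class(obj) <- c("bootEstimate", "estimate")
print(class(obj))
print(inherits(obj, "estimate"))   # TRUE

# A generic with a method for the second class only
describe <- function(x, ...) UseMethod("describe")
describe.estimate <- function(x, ...) "generic estimate"

# No describe.bootEstimate exists, so dispatch falls through to "estimate"
description <- describe(obj)
print(description)
```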

06:15

Here we look at two different ways to construct an S3 class. The first approach is more commonly used and is more straightforward. It makes use of basic list properties. The second approach makes use of the local environment within a function to define the variables tracked by the class. The advantage to the second approach is that it looks more like the object oriented approach that many are familiar with. The disadvantage is that it is more difficult to read the code, and it is more like working with pointers which is different from the way other objects work in R.
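
The first, list-based approach might be sketched as follows. The interval class, its constructor new_interval and the generic ci_width are hypothetical names used only for this example:

```r
# Hypothetical constructor: the object is a list with a class attribute
new_interval <- function(lower, upper) {
  stopifnot(lower <= upper)
  obj <- list(lower = lower, upper = upper)
  class(obj) <- "interval"
  obj
}

# A generic function and its method for the class
ci_width <- function(x, ...) UseMethod("ci_width")
ci_width.interval <- function(x, ...) x$upper - x$lower

# Methods for existing generics control formatting and printing
format.interval <- function(x, ...) paste0("[", x$lower, ", ", x$upper, "]")
print.interval <- function(x, ...) {
  cat(format(x), "\n")
  invisible(x)
}

ci <- new_interval(1.5, 4)
print(ci)
print(ci_width(ci))   # 2.5

# Ordinary copy semantics: changing the copy leaves the original untouched
wider <- ci
wider$upper <- 10
print(ci_width(ci))   # still 2.5
```

Unlike the environment-based approach, objects built this way behave like any other R value when copied.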

06:26

The S4 approach differs from the S3 approach to creating a class in that it is a more rigid definition. The idea is that an object is created using the setClass command. The command takes a number of options. Many of the options are not required, but we make use of several of the optional arguments because they represent good practices with respect to object oriented programming.
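
A minimal sketch of an S4 class definition follows. The class Interval and the generic ciWidth are hypothetical; the prototype and validity arguments are examples of optional setClass arguments and are not necessarily the ones the lecture uses:

```r
library(methods)

# Hypothetical S4 class with typed slots, default values and a validity check
setClass(
  "Interval",
  slots = c(lower = "numeric", upper = "numeric"),
  prototype = list(lower = 0, upper = 1),
  validity = function(object) {
    if (object@lower > object@upper) "lower must not exceed upper" else TRUE
  }
)

# A generic and a method registered for the class
setGeneric("ciWidth", function(object) standardGeneric("ciWidth"))
setMethod("ciWidth", "Interval", function(object) object@upper - object@lower)

ci <- new("Interval", lower = 1.5, upper = 4)
print(ciWidth(ci))   # 2.5

# Invalid slot values are rejected when the object is created
bad <- tryCatch(new("Interval", lower = 5, upper = 1),
                error = function(e) "invalid")
print(bad)           # "invalid"
```

The rigidity shows in two places: slots have declared types, and the validity function is checked whenever an object is created with slot values.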

Numerical Accuracy and Program Efficiency (part 1)
07:42
Numerical Accuracy and Program Efficiency (part 2)
10:51
More on Program Efficiency (part 1)
06:20
More on Program Efficiency (part 2)
06:42
Selection Sort Exercise
03:58

Instructor Biography

Geoffrey Hubona, Ph.D., Professor of Information Systems

Dr. Geoffrey Hubona held full-time tenure-track, and tenured, assistant and associate professor faculty positions at three major state universities in the Eastern United States from 1993 to 2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA. He was a full-time assistant professor at the University of Maryland Baltimore County (1993-1996) in Catonsville, MD; a tenured associate professor in the department of Information Systems in the Business College at Virginia Commonwealth University (1996-2001) in Richmond, VA; and an associate professor in the CIS department of the Robinson College of Business at Georgia State University (2001-2010). He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling. Dr. Hubona is an expert in the analytical, open-source R software suite and in various PLS path modeling software packages, including SmartPLS. He has published dozens of research articles that explain and use these techniques for the analysis of data, and, with software co-development partner Dean Lim, has created a popular cloud-based PLS software application, PLS-GUI.
