
This course teaches how to program general purpose applications in R. The first part of the course sets the stage for understanding how to program, control and manipulate basic R data structures: vectors, matrices, data frames, and lists. The second part of the course teaches the details of writing functions and programs in the object-oriented R environment: programming R structures directly; writing math and simulation functions; setting up S3 and S4 classes in R; input and output; and string manipulation and performance enhancement.
Explore the RStudio integrated development environment and its source front end for interactive and batch R work. Learn about interpreting, vectors, rnorm, help, and running code with source.
Set and verify the working directory in R, list files, and save a histogram to a graphical output file, while understanding how sessions and memory govern your workspace.
Explore the R workspace with ls, objects, and search; inspect the global environment and package order; create and print vectors; explore data sets, Nile time series, and histograms.
Learn to visualize univariate distributions in R with histograms, adjustable bins via breaks, and explore core data structures like vectors, lists, and matrices, plus text manipulation with paste and split.
One of the great strengths of R is the user's ability to add functions. In fact, many of the functions in R are actually functions of functions. The structure of a function is given below.
myfunction <- function(arg1, arg2, ... )
{statements}
Objects in the function are local to the function.
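As a minimal sketch of that structure, the function below (scale_sum and its arguments are hypothetical names chosen for illustration) creates a local object that is not visible once the call returns:

```r
# Hypothetical example: 'scale_sum', 'x' and 'factor' are illustrative names.
scale_sum <- function(x, factor = 2) {
  total <- sum(x) * factor   # 'total' exists only while the function runs
  total                      # the last evaluated expression is the return value
}

result <- scale_sum(c(1, 2, 3))
print(result)                 # 12
print("total" %in% ls())      # FALSE in a fresh session: 'total' did not leak out
```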
Explore how R creates a fresh function environment on call, how scope separates global and local variables, and how default arguments enable lazy evaluation.
Explore lexical scoping in R with nested functions, showing how inner frames resolve variables by climbing enclosing environments, and how super assignment pushes values to the global environment, risking pollution.
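The lookup rules described above can be sketched as follows. All names here (outer_fn, inner_fn, bump_counter, counter) are assumptions made for illustration, not code from the course:

```r
x <- 10                        # defined at the top level

outer_fn <- function() {
  y <- 5                       # local to outer_fn
  inner_fn <- function() {
    x + y                      # y is found in the enclosing frame, x one level further up
  }
  inner_fn()
}

bump_counter <- function() {
  counter <<- counter + 1      # super assignment: changes 'counter' outside the function
  invisible(counter)
}

counter <- 0
bump_counter()
bump_counter()
print(outer_fn())   # 15
print(counter)      # 2
```

Because bump_counter silently changes a variable it does not own, overuse of super assignment makes code harder to reason about, which is the pollution risk mentioned above.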
Open account creates a closure with deposit, withdraw, and balance sharing a total, showing updates and overdraft errors, and illustrating optional arguments and named versus positional binding in R.
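A closure of this kind might look like the sketch below. The function and component names (open_account, deposit, withdraw, balance) and the error message are assumptions for illustration; the course's own code may differ:

```r
# The three functions share 'total', which lives in the environment
# created by the call to open_account().
open_account <- function(total = 0) {      # 'total' is an optional argument
  list(
    deposit = function(amount) {
      total <<- total + amount
      invisible(total)
    },
    withdraw = function(amount) {
      if (amount > total) stop("insufficient funds")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

acct <- open_account(total = 100)   # named binding; open_account(100) binds by position
acct$deposit(50)
acct$withdraw(30)
print(acct$balance())               # 120

# An overdraft raises an error, captured here as a message
overdraft <- tryCatch(acct$withdraw(1000),
                      error = function(e) conditionMessage(e))
print(overdraft)                    # "insufficient funds"
```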
Explore creating par functions that compare two vectors pairwise using pmax and pmin, then compute medians from resulting vectors, and use lists and named components to return multiple values.
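One possible shape for such a function is sketched below; compare_pairs and the component names are hypothetical:

```r
# Compare two vectors element by element and return several results in a named list.
compare_pairs <- function(x, y) {
  highs <- pmax(x, y)          # pairwise maxima
  lows  <- pmin(x, y)          # pairwise minima
  list(median_high = median(highs),
       median_low  = median(lows))
}

res <- compare_pairs(c(1, 8, 3, 9), c(4, 2, 7, 5))
print(res$median_high)   # 7.5
print(res$median_low)    # 2.5
```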
Explore anonymous functions in R, using the function keyword to define unnamed operations and harness vectorization with x and y, while learning lazy argument evaluation in plotting.
A vector is a sequence of data elements of the same basic type. Members in a vector are officially called components. Nevertheless, they are often called elements.
Develop and optimize vector making in R by building a single flexible function, measure performance with system.time, and preallocate vectors while exploring for loops, c, cat, and factorial and prod.
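The effect of preallocation can be checked with a small experiment like the one below. The function names are illustrative, and timings vary by machine, so none are shown:

```r
n <- 1e4

# Grows the vector one element at a time with c(), copying it on each step
grow <- function(n) {
  v <- c()
  for (i in 1:n) v <- c(v, i^2)
  v
}

# Allocates the full vector once, then fills it in place
prealloc <- function(n) {
  v <- numeric(n)
  for (i in seq_len(n)) v[i] <- i^2
  v
}

t_grow <- system.time(a <- grow(n))
t_pre  <- system.time(b <- prealloc(n))
print(identical(a, b))            # TRUE: same result, different cost
print(t_grow["elapsed"])
print(t_pre["elapsed"])

# prod() over 1:k gives the same value as factorial(k)
print(prod(1:5) == factorial(5))  # TRUE
```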
Creating matrices
The function matrix creates matrices.
matrix(data, nrow, ncol, byrow)
The data argument is usually a vector of the elements that will fill the matrix. The nrow and ncol arguments specify the dimensions of the matrix. Often only one dimension argument is needed: if, for example, there are 20 elements in data and ncol is specified to be 4, then R will automatically calculate that there should be 5 rows and 4 columns, since 4*5=20. The byrow argument specifies how the matrix is to be filled. The default value for byrow is FALSE, which means that by default the matrix will be filled column by column.
seq1 <- 1:6
mat1 <- matrix(seq1, 2)
mat2 <- matrix(seq1, 2, byrow = TRUE)
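The following sketch shows what these calls produce, including the 20-element case described above (the variable m is an illustrative name):

```r
# With 20 elements and ncol = 4, R infers 5 rows
m <- matrix(1:20, ncol = 4)
print(dim(m))      # 5 4

# Default fill is column by column
mat1 <- matrix(1:6, 2)
print(mat1)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE fills row by row
mat2 <- matrix(1:6, 2, byrow = TRUE)
print(mat2)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```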
Learn to filter matrices using conditions and subscripts, explore which indexes, and apply row and column operations, including means, sums, and covariance diagonals.
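These operations can be tried on a small matrix. The data below are made up for illustration:

```r
m <- matrix(c(1, 5, 3, 8, 2, 9), nrow = 3)   # columns: (1, 5, 3) and (8, 2, 9)

# Keep rows whose first column exceeds 2; drop = FALSE keeps the result a matrix
keep <- m[m[, 1] > 2, , drop = FALSE]

# which() returns positions in column-major order
idx <- which(m > 4)
print(idx)                      # 2 4 6

row_means <- apply(m, 1, mean)  # one mean per row
col_sums  <- colSums(m)         # one sum per column
print(row_means)                # 4.5 3.5 6.0
print(col_sums)                 # 9 19

# The diagonal of the covariance matrix holds the column variances
cov_diag <- diag(cov(m))
```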
Learn to find the minimum value in a vector using a for loop, and to merge two sorted vectors into a single sorted vector with a preallocated result in R.
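Both tasks can be sketched as below. The function names are hypothetical, and merge_sorted assumes its inputs are already sorted in increasing order:

```r
# Scan the vector once, remembering the smallest value seen so far
find_min <- function(v) {
  smallest <- v[1]
  for (val in v[-1]) {
    if (val < smallest) smallest <- val
  }
  smallest
}

# Merge two sorted vectors into a preallocated result
merge_sorted <- function(a, b) {
  out <- numeric(length(a) + length(b))
  i <- 1
  j <- 1
  for (k in seq_along(out)) {
    # Take from a when b is exhausted, or when a still has elements and a[i] is not larger
    if (j > length(b) || (i <= length(a) && a[i] <= b[j])) {
      out[k] <- a[i]
      i <- i + 1
    } else {
      out[k] <- b[j]
      j <- j + 1
    }
  }
  out
}

print(find_min(c(7, 3, 9, 1, 4)))                 # 1
print(merge_sorted(c(1, 4, 7), c(2, 3, 9)))       # 1 2 3 4 7 9
```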
Name matrix rows and columns with row names, column names, and dimension names, and use these with matrices and arrays to access and manipulate elements.
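A short sketch of naming and name-based access, using made-up labels:

```r
scores <- matrix(1:6, nrow = 2)
rownames(scores) <- c("a", "b")
colnames(scores) <- c("x", "y", "z")

print(scores["b", "y"])     # 4
print(dimnames(scores))     # a list holding the row names, then the column names

# dimnames can also be set in one step when the matrix is created
scores2 <- matrix(1:6, nrow = 2,
                  dimnames = list(c("a", "b"), c("x", "y", "z")))
print(identical(scores, scores2))   # TRUE
```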
A list is an R structure that may contain objects of any other type, including other lists. Many of the modeling functions (like t.test() for the t test or lm() for linear models) produce lists as their return values, but you can also construct one yourself:
mylist <- list(a = 1:5, b = "Hi There", c = function(x) x * sin(x))
Learn to store team premierships in lists, query yearly winners with a loop, and summarize data by group using the apply family and data frames in R.
Data Frames
A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") # variable names
There are a variety of ways to identify the elements of a data frame.
myframe[3:5] # columns 3,4,5 of data frame
myframe[c("ID","Age")] # columns ID and Age from data frame
myframe$X1 # variable X1 in the data frame
A data frame is a table, or two-dimensional array-like structure, in which each column contains measurements on one variable, and each row contains one case or sample (observation) with the corresponding values for each variable for that observation.
Explore subsetting data frames by rows and columns, preserve structure with drop = false, apply conditional selections with logical vectors, and handle missing data using complete.cases and new columns.
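The sketch below walks through these subsetting operations on a small made-up data frame (column names and the flag column are illustrative):

```r
df <- data.frame(id = 1:4,
                 color = c("red", "white", "red", NA),
                 passed = c(TRUE, TRUE, TRUE, FALSE))

# Selecting one column normally returns a vector; drop = FALSE keeps a data frame
one_col <- df[, "id", drop = FALSE]
print(is.data.frame(one_col))    # TRUE

# Conditional selection with a logical vector, guarding against NA
reds <- df[!is.na(df$color) & df$color == "red", ]
print(nrow(reds))                # 2

# Keep only rows with no missing values
complete <- df[complete.cases(df), ]
print(nrow(complete))            # 3

# Add a new column derived from an existing one
df$flag <- ifelse(df$passed, "ok", "retry")
```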
Lists
An ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.
# example of a list with 4 components -
# a string, a numeric vector, a matrix, and a scalar
w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)

# example of combining the components of two lists into one list
v <- c(list1,list2)
Identify elements of a list using the [[]] convention.
mylist[[2]] # 2nd component of the list
mylist[["mynumbers"]] # component named mynumbers in list
Factors
Tell R that a variable is nominal by making it a factor. The factor stores the nominal values as a vector of integers in the range [ 1... k ] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers.
# variable gender with 20 "male" entries and
# 30 "female" entries
gender <- c(rep("male",20), rep("female", 30))
gender <- factor(gender)
# stores gender as 20 2s and 30 1s and associates
# 1=female, 2=male internally (alphabetically)
# R now treats gender as a nominal variable
summary(gender)
An ordered factor is used to represent an ordinal variable.
# variable rating coded as "large", "medium", "small"
rating <- ordered(rating)
# recodes rating to 1,2,3 and associates
# 1=large, 2=medium, 3=small internally
# R now treats rating as ordinal
R will treat factors as nominal variables and ordered factors as ordinal variables in statistical procedures and graphical analyses. You can use options in the factor( ) and ordered( ) functions to control the mapping of integers to strings (overriding the alphabetical ordering). You can also use factors to create value labels.
1. Creating factor variables
Factor variables are categorical variables that can be either numeric or string variables. There are a number of advantages to converting categorical variables to factor variables. Perhaps the most important advantage is that they can be used in statistical modeling where they will be implemented correctly, i.e., they will then be assigned the correct number of degrees of freedom. Factor variables are also very useful in many different types of graphics. Furthermore, storing string variables as factor variables is a more efficient use of memory. To create a factor variable we use the factor function. The only required argument is a vector of values, which can be either string or numeric. Optional arguments include the levels argument, which determines the categories of the factor variable; its default is the sorted list of all the distinct values of the data vector. The labels argument is another optional argument, a vector of values that will be the labels of the categories in the levels argument.
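The levels and labels arguments can be illustrated with a small sketch. The data and label names below are made up:

```r
# Numeric codes turned into a labelled factor
size_codes <- c(2, 1, 3, 2, 1)
size <- factor(size_codes,
               levels = c(1, 2, 3),
               labels = c("small", "medium", "large"))
print(levels(size))         # "small" "medium" "large"
print(as.character(size))   # "medium" "small" "large" "medium" "small"
print(as.integer(size))     # 2 1 3 2 1

# Explicit levels override the default alphabetical ordering,
# which matters for ordered factors
rating <- factor(c("medium", "small", "large"),
                 levels = c("small", "medium", "large"),
                 ordered = TRUE)
print(rating[1] > rating[2])   # TRUE: medium ranks above small
```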
R Programming Environment and Scope
In order to write functions in a proper way and avoid unusual errors, we need to know the concept of environment and scope in R.
R Programming Environment
An environment can be thought of as a collection of objects (functions, variables, etc.). An environment is created when we first start the R interpreter. Any variable we define is now in this environment. The top-level environment available to us at the R command prompt is the global environment, called R_GlobalEnv. The global environment can also be referred to as .GlobalEnv in R code. We can use the ls() function to show which variables and functions are defined in the current environment. Moreover, we can use the environment() function to get the current environment.
Explore nesting of environments in R, tracing how f and h access global and parent frames. Use ls to inspect local and parent environments and understand lazy evaluation and parameters.
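Nested environments can be inspected with ls, as in the sketch below. The function bodies and the variable names f_x, h_x and g are assumptions for illustration, not the course's own example:

```r
f <- function(f_x) {
  g <- function() 0                      # another object local to f
  h <- function(h_x) {
    own_env    <- environment()          # the environment created by this call to h
    parent_env <- parent.env(own_env)    # the environment of the enclosing call to f
    list(own = ls(envir = own_env),
         parent = ls(envir = parent_env))
  }
  h(1)
}

vars <- f(2)
print(vars$own)      # "h_x"
print(vars$parent)   # "f_x" "g" "h"

print(environmentName(globalenv()))   # "R_GlobalEnv"
```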
Anonymous Functions
As remarked at several points in this book, the purpose of the R function function() is to create functions. For instance, consider this code:
inc <- function(x) return(x+1)
It instructs R to create a function that adds 1 to its argument and then assigns that function to inc. However, that last step, the assignment, is not always taken. We can simply use the function object created by our call to function() without naming that object. The functions in that context are called anonymous, since they have no name. (That is somewhat misleading, since even nonanonymous functions only have a name in the sense that a variable is pointing to them.)
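A brief sketch of anonymous functions in use (the variable names are illustrative):

```r
# An unnamed function passed directly to sapply()
squares <- sapply(1:4, function(x) x^2)
print(squares)                      # 1 4 9 16

# An unnamed function can also be called on the spot
six <- (function(x) x + 1)(5)
print(six)                          # 6
```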
Learn to implement and compare sorting algorithms in R, focusing on selection sort and insertion sort, and analyze efficiency by studying element comparisons during sorting.
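A minimal sketch of both sorts with a comparison counter follows. The function names and the returned list layout are assumptions, and the course's implementations may differ:

```r
selection_sort <- function(v) {
  n <- length(v)
  comparisons <- 0
  if (n > 1) for (i in 1:(n - 1)) {
    m <- i                                  # index of the smallest element found so far
    for (j in (i + 1):n) {
      comparisons <- comparisons + 1
      if (v[j] < v[m]) m <- j
    }
    tmp <- v[i]; v[i] <- v[m]; v[m] <- tmp  # swap it into position i
  }
  list(sorted = v, comparisons = comparisons)
}

insertion_sort <- function(v) {
  n <- length(v)
  comparisons <- 0
  if (n > 1) for (i in 2:n) {
    key <- v[i]
    j <- i - 1
    while (j >= 1) {
      comparisons <- comparisons + 1
      if (v[j] <= key) break                # key is already in place
      v[j + 1] <- v[j]                      # shift the larger element right
      j <- j - 1
    }
    v[j + 1] <- key
  }
  list(sorted = v, comparisons = comparisons)
}

print(selection_sort(c(3, 1, 4, 1, 5))$sorted)   # 1 1 3 4 5
print(insertion_sort(c(3, 1, 4, 1, 5))$sorted)   # 1 1 3 4 5
print(selection_sort(1:5)$comparisons)           # 10
print(insertion_sort(1:5)$comparisons)           # 4
```

On already sorted input, selection sort still makes n(n-1)/2 comparisons, while insertion sort makes only n-1.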
Explore bubble sort alongside selection and insertion sorts, implement two quicksort versions, and compare performance using preallocated arrays, vector swaps, and native sort benchmarks.
Compute the probability that exactly one of several independent events occurs using R, employing vectorized operations like cumprod and cumsum.
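One vectorized way to compute this probability is sketched below. It uses cumprod only and is an illustration under that assumption, not necessarily the approach taken in the lecture. For event probabilities p_1, ..., p_n, the result is the sum over i of p_i times the product of (1 - p_j) for all j other than i:

```r
exactly_one <- function(p) {
  q <- 1 - p
  n <- length(p)
  left  <- c(1, cumprod(q)[-n])              # product of q[1..i-1] for each i
  right <- rev(c(1, cumprod(rev(q))[-n]))    # product of q[i+1..n] for each i
  sum(p * left * right)
}

print(exactly_one(c(0.5, 0.5)))        # 0.5
print(exactly_one(c(0.2, 0.3, 0.5)))   # 0.47
```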
Set Operations
Description
Performs set union, intersection, (asymmetric!) difference, equality and membership on two vectors.
Usage
union(x, y)
intersect(x, y)
setdiff(x, y)
setequal(x, y)
is.element(el, set)
Arguments
x, y, el, set: vectors (of the same mode) containing a sequence of items (conceptually) with no duplicated values.
Details
Each of union, intersect, setdiff and setequal will discard any duplicated values in the arguments, and they apply as.vector to their arguments (and so in particular coerce factors to character vectors).
is.element(x, y) is identical to x %in% y.
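A short sketch of these functions on made-up vectors, including the asymmetry of setdiff and the discarding of duplicates:

```r
x <- c(1, 2, 2, 3)
y <- c(3, 4)

print(union(x, y))        # 1 2 3 4  (the duplicate 2 is discarded)
print(intersect(x, y))    # 3
print(setdiff(x, y))      # 1 2
print(setdiff(y, x))      # 4  (the difference is asymmetric)
print(setequal(c(1, 2), c(2, 1, 1)))   # TRUE
print(is.element(2, x))   # TRUE
print(2 %in% x)           # TRUE, same test
```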
Learn to simulate roulette in R by sampling ±5 with an 18/38 red probability, running 10,000 simulations of 20 bets, and analyzing cumulative winnings and a frequency distribution.
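A simulation along these lines might be sketched as follows. The seed, the variable names, and the matrix layout (one row per simulation) are assumptions for illustration:

```r
set.seed(1)                      # arbitrary seed so the run is repeatable
n_sims <- 10000
n_bets <- 20

# Each bet wins 5 with probability 18/38 and loses 5 otherwise
outcomes <- matrix(sample(c(5, -5), n_sims * n_bets, replace = TRUE,
                          prob = c(18 / 38, 20 / 38)),
                   nrow = n_sims)

paths <- t(apply(outcomes, 1, cumsum))   # cumulative winnings within each simulation
final <- paths[, n_bets]                 # winnings after the last bet
freq  <- table(final)                    # frequency distribution of final winnings

print(mean(final))   # should be near 20 * 5 * (18 - 20) / 38, about -5.26
print(freq)
```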
The Comprehensive Programming in R Course is actually a combination of two R programming courses that together comprise a gentle, yet thorough introduction to the practice of general-purpose application development in the R environment. The original first course (Sections 1-8) consists of approximately 12 hours of video content and provides extensive example-based instruction on details for programming R data structures. The original second course (Sections 9-14), an additional 12 hours of video content, provides a comprehensive overview on the most important conceptual topics for writing efficient programs to execute in the unique R environment. Participants in this comprehensive course may already be skilled programmers (in other languages) or they may be complete novices to R programming or to programming in general, but their common objective is to write R applications for diverse domains and purposes. No statistical knowledge is necessary. These two courses, combined into one course here on Udemy, together comprise a thorough introduction to using the R environment and language for general-purpose application development.
The Comprehensive Programming in R Course (Sections 1-8) presents a detailed, in-depth overview of the R programming environment and of the nature and programming implications of basic R objects in the form of vectors, matrices, data frames and lists. The Comprehensive Programming in R Course (Sections 9-14) then applies this understanding of these basic R object structures to teach how to program the structures; perform mathematical modeling and simulations; use object-oriented programming in R; handle input and output; manipulate strings; and improve performance, both for computation speed and for use of computer memory.