The Comprehensive Programming in R Course combines two R programming courses, offered here on Udemy as one course, that together form a gentle yet thorough introduction to general-purpose application development in the R environment. The original first course (Sections 1-8) consists of approximately 12 hours of video content and provides extensive example-based instruction on programming R data structures. The original second course (Sections 9-14), an additional 12 hours of video content, provides a comprehensive overview of the most important conceptual topics for writing efficient programs that run in the R environment. Participants may already be skilled programmers in other languages, or they may be complete novices to R or to programming in general, but their common objective is to write R applications for diverse domains and purposes. No statistical knowledge is necessary.
The Comprehensive Programming in R Course (Sections 1-8) presents a detailed, in-depth overview of the R programming environment and of the nature and programming implications of basic R objects in the form of vectors, matrices, dataframes and lists. The Comprehensive Programming in R Course (Sections 9-14) then applies this understanding of basic R object structures to programming those structures; mathematical modeling and simulation; object-oriented programming in R; input and output; string manipulation; and performance enhancement, both for computation speed and for memory use.
One of the great strengths of R is the user's ability to add functions. In fact, many of the functions in R are actually functions of functions. The structure of a function is given below.
myfunction <- function(arg1, arg2, ...) { statements; return(object) }
Objects in the function are local to the function.
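As a minimal sketch of the structure above, the following defines a small function and shows that a variable created inside it is not visible afterwards. The names scale_and_shift and scaled_tmp are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: multiply x by a factor, then add an offset.
scale_and_shift <- function(x, factor = 2, offset = 0) {
  scaled_tmp <- x * factor      # scaled_tmp exists only inside the function call
  return(scaled_tmp + offset)
}

result <- scale_and_shift(c(1, 2, 3), factor = 10, offset = 1)
print(result)                   # 11 21 31

# The local variable did not leak into the global environment
# (assuming no object of that name was defined there beforehand).
leaked <- exists("scaled_tmp", envir = globalenv(), inherits = FALSE)
print(leaked)
```

Only the value passed to return() survives the call; everything else created in the body is discarded when the function finishes.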
A vector is a sequence of data elements of the same basic type. Members in a vector are officially called components. Nevertheless, they are often called elements.
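A short sketch of what "same basic type" means in practice: when values of different types are combined with c(), R coerces them to a single common type. The variable names are illustrative only.

```r
nums  <- c(2, 4, 6)           # a numeric vector
mixed <- c(1, "a", TRUE)      # mixed input is coerced to character

print(class(nums))            # "numeric"
print(class(mixed))           # "character"
print(mixed)                  # "1" "a" "TRUE"

# Components are accessed by position, starting at 1.
second <- nums[2]
print(second)                 # 4
```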
The function matrix creates matrices.
matrix(data, nrow, ncol, byrow)
The data argument is usually a vector of the elements that will fill the matrix. The nrow and ncol arguments specify the dimensions of the matrix. Often only one dimension argument is needed: if, for example, there are 20 elements in the data vector and ncol is specified to be 4, then R will automatically calculate that there should be 5 rows and 4 columns, since 4*5=20. The byrow argument specifies how the matrix is to be filled. The default value for byrow is FALSE, which means that by default the matrix is filled column by column.
seq1 <- 1:6                             # the integers 1 through 6
mat1 <- matrix(seq1, 2)                 # 2 rows, filled column by column
mat2 <- matrix(seq1, 2, byrow = TRUE)   # 2 rows, filled row by row
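The 20-element case described above can be sketched as follows. Only ncol is supplied, so R infers the number of rows. The names m_col and m_row are illustrative.

```r
m_col <- matrix(1:20, ncol = 4)                 # default: fill column by column
m_row <- matrix(1:20, ncol = 4, byrow = TRUE)   # fill row by row

print(dim(m_col))     # 5 4
print(m_col[1, ])     # 1 6 11 16
print(m_row[1, ])     # 1 2 3 4
```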
A list is an R structure that may contain objects of any type, including other lists. Many of the modeling functions (like t.test() for the t test or lm() for linear models) produce lists as their return values, but you can also construct one yourself:
mylist <- list(a = 1:5, b = "Hi There", c = function(x) x * sin(x))
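The sketch below uses the components of such a list and checks that a fitted model is itself a list. It assumes the built-in cars dataset (columns speed and dist) that ships with R. The name fit is illustrative.

```r
mylist <- list(a = 1:5, b = "Hi There", c = function(x) x * sin(x))

print(mylist$a)              # 1 2 3 4 5
print(mylist$c(pi / 2))      # the stored function is callable: (pi/2) * sin(pi/2)

# A linear model fitted to the built-in cars data is returned as a list.
fit <- lm(dist ~ speed, data = cars)
print(is.list(fit))          # TRUE
print(names(fit)[1:2])       # "coefficients" "residuals"
```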
A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") # variable names
There are a variety of ways to identify the elements of a data frame.
myframe[3:5] # columns 3,4,5 of data frame
myframe[c("ID","Age")] # columns ID and Age from data frame
myframe$X1 # variable X1 in the data frame
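A self-contained sketch of these access styles, rebuilding the small ID/Color/Passed data frame so that every expression can be run as written:

```r
mydata <- data.frame(
  ID     = c(1, 2, 3, 4),
  Color  = c("red", "white", "red", NA),
  Passed = c(TRUE, TRUE, TRUE, FALSE)
)

by_position <- mydata[2:3]                 # columns 2 and 3, still a data frame
by_name     <- mydata[c("ID", "Passed")]   # columns selected by name
one_column  <- mydata$Color                # a single column as a vector
passed_rows <- mydata[mydata$Passed, ]     # rows where Passed is TRUE

print(names(by_position))   # "Color" "Passed"
print(nrow(passed_rows))    # 3
```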
A data frame is a table, or two-dimensional array-like structure, in which each column contains measurements on one variable, and each row contains one case or sample (observation) with the corresponding values for each variable for that observation.
A list is an ordered collection of objects (components). It allows you to gather a variety of (possibly unrelated) objects under one name.
# example of a list with 4 components -
# a string, a numeric vector, a matrix, and a scalar
# (a and y are assumed to be an existing numeric vector and matrix)
w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)

# example of combining two existing lists, list1 and list2, into one
v <- c(list1,list2)
Identify elements of a list using the [[]] convention.
mylist[[2]] # 2nd component of the list
mylist[["mynumbers"]] # component named mynumbers in list
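The difference between double and single brackets is easy to miss, so here is a small sketch. The list w and its contents are made up for illustration.

```r
w <- list(name = "Fred", mynumbers = c(1, 2, 3), age = 5.3)

by_index <- w[[2]]              # the component itself: a numeric vector
by_name  <- w[["mynumbers"]]    # same component, selected by name
sublist  <- w[2]                # single brackets return a list of length 1

print(class(by_index))   # "numeric"
print(class(sublist))    # "list"
```

Double brackets extract the component itself, while single brackets keep the list wrapper.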
Tell R that a variable is nominal by making it a factor. The factor stores the nominal values as a vector of integers in the range [ 1... k ] (where k is the number of unique values in the nominal variable), and an internal vector of character strings (the original values) mapped to these integers.
# variable gender with 20 "male" entries and
# 30 "female" entries
gender <- c(rep("male",20), rep("female", 30))
gender <- factor(gender)
# stores gender as 20 2s and 30 1s and associates
# 1=female, 2=male internally (alphabetically)
# R now treats gender as a nominal variable
summary(gender)
An ordered factor is used to represent an ordinal variable.
# variable rating coded as "large", "medium", "small"
rating <- ordered(rating)
# recodes rating to 1,2,3 and associates
# 1=large, 2=medium, 3=small internally
# R now treats rating as ordinal
R will treat factors as nominal variables and ordered factors as ordinal variables in statistical procedures and graphical analyses. You can use options in the factor( ) and ordered( ) functions to control the mapping of integers to strings (overriding the alphabetical ordering). You can also use factors to create value labels.
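As a sketch of overriding the alphabetical ordering, the levels argument can list the categories in the intended order. The sample values and the name rating_ord are illustrative.

```r
rating <- c("small", "large", "medium", "small")

# Without levels, the order would be alphabetical: large, medium, small.
rating_ord <- factor(rating,
                     levels = c("small", "medium", "large"),
                     ordered = TRUE)

print(levels(rating_ord))               # "small" "medium" "large"
print(as.integer(rating_ord))           # 1 3 2 1
print(rating_ord[1] < rating_ord[2])    # TRUE: small < large
```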
1. Creating factor variables
Factor variables are categorical variables that can be either numeric or string variables. There are a number of advantages to converting categorical variables to factor variables. Perhaps the most important is that they can be used in statistical modeling, where they will be implemented correctly, i.e., they will be assigned the correct number of degrees of freedom. Factor variables are also very useful in many different types of graphics. Furthermore, storing string variables as factor variables is a more efficient use of memory. To create a factor variable we use the factor function. The only required argument is a vector of values, which can be either string or numeric. Optional arguments include the levels argument, which determines the categories of the factor variable; its default is the sorted list of all the distinct values of the data vector. The labels argument is another optional argument: a vector of values that will be the labels of the categories in the levels argument.
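A brief sketch of the levels and labels arguments working together, using made-up numeric codes. The names codes and grp are illustrative.

```r
codes <- c(1, 2, 2, 1, 3)

# levels names the categories found in the data; labels renames them in order.
grp <- factor(codes, levels = c(1, 2, 3), labels = c("low", "mid", "high"))

print(levels(grp))          # "low" "mid" "high"
print(as.character(grp))    # "low" "mid" "mid" "low" "high"
print(table(grp))           # counts per category: low 2, mid 2, high 1
```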
R Programming Environment and Scope
In order to write functions in a proper way and avoid unusual errors, we need to know the concept of environment and scope in R.
R Programming Environment
An environment can be thought of as a collection of objects (functions, variables, etc.). An environment is created when we first start the R interpreter, and any variable we define is then in this environment. The top-level environment available to us at the R command prompt is the global environment, called R_GlobalEnv. The global environment can also be referred to as .GlobalEnv in R code. We can use the ls() function to show which variables and functions are defined in the current environment, and the environment() function to get the current environment.
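These functions can be tried out with a short sketch like the one below. The function f and its variable y are hypothetical, and what ls() prints at top level depends on what the session already contains.

```r
# A function call gets its own environment, separate from the global one.
f <- function() {
  y <- 1
  environment()          # return the environment of this particular call
}

print(environmentName(globalenv()))        # "R_GlobalEnv"
print(identical(globalenv(), .GlobalEnv))  # TRUE

call_env <- f()
print(ls(call_env))                        # "y"
print(ls())                                # objects in the current environment
```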
The purpose of the R function function() is to create functions. For instance, consider this code:
inc <- function(x) return(x+1)
It instructs R to create a function that adds 1 to its argument and then assigns that function to inc. However, that last step, the assignment, is not always taken. We can simply use the function object created by our call to function() without naming that object. Functions used in that way are called anonymous, since they have no name. (That is somewhat misleading, since even nonanonymous functions only have a name in the sense that a variable is pointing to them.)
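A minimal sketch contrasting the named function inc with an equivalent anonymous function passed straight to sapply(). The variable names named, anon and direct are illustrative.

```r
inc <- function(x) return(x + 1)

named <- sapply(1:3, inc)                     # function referred to by name
anon  <- sapply(1:3, function(x) x + 1)       # same function, never named

print(named)                    # 2 3 4
print(identical(named, anon))   # TRUE

# An anonymous function can also be called on the spot.
direct <- (function(x) x + 1)(10)
print(direct)                   # 11
```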
Performs set union, intersection, (asymmetric!) difference, equality and membership on two vectors.
Usage
union(x, y)
intersect(x, y)
setdiff(x, y)
setequal(x, y)
is.element(el, set)
Arguments
x, y, el, set: vectors (of the same mode) containing a sequence of items (conceptually) with no duplicated values.
Details
Each of union, intersect, setdiff and setequal will discard any duplicated values in the arguments, and they apply as.vector to their arguments (and so in particular coerce factors to character vectors).
is.element(x, y) is identical to x %in% y.
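The set functions above can be exercised with a short sketch. The vectors x and y are made-up sample data. Note that the duplicated 3 in x is discarded, and that setdiff depends on argument order.

```r
x <- c(1, 2, 3, 3, 4)
y <- c(3, 4, 5)

print(union(x, y))        # 1 2 3 4 5
print(intersect(x, y))    # 3 4
print(setdiff(x, y))      # 1 2   (in x but not in y)
print(setdiff(y, x))      # 5     (in y but not in x)

print(setequal(c(1, 2, 2), c(2, 1)))   # TRUE: duplicates and order are ignored
print(is.element(3, y))                # TRUE
print(3 %in% y)                        # TRUE, same test in operator form
```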
Dr. Geoffrey Hubona held full-time tenure-track and tenured assistant and associate professor faculty positions at three major state universities in the Eastern United States from 1993-2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA. He was a full-time assistant professor at the University of Maryland Baltimore County (1993-1996) in Catonsville, MD; a tenured associate professor in the department of Information Systems in the Business College at Virginia Commonwealth University (1996-2001) in Richmond, VA; and an associate professor in the CIS department of the Robinson College of Business at Georgia State University (2001-2010). He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling. Dr. Hubona is an expert in the analytical, open-source R software suite and in various PLS path modeling software packages, including SmartPLS. He has published dozens of research articles that explain and use these techniques for the analysis of data, and, with software co-development partner Dean Lim, has created a popular cloud-based PLS software application, PLS-GUI.