Extra Fundamentals of R
4.2 (10 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
936 students enrolled
Understanding R graphics, how to set up "real-world" simulations, and how to process non-numeric character and text data in R
Last updated 8/2015
English
Current price: $10 Original price: $40 Discount: 75% off
30-Day Money-Back Guarantee
Includes:
  • 12.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Understand and use the base, lattice and ggplot graphics systems in R.
  • Be able to simulate many 'real-world' and practical "what-if" scenarios to determine likely outcomes.
  • Have a thorough understanding of, and ability to effectively use, the text and string variable processing capabilities in R.
  • Know how to use and implement R's text-based "regular expression" features and functions.
Requirements
  • Students will need to install the no-cost R console and the no-cost RStudio application (instructions are provided).
Description

Extra Fundamentals of R is an extension of the Udemy course Essential Fundamentals of R. Extra Fundamentals of R introduces additional topics of interest and relevance utilizing many specific R-scripted examples. These broad topics include:

(1) Details on using Base, GGPlot and Lattice graphics;

(2) An introduction to programming and simulation in R; and

(3) Character and string processing in R.

All materials, scripts, slides, documentation and anything else used or viewed in any of the video lessons are provided with the course. The course is useful for both R novices and intermediate R users. Rather than focus on specific and narrow R-supported skill sets, the course paints a broad canvas illustrating many specific examples in three domains that any R user would find useful. The course is a natural extension of the more basic Udemy course, Essential Fundamentals of R, and is highly recommended for those students, as well as for other new students (and for practicing professionals) interested in the three domains enumerated above.

Base, GGplot and Lattice (or "trellis") graphics are the three principal graphics systems in R. They each operate under different "rules" and each presents useful and often brilliant graphics displays. However, each of these three graphics systems is generally designed and used for different domains or applications.

There are many different programming and simulation scenarios that can be modeled with R. This course provides a good sense of some of the potential simulation applications through the presentation of 'down-to-earth,' practical domains or tasks that are supported. The examples are based on common and interesting 'real-world' tasks: (1) simulating a game of coin-tossing; (2) returning Top-Hats checked into a restaurant to their rightful owners; (3) collecting baseball cards and state quarters for profit; (4) validating whether so-called "streaky" behavior, such as a string of good hitting in consecutive baseball games, is really unusual from a statistical point of view; (5) estimating the number of taxicabs in a newly-visited city; and (6) estimating arrival times for Sam and Annie at the Empire State Building ("Sleepless in Seattle").

R is likely best known for the ability to process numerical data, but R also has quite extensive capabilities to process non-quantitative text (or character) and string variables. R also has very good facilities for implementing powerful "regular expression" natural-language functions. An R user is best served by an understanding of how these text (or character) and string processing capabilities "work."

Most sessions present "hands-on" material that makes use of many extended examples of R functions, applications, and packages for a variety of common purposes. RStudio, a popular, open source Integrated Development Environment (IDE) for developing and using R applications, is utilized in the program, supplemented with R-based direct scripts (e.g. 'command-line prompts') when necessary.

Who is the target audience?
  • Any novice or intermediate R user would benefit from this course.
  • Appropriate candidate students for this course include undergraduate and graduate students, college and university faculty, and practicing professionals, particularly in quantitative or analytics fields.
  • It is useful to have some rudimentary exposure to using R in a sample session, executing R script.
Curriculum For This Course
66 Lectures
12:20:58
Base and GGPlot2 Graphics in R
6 Lectures 57:32

One of the greatest strengths of the R language is surely the base graphics capabilities. Grid graphics, lattice, ggplot2, and the many R packages that interface with JavaScript D3 graphics have added astounding capabilities, well beyond what can be achieved with base graphics alone. Nevertheless, the quick, one-line base graphics plots (like plot()) are a tremendous aid to data exploration, and are responsible for a good bit of the "flow" of an R session.

Preview 15:15

ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.

Continue Graphics Capabilities and Comparisons
07:25

Graphical Parameters

You can customize many features of your graphs (fonts, colors, axes, titles) through graphic options.

One way is to specify these options through the par() function. If you set parameter values here, the changes will be in effect for the rest of the session or until you change them again. The format is par(optionname=value, optionname=value, ...)

# Set a graphical parameter using par()

par()                            # view current settings
opar <- par(no.readonly = TRUE)  # make a copy of the current settable settings
par(col.lab="red")               # red x and y labels
hist(mtcars$mpg)                 # create a plot with these new settings
par(opar)                        # restore original settings

A second way to specify graphical parameters is by providing the optionname=value pairs directly to a high level plotting function. In this case, the options are only in effect for that specific graph.

# Set a graphical parameter within the plotting function

hist(mtcars$mpg, col.lab="red")

See the help for a specific high level plotting function (e.g. plot, hist, boxplot) to determine which graphical parameters can be set this way.

More Graphics Capabilities and Comparisons
15:05

Adding Text to Graphics
07:09

Mathematical and Drawing Functions
11:03
Finish Base Graphics Capabilities, Begin Lattice Graphics
6 Lectures 01:22:42
Fitting Non-Linear Curves in Base
15:49


Boxplots

Boxplots can be created for individual variables or for variables by group. The format is boxplot(x, data=), where x is a formula and data= denotes the data frame providing the data. An example of a formula is y~group, where a separate boxplot for numeric variable y is generated for each value of group. Add varwidth=TRUE to make boxplot widths proportional to the square root of the sample sizes. Add horizontal=TRUE to reverse the axis orientation.

# Boxplot of MPG by Car Cylinders

boxplot(mpg~cyl, data=mtcars, main="Car Mileage Data", xlab="Number of Cylinders", ylab="Miles Per Gallon")
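The varwidth and horizontal options mentioned above can be sketched as follows. This is a minimal illustration rather than course material: it reuses the built-in mtcars data, and the name bp is only an assumption chosen for the example.

```r
# Illustrative sketch (bp is a hypothetical name, not from the course scripts)
bp <- boxplot(mpg ~ cyl, data = mtcars,
              varwidth = TRUE,    # box widths proportional to sqrt(group size)
              horizontal = TRUE,  # draw the boxes horizontally
              main = "Car Mileage Data",
              xlab = "Miles Per Gallon",      # labels swap with the axes
              ylab = "Number of Cylinders")

bp$n      # number of observations in each cylinder group
bp$names  # group labels taken from the formula
```

boxplot() returns its summary statistics invisibly, so capturing the result is a convenient way to check the group sizes that drive the box widths.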


Base Boxplots and Bargraphs
14:32

Lattice Graphs

The lattice package, written by Deepayan Sarkar, attempts to improve on base R graphics by providing better defaults and the ability to easily display multivariate relationships. In particular, the package supports the creation of trellis graphs - graphs that display a variable or the relationship between variables, conditioned on one or more other variables.

The typical format is

graph_type(formula, data=)

where graph_type is one of the lattice graph types (for example xyplot, bwplot or histogram). formula specifies the variable(s) to display and any conditioning variables. For example, ~x|A means display numeric variable x for each level of factor A. y~x | A*B means display the relationship between numeric variables y and x separately for every combination of factor A and B levels. ~x means display numeric variable x alone.
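As a rough sketch of the three formula forms described above, the following uses the built-in mtcars data. The choice of variables and the names p1, p2 and p3 are assumptions made for illustration only.

```r
library(lattice)  # recommended package shipped with standard R installations

# ~x | A : distribution of mpg within each level of cyl
p1 <- histogram(~ mpg | factor(cyl), data = mtcars)

# y ~ x | A*B : mpg against wt for every cyl/am combination
p2 <- xyplot(mpg ~ wt | factor(cyl) * factor(am), data = mtcars)

# ~x : mpg alone, with no conditioning variable
p3 <- densityplot(~ mpg, data = mtcars)

print(p2)  # lattice objects are drawn when they are printed
```

Unlike base graphics, lattice functions return a "trellis" object, so a plot can be stored, modified and printed later.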

Introduction to Lattice Graphics
13:30

Superposition and Lattice Exercise
04:26

Lattice Exercise Solution
15:40
Lattice and GGPlot Graphics
6 Lectures 01:29:39
"In Living Color" Exercises Solution Explained
15:45

Finish "In Living Color" Exercise and Begin Lattice Graphics
15:18


Plotting the Titanic Data Set and Begin GGPlot Graphics
11:54

GGPlot: Non-Linear Fits and Plots
15:31

Histograms, Bar Charts and Density Plots
16:59
Programming and Simulation 1
6 Lectures 01:27:01
Cuckoohost and Other Plots
14:51

Finish Cuckoohosts and Begin Simulation
16:07

Simulation uses methods based on random numbers to simulate a process of interest on the computer. The goal is to learn important statistical and/or practical information about the process. In statistics, simulations can be used to create simulated data sets in order to study the accuracy of mathematical approximations and the effect of assumptions being violated. We will study properties of some quantities that can be calculated from a set of data which are a random draw from a population. Random numbers form a basic tool for any simulation study. Simulations require the ability to generate random numbers. On a computer, it is only possible to generate 'pseudo-random' numbers which for practical purposes behave as if they were drawn randomly.
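A minimal sketch of these ideas, assuming a fair coin game in which each toss wins or loses one unit. The seed value, the number of tosses and the helper name play_game are arbitrary choices made for illustration, not taken from the course scripts.

```r
set.seed(123)  # arbitrary seed so the pseudo-random draws can be repeated

n_tosses <- 50
# +1 for a win and -1 for a loss on each fair toss
winnings <- sample(c(-1, 1), size = n_tosses, replace = TRUE)
running_total <- cumsum(winnings)        # fortune after each toss
final_fortune <- running_total[n_tosses]

# Repeat the whole game many times to study the final fortune
play_game <- function(n = 50) sum(sample(c(-1, 1), size = n, replace = TRUE))
final_fortunes <- replicate(1000, play_game())

mean(final_fortunes == 0)  # estimated chance of breaking even
```

Calling set.seed() with the same value before the same sequence of draws reproduces the results exactly, which is what makes pseudo-random numbers practical for simulation studies.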

Preview 14:10

Simulating a Coin Tossing Game of Chance (part 2)
14:05

Simulating the Return of Top-Hats to Rightful Owners (part 1)
15:57

Finish Simulating Top-Hat Returns and Begin Collecting Baseball Cards for Profit
11:51
Programming and Simulation 2
9 Lectures 01:30:17

Collecting Baseball Cards (part 2)
12:42

Collecting Baseball Cards (part 3)
10:30

Collecting Quarters Exercise Solution
06:06

Streaky Baseball Behavior (part 1)
11:52

Streaky Baseball Behavior (part 2)
12:23

Sam and Annie Arrive at the Empire State Building (part 1)
06:36

Hats and Streakiness Exercise
07:30

Sam and Annie Arrive at the Empire State Building (part 2)
11:14
Programming and Simulation 3
13 Lectures 02:00:19
Checking Hats Exercise Solution
06:43

More Streakiness Exercise Solution
11:02


Estimating Mean Squared Error of a Trimmed Mean
08:03

Estimating a Confidence Level
11:37

Empirical Confidence Level
04:57

Estimating the Taxi Population (part 1)
09:11

Estimating the Taxi Population (part 2)
07:32

Permutation Tests (part 1)
13:30

Permutation Tests (part 2)
16:55

The Bootstrap and Jackknife (part 1)
09:32

The Bootstrap and Jackknife (part 2)
11:37

Late to Class Again Exercise
03:01
Character Manipulation and String Processing
10 Lectures 01:44:24
Late to Class Exercise Solution
09:59

Handling and processing text strings in R? Wait a second . . . you exclaim, R is not a scripting language like Perl, Python, or Ruby. Why would you want to use R for handling and processing text? Well, because sooner or later (I would say sooner than later) you will have to deal with some kind of string manipulation for your data analysis. So it's better to be prepared for such tasks and know how to perform them inside the R environment.

Character and String Manipulation
08:39

Another very useful function is cat() which allows us to concatenate objects and print them either on screen or to a file. Its usage has the following structure:

cat(..., file = "", sep = " ", fill = FALSE, labels = NULL, append = FALSE)
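A few typical calls may make the arguments clearer. This is a small sketch: the vector x and the temporary file tmp are made-up names used only for illustration.

```r
x <- c("alpha", "beta", "gamma")  # illustrative data

cat("Values:", x, "\n")       # default separator is a single space
cat(x, sep = ", ")            # custom separator between elements
cat("\n")
cat("n =", length(x), "\n")   # strings and numbers can be mixed

# Writing to a file instead of the screen
tmp <- tempfile(fileext = ".txt")
cat("first line\n", file = tmp)
cat("second line\n", file = tmp, append = TRUE)  # add to the file, do not overwrite
readLines(tmp)
```

Note that cat() does not add a trailing newline by itself, which is why "\n" is supplied explicitly.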

Preview 10:15

Displaying and Concatenating Strings (part 2)
14:00

Manipulating Parts of a String
10:00

Breaking Apart Character Values
14:13

A regular expression (a.k.a. regex) is a special text string for describing a certain amount of text. This “certain amount of text” receives the formal name of pattern. Hence we say that a regular expression is a pattern that describes a set of strings.
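To illustrate the idea that a pattern describes a set of strings, here is a small sketch using base R's grepl(). The word list and the two patterns are invented for the example.

```r
words <- c("cat", "cot", "cut", "cit", "scatter")  # illustrative strings

# "c.t" describes any string containing c, one arbitrary character, then t
loose <- grepl("c.t", words)

# "^c[aou]t$" narrows the set to exactly cat, cot and cut
strict <- grepl("^c[aou]t$", words)

words[loose]
words[strict]
```

The first pattern matches all five strings (including the "cat" inside "scatter"), while the anchored second pattern matches only the first three.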

What are Regular Expressions? (slides)
10:27

There are two main aspects that we need to consider about regular expressions in R. One has to do with the functions designed for regex pattern matching. The other aspect has to do with the way regex patterns are expressed in R. In this section we are going to talk about the latter issue: the way R works with regular expressions. Some find it more convenient to first cover the specificities of R around regex operations, before discussing the functions and how to interact with regex patterns.
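One R-specific detail worth seeing early is that backslashes must be doubled inside R string literals. The following is a brief sketch with made-up file names, not an excerpt from the lectures.

```r
x <- c("file1.txt", "file22.csv", "notes")  # illustrative strings

# The regex \d+ has to be written "\\d+" in an R string literal
has_digit <- grepl("\\d+", x)

# An unescaped dot matches any character; "\\." matches a literal dot
is_txt <- grepl("\\.txt$", x)

# fixed = TRUE turns regex interpretation off, so "." is a plain dot
has_dot <- grepl(".", x, fixed = TRUE)

# The literal "\\d" holds two characters: a backslash and the letter d
n_chars <- nchar("\\d")
```

The doubling is needed because R first parses the string literal and only then hands the resulting characters to the regex engine.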

Using Regular Expressions in R (part 1)
13:31

Using Regular Expressions in R (part 2)
11:36

Reversing a String Exercise
01:44
More Text and String Processing
10 Lectures 01:49:04
Reverse String Exercise Solution
09:29

To find exactly where the pattern is found in a given string, we can use the regexpr() function. This function returns more detailed information than grep(): a) which elements of the text vector actually contain the regex pattern, and b) the position of the substring that is matched by the regular expression pattern.

# some text

text = c("one word", "a sentence", "you and me", "three two one")

# default usage

regexpr("one", text)
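The pieces of the regexpr() result can be pulled apart as in the sketch below, which reuses the same text vector. The name m is an arbitrary choice for the example.

```r
text <- c("one word", "a sentence", "you and me", "three two one")
m <- regexpr("one", text)

starts  <- as.integer(m)            # start of the first match, -1 if none
lens    <- attr(m, "match.length")  # length of each match, -1 if none
matched <- regmatches(text, m)      # the matched substrings themselves

starts
lens
matched
```

regmatches() silently drops the elements without a match, so its result can be shorter than the input vector.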

Preview 12:41

The function gregexpr() does practically the same thing as regexpr(): identify where a pattern is within a string vector, by searching each element separately. The only difference is that gregexpr() has an output in the form of a list. In other words, gregexpr() returns a list of the same length as text, each element of which is of the same form as the return value for regexpr(), except that the starting positions of every (disjoint) match are given.

# some text

text = c("one word", "a sentence", "you and me", "three two one")

# pattern

pat = "one"

# default usage

gregexpr(pat, text)
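The difference from regexpr() shows up when a string contains the pattern more than once. The following sketch uses a slightly different, invented vector so that one element has two matches.

```r
text2 <- c("one plus one", "a sentence", "three two one")  # illustrative vector
g <- gregexpr("one", text2)

length(g)                         # one list element per input string
first_starts <- as.integer(g[[1]])  # start positions of every match in element 1
all_matches  <- regmatches(text2, g)  # list of matched substrings per element
n_matches    <- lengths(all_matches)  # how many matches each element has

first_starts
n_matches
```

Here regmatches() keeps one list entry per input string, with an empty character vector where nothing matched.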

The Regexpr() and Gregexpr() Functions (part 2)
09:48

Testing a Filename for a Suffix
08:40

Forming Filenames
08:46

Substituting Text and Tagging Text
14:54

grep() is perhaps the most basic function that allows us to match a pattern in a string vector. The first argument in grep() is a regular expression that specifies the pattern to match. The second argument is a character vector with the text strings on which to search. The output is the indices of the elements of the text vector for which there is a match. If no matches are found, the output is an empty integer vector.

# some text

text = c("one word", "a sentence", "you and me", "three two one")

# pattern

pat = "one"

# default usage

grep(pat, text)
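A few common variations on the default call, sketched with the same text vector. The pattern "xyz" is an arbitrary non-matching example.

```r
text <- c("one word", "a sentence", "you and me", "three two one")

idx  <- grep("one", text)                # indices of the matching elements
vals <- grep("one", text, value = TRUE)  # the matching elements themselves
lgl  <- grepl("one", text)               # logical vector, one entry per element
none <- grep("xyz", text)                # integer(0) when nothing matches

idx
vals
lgl
length(none)
```

grepl() is often the more convenient form for subsetting, since its result always has the same length as the input.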

Finding Words in Text Passages
14:45

Manipulating the Component Names of List Structures
11:07

Sorting and Ordering Words
12:21

Determining and Plotting Word Frequency
06:33
About the Instructor
Geoffrey Hubona, Ph.D.
4.0 Average rating
1,288 Reviews
11,361 Students
28 Courses
Professor of Information Systems

Dr. Geoffrey Hubona held full-time tenure-track, and tenured, assistant and associate professor faculty positions at 3 major state universities in the Eastern United States from 1993-2010. In these positions, he taught dozens of various statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA. He was a full-time assistant professor at the University of Maryland Baltimore County (1993-1996) in Catonsville, MD; a tenured associate professor in the department of Information Systems in the Business College at Virginia Commonwealth University (1996-2001) in Richmond, VA; and an associate professor in the CIS department of the Robinson College of Business at Georgia State University (2001-2010). He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These research methods techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling. Dr. Hubona is an expert in the analytical, open-source R software suite and in various PLS path modeling software packages, including SmartPLS. He has published dozens of research articles that explain and use these techniques for the analysis of data, and, with software co-development partner Dean Lim, has created a popular cloud-based PLS software application, PLS-GUI.