
**Essential Fundamentals of R**

Data Types and Structures in R, Inputting & Outputting Data, Writing User-Defined Functions, and Manipulating Data Sets

1,021 students enrolled

Current price: $10
Original price: $40
Discount: 75% off

30-Day Money-Back Guarantee

- 10.5 hours on-demand video
- Full lifetime access
- Access on mobile and TV
- Certificate of Completion

What Will I Learn?

- Install R and RStudio and engage in a basic R session
- Understand the characteristics of different data types and structures in R
- Be able to read in data and write out data files from various sources
- Sort, select, filter, subset, and manipulate tables of data in R
- Create and execute your own user-defined functions in an R session
- Understand how to use the apply() family of functions to execute various actions against different R data structures
- Know how to use reshaping and recoding shortcuts for changing data types and for rearranging data structures

Requirements

- Students will need to install both R software and RStudio (instructions are provided)

Description

Essential Fundamentals of R is an integrated program that draws from a variety of introductory topics and courses to give participants a solid base of knowledge for using R software for any intended purpose. No statistical knowledge, programming knowledge, or experience with R software is necessary. Essential Fundamentals of R (7 sessions) covers the important introductory topics basic to using R functions and data objects for any purpose: installing R and RStudio; interactive versus batch use of R; reading data and datasets into R; essentials of scripting; getting help in R; primitive data types; important data structures; using functions in R; writing user-defined functions; the 'apply' family of functions in R; and data set manipulation, including subsetting and row and column selection. Most sessions present hands-on material that demonstrates the execution of R scripts (sets of commands) and uses many extended examples of R functions, applications, and packages for a variety of common purposes. RStudio, a popular open-source Integrated Development Environment (IDE) for developing and using R applications, is also used in the program, supplemented with R commands entered directly at the command-line prompt when necessary.

Who is the target audience?

- Anyone interested in learning to use R software who is relatively new (or brand new) to R
- People who wish to learn the essential fundamentals of using R including data types and structures, inputting and outputting data and files, writing user-defined functions, and manipulating data sets
- College undergrads and/or graduate students who are looking for an alternative to using SAS or SPSS software
- Professionals engaged in quantitative analysis and/or data analysis tasks who seek an alternative to using SAS and/or SPSS software


Curriculum For This Course

46 Lectures
10:32:31


Introduction and Orientation
8 Lectures
02:00:42

**R** is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.

R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the *R Development Core Team*, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.

R is a GNU project.

Preview
14:56

Preview
14:24

The R environment consists of all the files needed to run the R program, plus the data sets and other objects that you have created or loaded into your workspace. These can be broken down into three basic types:

1. The base packages that run all the standard analyses that we use in this course. These files are installed automatically when you first download and install the R program.

2. Additional packages you can install on your own and which allow for more advanced statistical analysis or additional commands.

3. The data sets that you download and other objects (data sets and variables) that you create.
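The three kinds of content listed above can each be inspected from the R console. The sketch below is illustrative only: it uses standard base R functions (`installed.packages()`, `search()`, `ls()`, `rm()`), the package name `reshape2` is just an example of an add-on package, and the object names `x` and `my_df` are hypothetical.

```r
# Type 1: base packages ship with every R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)                  # e.g. "base", "stats", "utils", ...
search()                          # packages currently attached to the search path

# Type 2: add-on packages are installed once, then loaded in each session.
# Commented out because installing needs internet access; "reshape2" is only an example.
# install.packages("reshape2")
# library(reshape2)

# Type 3: objects you create live in your workspace (the global environment)
x <- c(1, 2, 3)
my_df <- data.frame(id = 1:3, score = c(90, 85, 77))
ls()                              # lists workspace objects, including "my_df" and "x"
rm(x)                             # remove an object from the workspace
exists("x", inherits = FALSE)     # FALSE once x has been removed
# save.image("my_workspace.RData")  # save the whole workspace to a file
```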

Workspace Management Controls

15:37

Workspace Management R Manuals

12:58

Hands-On Tutorial of R Basics (part 1)

14:35

Hands-On Tutorial of R Basics (part 2)

14:53

Tutorial with R Functions

13:54

**R Functions for Probability Distributions**

Every distribution that R handles has four functions. There is a root name; for example, the root name for the normal distribution is `norm`. This root is prefixed by one of the letters:

- `p` for "probability", the cumulative distribution function (c.d.f.)
- `q` for "quantile", the inverse c.d.f.
- `d` for "density", the density function (p.f. or p.d.f.)
- `r` for "random", a random variable having the specified distribution

For the normal distribution, these functions are `pnorm`, `qnorm`, `dnorm`, and `rnorm`. For the binomial distribution, they are `pbinom`, `qbinom`, `dbinom`, and `rbinom`. And so forth.

For a *continuous* distribution (like the normal), the most useful functions for probability calculations are the `p` and `q` functions (c.d.f. and inverse c.d.f.). The density (p.d.f.) calculated by the `d` function yields probabilities only through integration; R can integrate numerically with `integrate()`, but the `p` function returns the same probability directly.

For a *discrete* distribution (like the binomial), the `d` function calculates the probability function (p.f.), which in this case is itself a probability, $f(x) = P(X = x)$, and hence is directly useful in calculating probabilities.
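A short sketch of the four prefixes applied to the normal and binomial distributions, using standard functions from R's `stats` package. The specific arguments (1.96, size = 10, prob = 0.5, the seed 42) are arbitrary choices made for illustration.

```r
# Continuous case: the standard normal distribution (root name "norm")
pnorm(1.96)                    # c.d.f.: P(Z <= 1.96), about 0.975
qnorm(0.975)                   # inverse c.d.f.: about 1.96
dnorm(0)                       # density at 0, 1/sqrt(2*pi), about 0.399

# A density gives a probability only after integration;
# numerically integrating dnorm agrees with pnorm
p_int <- integrate(dnorm, lower = -Inf, upper = 1.96)$value
p_int

# Random draws; set.seed() makes them reproducible
set.seed(42)
z <- rnorm(5)

# Discrete case: Binomial with size = 10, prob = 0.5 (root name "binom")
dbinom(3, size = 10, prob = 0.5)          # P(X = 3) = choose(10, 3) / 2^10
pbinom(3, size = 10, prob = 0.5)          # P(X <= 3)
sum(dbinom(0:3, size = 10, prob = 0.5))   # equals pbinom(3, 10, 0.5)
```

The last two lines show why the `d` function is useful in the discrete case: summing point probabilities reproduces the c.d.f.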

Distributional Functions and Plotting

19:25


Input and Output, Data and Data Structures
9 Lectures
01:45:35

Data Input and Output

14:44

Accessing Data Sets in R

14:40

Basic Data Structures (part 1)

14:47

Basic Data Structures (part 3)

14:35

A data frame is used for storing data tables. It is a list of vectors of equal length. For example, the following variable df is a data frame containing three vectors n, s, b.

> n = c(2, 3, 5)

> s = c("aa", "bb", "cc")

> b = c(TRUE, FALSE, TRUE)

> df = data.frame(n, s, b) # df is a data frame

We use built-in data frames in R for our tutorials. For example, here is a built-in data frame in R, called mtcars.

> mtcars

mpg cyl disp hp drat wt ...

Mazda RX4 21.0 6 160 110 3.90 2.62 ...

Mazda RX4 Wag 21.0 6 160 110 3.90 2.88 ...

Datsun 710 22.8 4 108 93 3.85 2.32 ...

............

The top line of the table, called the header, contains the column names. Each subsequent line denotes a data row, which begins with the row name, followed by the actual data. Each data member of a row is called a cell.

To retrieve data in a cell, we enter its row and column coordinates in the single square bracket "[]" operator, separated by a comma. The row position comes first, then a comma, then the column position. The order is important.

Here is the cell value from the first row, second column of mtcars.

> mtcars[1, 2]

[1] 6

Moreover, we can use the row and column names instead of the numeric coordinates.

> mtcars["Mazda RX4", "cyl"]

[1] 6

Lastly, the number of data rows in the data frame is given by the nrow function.

> nrow(mtcars) # number of data rows

[1] 32

And the number of columns of a data frame is given by the ncol function.

> ncol(mtcars) # number of columns

[1] 11

Further details of the mtcars data set are available in the R documentation.

> help(mtcars)
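The following sketch rebuilds the small data frame from above and shows that it really is a list of equal-length vectors, plus a few more indexing patterns. It uses `<-` for assignment instead of `=`, and passes `stringsAsFactors = FALSE` (an assumption added here) so that `s` stays a character vector in every R version; R 4.0 and later already behave this way by default.

```r
# Rebuild the data frame from the text
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b, stringsAsFactors = FALSE)

# A data frame is a list whose elements are equal-length column vectors
is.list(df)           # TRUE
length(df)            # 3, the same as ncol(df)
df$s                  # one column as a plain vector
df[["n"]]             # the same kind of access by name with [[ ]]

# [row, column] indexing; leaving a position blank means "all"
df[2, ]               # second row, every column
df[, "b"]             # column b, every row
df[df$b, ]            # only the rows where b is TRUE

dim(mtcars)           # rows and columns together: 32 11
```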

Manipulating Dataframes (part 1)

16:04

One of the most important aspects of computing with data is the ability to manipulate it, to enable subsequent analysis and visualization. R offers a wide range of tools for this purpose.
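As a concrete sketch of filtering, selecting, sorting, and deriving new columns, here is one way to do it with base R's `subset()`, `order()`, and `transform()` on the built-in `mtcars` data. The lectures may use other functions or packages; the object names (`six_cyl`, `small`, `by_mpg`, `mtcars2`, `wt_lb`) are hypothetical.

```r
# Filter: cars with 6 cylinders and mpg above 19
six_cyl <- subset(mtcars, cyl == 6 & mpg > 19)

# Select: keep only a few columns
small <- mtcars[, c("mpg", "cyl", "hp")]

# Sort: order rows by mpg, highest first
by_mpg <- small[order(small$mpg, decreasing = TRUE), ]
head(by_mpg, 3)

# Derive: add a column computed from an existing one
# (wt is recorded in units of 1000 lb)
mtcars2 <- transform(mtcars, wt_lb = wt * 1000)
```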

Manipulating Dataframes (part 2)

10:48

Input Output Exercises

01:04

Dataframe Manipulation Exercises

04:11


Manipulating Dataframes in Depth
6 Lectures
01:21:47

Input Output Exercises Solution

14:21

Data Manipulation Exercise Solution

14:25

Manipulating Dataframes (part 4)

14:24

Manipulating Dataframes (part 5)

18:37

Manipulating Dataframes (part 6)

12:12


User-Defined Functions in R
6 Lectures
01:25:51

Remaining Data Manipulation Exercises Solutions

14:58

User-Defined Function Exercise and Finish Manipulating Dataframes

15:43

One of the great strengths of **R** is the user's ability to add functions. In fact, many of the functions in **R** are actually functions of functions. The structure of a function is given below.

myfunction <- function(arg1, arg2, ...) {
    statements
    return(object)
}

Begin User-Defined Functions Demonstrations

14:06

Objects created in the function are local to the function. The object returned can be any data type.
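A minimal sketch of the template in practice, illustrating both points: variables created inside the body stay local, and the return value can be any type (here a list). The function name `summarize_vec`, its `digits` argument, and the input vector are hypothetical.

```r
# A user-defined function that follows the myfunction template
summarize_vec <- function(x, digits = 2) {
  local_mean <- mean(x)   # created inside the function, so it is local
  local_sd <- sd(x)
  # The returned object can be any data type; here it is a list
  return(list(mean = round(local_mean, digits), sd = round(local_sd, digits)))
}

result <- summarize_vec(c(2, 4, 4, 4, 5, 5, 7, 9))
result$mean                              # 5
result$sd                                # 2.14 (sample standard deviation)
exists("local_mean", inherits = FALSE)   # FALSE: it never reached the workspace
```

The `digits = 2` default also previews flexible arguments: callers may omit it or override it, as in `summarize_vec(x, digits = 4)`.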

The 'Scope' of a Function

14:16

Flexible Arguments to Functions

12:06


Writing Functions in R
6 Lectures
01:27:13

User-Defined Functions Exercise Solution

13:22

More on User-Defined Functions

14:31

The classic, Fortran-like loop is available in R. The syntax is a little different, but the idea is identical: you request that an index, *i*, take on a sequence of values, and that one or more lines of commands be executed as many times as there are different values of *i*. Here is a loop executed five times with the values of *i* from 1 to 5, printing the square of each value:

> for (i in 1:5) print(i^2)
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
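For comparison, a sketch of the same squares computed without a loop, followed by the loop forms the "Loops and Repeats" lecture title refers to (`for`, `while`, `repeat`). The variable names are illustrative; `seq_along()` is used because it also behaves correctly on zero-length vectors, where `1:length(x)` would not.

```r
# The same squares without an explicit loop: ^ is vectorized
squares <- (1:5)^2
print(squares)          # 1 4 9 16 25

# Filling a preallocated vector inside a for loop
out <- numeric(5)
for (i in seq_along(out)) {
  out[i] <- i^2
}

# A while loop and a repeat loop that both stop after five steps
i <- 0
while (i < 5) i <- i + 1

j <- 0
repeat {
  j <- j + 1
  if (j >= 5) break     # repeat has no condition of its own; break ends it
}
```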

Loops and Repeats

15:13

Control Statements

16:05

The purpose of the R function `function()` is to create functions. For instance, consider this code:

inc <- function(x) return(x+1)

It instructs R to create a function that adds 1 to its argument and then assigns that function to `inc`. However, that last step, the assignment, is not always taken. We can simply use the function object created by our call to `function()` without naming that object. Functions used this way are called *anonymous*, since they have no name. (That is somewhat misleading, since even non-anonymous functions only have a name in the sense that a variable is pointing to them.)
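A short sketch contrasting the named `inc` with the same function used anonymously. Passing an unnamed function straight to `sapply()` is the most common use and leads into the apply family covered next.

```r
inc <- function(x) return(x + 1)   # named: the function is assigned to inc
inc(3)                             # 4

# The same function created and called without ever naming it
(function(x) x + 1)(3)             # 4

# Anonymous functions are most often passed straight to another function
sapply(1:3, function(x) x^2)       # 1 4 9
```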

Anonymous Functions

12:33


The Apply Family of Functions
6 Lectures
01:25:10

Some Short Programs in R (part 1)

15:13

Some Short Programs in R (part 2)

16:02

"Apply" functions keep you from having to write loops to perform some operation on every row or every column of a matrix or data frame, or on every element in a list. For example, the built-in data set state.x77 contains eight columns of data describing the 50 U.S. states in 1977. If you wanted the average of each of the eight columns, you could do this:

> avgs <- numeric(8)
> for (i in 1:8)
+     avgs[i] <- mean(state.x77[, i])   # The "+" is R's continuation prompt; don't type it
> avgs
[1]  4246.4200  4435.8000     1.1700    70.8786     7.3780    53.1080   104.4600 70735.8800

This works, but it needs a preallocated vector and an explicit index for a simple summary. The apply() function expresses the same operation in a single call. (apply() still loops internally, so its main advantage is clarity rather than speed; fully vectorized functions such as colMeans() are the faster option on large data.) In the next example, apply() extracts each column as a vector, one at a time, and passes it to the median() function, so the result is the column medians rather than the means:

> apply(state.x77, 2, median)
Population  Income  Illiteracy  Life Exp  Murder  HS Grad  Frost  Area
2838.5      4519    0.95        70.675    6.85    53.25    114.5  54277

The 2 means "go by column"; a 1 would have meant "go by row." If we had used a 1, we would have computed 50 medians, one for each row (state).
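A sketch that checks the claims above: the loop, `apply()` with `MARGIN = 2`, and the vectorized `colMeans()` give the same column means, and `MARGIN = 1` produces one value per state. The last lines show `lapply()` and `sapply()` on a small hypothetical list named `scores`.

```r
# Column means three ways; all give the same numbers
avgs_loop <- numeric(ncol(state.x77))
for (i in seq_len(ncol(state.x77))) avgs_loop[i] <- mean(state.x77[, i])
avgs_apply <- apply(state.x77, 2, mean)   # MARGIN = 2: by column
avgs_fast  <- colMeans(state.x77)         # built-in vectorized equivalent

# MARGIN = 1 works row by row: one value per state
row_max <- apply(state.x77, 1, max)
length(row_max)                           # 50

# lapply() / sapply() apply a function to every element of a list
scores <- list(a = c(1, 2, 3), b = c(10, 20))
lapply(scores, sum)                       # a list with elements a = 6, b = 30
sapply(scores, sum)                       # simplified to a named vector
```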

The Apply Family of Functions (part 1)

16:11

The Apply Family of Functions (part 3)

11:47

Apply Functions Exercises

08:22


Reshaping and Recoding Data
5 Lectures
01:06:13

Apply Functions Exercises Solutions

14:24


The Reshape Package in R

12:04

Recoding data allows you to change a data type, for example, or to calculate new data columns from existing ones.
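A minimal sketch of both kinds of recoding, using base R only (`factor()`, `cut()`, `ifelse()`) on a copy of `mtcars`. The new column names, the cut points 20 and 30, and the 8-cylinder threshold are arbitrary choices for illustration; the conversion factor assumes 1 mile per US gallon is about 0.425144 km per litre.

```r
# Work on a copy of the built-in mtcars data frame
cars <- mtcars

# Change a data type: numeric 0/1 codes to a labeled factor
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Calculate a new column from an existing one (km per litre from miles per gallon)
cars$kpl <- cars$mpg * 0.425144

# Recode a continuous variable into categories: [0, 20), [20, 30), [30, Inf)
cars$mpg_class <- cut(cars$mpg, breaks = c(0, 20, 30, Inf),
                      labels = c("low", "medium", "high"), right = FALSE)

# Conditional recoding with ifelse()
cars$big_engine <- ifelse(cars$cyl >= 8, "yes", "no")

table(cars$am)
```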

Recoding Data in R (part 1)

16:57

Preview
15:50

More Vector-Maker Exercises

06:58

About the Instructor

Professor of Information Systems

Copyright © 2017 Udemy, Inc.