
I regularly get questions from people who want to take SAS base certification about:
1. proper materials to prepare
2. how long it takes to prepare for the exam
3. whether prior experience with SAS is required
This course will offer a low stress approach to learning SAS and prepare for the exam.
It is built with learning by examples in mind and organized around the following themes:
a. Step 1: How to read data into SAS (regardless of where it comes from)
b. Step 2: How to explore data in SAS (review, prepare, understand)
c. Step 3: How to transform data in SAS (new variables, merge, concatenate, subset, transpose…)
d. Step 4: Understanding: Behind the scene in SAS and error handling (PROGRAM DATA VECTOR)
e. Step 5: Building reports (we could have focused on statistics or visualizations)
f. Step 6: Exporting reports and data outside SAS
Any code can be broken down into a combination of the following elements:
1. DATA STEP
• Mostly used for data manipulation
2. PROC STEPS: ex: PROC PRINT, PROC FREQ, PROC MEANS…
Use for:
•Reporting
•Graphics
•Statistical analyses
3. PROC SQL
•Alternative to DATA STEP and PROC STEP for relational databases. Not covered in this course
4. MACROS
•This is SAS extension. It allows for building code that is more reusable
3. GLOBAL STATEMENTS and GLOBAL OPTIONS
•Global statements and options are additional elements in SAS code that do not depend on any one of the components mentioned above
DATA STEP is used to perform among other things the following tasks:
reading SAS data sets (extension: .sas7bdat)
reading external files including Raw data (txt, dat, csv) or proprietary sources (EXCEL, SPSS)
subsetting data sets: that is trimming the data set size either by dropping some columns (keeping fewer columns) or filtering data (dropping some rows)
Combining data sets: Merging or concatenating
Transforming data: adding new variables, changing values, recoding variables
Exporting data to external sources
The goal in this Use case is to:
create a basic research data set by combining two other data sets: one demographics information (demo1) and another containing income information (demo2)
allows to see at a basic level how SAS components can be used together to answer a real life data management question
Type of data manipulations that will be covered include:
cleaning data
formatting data
labelling data
merging data
creating reports/statistics
This lecture walks you through the basic steps of using DATA STEP in SAS code
This lecture illustrates how DATA STEP can be used to answer real life question
PROC STEPS or PROCEDURES are used for:
Building reports: proc report, proc tabulate; proc print
Creating graphics: PROC SGPLOT, PROC GPLOT
Performing statistical analyses: PROC REG, PROC GLM…
Reviewing data quality: PROC PRINT, PROC FREQ, PROC MEANS
And many other tasks
PROC STEPS are the most common in SAS code but are the least difficult to learn
This lecture illustrates the use of PROC STEPS in SAS code
This lecture shows examples of how to use PROC STEPS to answer real life questions
Macros in SAS are used to expand SAS capabilities, more specifically, they are used for:
Handling repetitive code
Inserting complex logic in SAS code and make it more generic
There are 4 main ways of creating macros (that is macro variables or macros):
%let statement
CALL SYMPUT
PROC SQL, SELECT INTO:
%MACRO……..%MEND;
Only %LET STATEMENT is covered in this course
This lecture illustrate how and why to implement %LET STATEMENT in SAS
This lecture shows examples of how %LET STATEMENT can be used to answer real life questions
Global statements and global options are additional elements in SAS not classifiable elsewhere
Examples of global statements include:
LIBNAME: used create libraries (for accessing SAS or some other data sets)
Title or Footnote: used to provide descriptive context to reports
ODS and ODS close: used to send outputs to destination reports: PDF, EXCEL, WORD...
Examples of global options:
OBS=...: Used to limit records to print from a DATA STEP or PROC STEP
DATE/NODATE: used to decide whether to print the current date on a report or not
SAS Components: Putting all together
This lecture confirms that whatever your data management question is, the SAS code you need to answer it is always a combination of the components we covered in this section
The following will be covered:
STEPS: the core components in SAS
STATEMENTS and options: the elements that make up steps and other components
SAS Naming rules: data set, variable, library
Commenting code in SAS: one liner vs multiple lines comments
A step is either a DATA STEP or PROC STEP
(1) A DATA step starts with the keyword DATA and usually ends with RUN
(2) A PROC STEP starts with the keyword PROC and usually ends with RUN
Examples of what statements do:
drop or keep variables: DROP/KEEP
filter records: WHERE statement or IF statement
create new variables
Examples of what options do:
Drop or keep variables: (DROP=) or (KEEP=)
OPTION OBS=n
Option NODATE/DATE
How to:
name data sets or variables
name libraries
use default naming rules vs extended rules.
How to comment code in SAS:
Use one liner comment: asterisk followed by comment followed by semicolon
Use multiple liners comment: slash followed by asterisk followed by comments followed by asterisk followed by slash
Why to comment code in SAS:
to provide context about what the code is doing
to help with understanding of the code by other users
to ensure that SAS does not attempt to interpret comments
3 main ways of reading data into SAS:
Using library/Engine option
Using INPUT statement (used for non SAS data sets)
Using PROC IMPORT
More about:
Library/engine option
INPUT STATEMENT option
PROC IMPORT option
Ways for reading data into SAS:
using LIBNAME with engine V9 for SAS DATA SETS
Using LIBNAME with XLSX engine for excel files
Using PROC IMPORT for various files types: excel, txt, csv, dat...
Using INFILE/INPUT or INPUT/INFILE
Reading SAS DATA SETS (extension: .sas7bdat) using LIBRARIES:
example: LIBNAME ABC V9 "path to file";
ABC is the library name
path to file: is the folder path to where the SAS data set is located
V9: the engine is optional
Reading data using excel libraries:
example: libname ABC xlsx "path to excel file\excelfile.xlsx";
ABC is the library name
path to excel file is the folder path to the excel file
excelfile.xlsx is the target file
Each sheet in the file will represent a new SAS DATA SET
XLSX is the engine. It is required
Reading files of types:
CSV
EXCEL
TEXT, DAT
Illustration of how to use PROC IMPORT to read data into SAS
INPUT STATEMENT can be used to read any non SAS data into SAS
It is the tool of last resolve when other options fail
it uses INFILE to access data when the data is saved in external files or
DATALINES OR CARDS when data is captured in the code editor
it does depend on user declaring variables and their attributes (type, length, informat...).
Various input styles are used depending on how the data was formatted or prepared in the source
LIST INPUT: data columns are separated by a common delimiter
COLUMN INPUT: data is captured in well pre-defined columns easily separable
FORMAT INPUT: customed data captured that requires user to control the pointer to read different portions of data
The following procedures are essential for data exploration:
PROC CONTENTS: for reviewing variables and their attributes
PROC PRINT: for printing any portion of data
PROC FREQ: for creating a summary count of data per variable
PROC MEANS: for creating summary statistics on continuous variables
PROC CONTENTS is the best tool used to learn about the variables in data
if variables need description:
use LABEL STATEMENT
if values in a variable needs description
use FORMAT statement
Although SAS has a lot of formats integrated (ex: Date formats), once in a while a user-defined format is needed:
example: if GENDER is captured as 1 and 2 (where 1 corresponds to female and 2 to male), a user-defined format can be used to capture this as Female and Male in reports, statistics and graphics
VALUE STATEMENT is one way to do this
Value statement option is more appropriate when the list of values to format is small and not prone to changes
a PROC FORMAT is used.
A second way to create user-defined formats is CNTLIN=option
while VALUE STATEMENT OPTION requires that user manually matches the values to format to the labels to use as descriptions,
CNTLIN=option takes all of the information needed (values, labels, format name) from a data set
CNTLIN=option is more appropriate when the list of values to format is large or changes more often
Some additional options under PROC CONTENTS:
Printing variables in the same order as in the data set: VARNUM
Printing only the variables list: SHORT
Outputting the variables list to a new data set: OUT=
PROC PRINT is used as an alternative to opening the data set
VAR statement is used to control which variables print
(FIRSTOBS=)/(OBS=) and WHERE are used to control which observations print
This lecture will explore the essential options under PROC PRINT
PROC FREQ allows for learning about participants in the data, one variable at a time such as:
how many participants carry a given characteristic of the sample of a variable
what percentage of the sample carries such a characteristics on the given variable
the ability to cross-tabulate two or more variables
This lecture offers the opportunity to explore essential options under PROC FREQ procedure
PROC MEANS is designed to:
build statistics from numeric variables; statistics that include the mean, median, standard deviation...
breakdown such statistics by the values of a categorical variable
PROC MEANS should not be run on numeric variables that do not a numeric meaning
This lecture provides an overview of the use of PROC MEANS
Combining data can consist of :
concatenating or vertical combining: stacking 2 or more data sets on top of one another
Merging or horizontal combining: more often matching records by a common matching variable or ID
This lecture provides a high level description of DATA CONCATENATING and the requirements for successful combining
This lecture offers a description of what MERGING is about what requirements must be met to succeed
While combining data consisted of bringing different data sources into one, data subsetting is designed to reduced data size.
It can be done by reducing the columns in the data or by reducing the rows
Subsetting by columns is done by either dropping columns not needed or keep columns of interest
Subsetting data consists of reducing the number of rows:
by using OBS= or its variants
WHERE or IF conditions in DATA STEPs
using WHERE in procedures
Examples of data transformation include:
a. Create variables
using assignment statement
Using length statement
using functions
using conditional statements
b. Recoding variables
using conditional statements
using INPUT and PUT functions along with a format
c. Transpose data
using PROC TRANSPOSE
DATA STEP or + arrays
d. Formatting data
The use of variable assignment statement and/or length statement for creating new variables
The use of conditional statements:
IF THEN
IF THEN DO
How to create user-defined formats:
The use of the VALUE STATEMENT
The use of the CNTLIN=OPTION
Data accumulation:
by using SUM STATEMENT or SUM function and RETAIN
by using FIRST.VARIABLE and/or LAST.VARIABLE to break down by categories
Transposing data:
long to wide format or vice versa
Using PROC TRANSPOSE
an alternative is to use DATA STEP and arrays
Converting data from numeric to character or vice versa using:
INPUT or PUT functions and
A FORMAT
In this course, you will learn the most essential notions that are covered under SAS Base Certification. The base certification is organized into two exams: (1) Programming Fundamentals Using SAS 9.4 and (2) Base Programming Using SAS 9.4. SAS certification exams have evolved in recent years to focus more on testing students ability to implement the notions learned. Typically, example data sets are given, questions are asked; then students are expected to answer those questions by manipulating the data with limited instructions. In the past, exams were mostly memory-based; they were designed to test students ability to remember the syntax and the rules of programming language.
We designed this course to focus more on comprehension over memory; specifically, for each notion covered, we try to answer the "WHAT"--what is it used for, "HOW"-- how is it used, "WHY"-- for each type of question, options available, strengths and, limitations, and "WHEN"--at what stage of data management stage is it used. The course is organized along the line of the expected steps a data manager or analyst needs to take to complete his/her task: (1) How to read the data? what tools are available for that purpose depending on the data source; (2) How to prepare the data prior analysis; (3) data management/analysis step. In short, for each essential notion we try to ensure that you have a contextual understanding of it: how to and when to use it. We also included a lot practice exercises, before and after each lesson.