Crash Course: R Programming for Biostatistical Data Analysis

Rating: 4.8 (22 reviews)

RStudio Crash Course: Learn data cleaning, visualization, and regression in R using real public health datasets

Created by Md Ahshanul Haque

Last updated 5/2026

English

What you'll learn

Import Excel/CSV/DHS-style data, clean & recode variables, handle missing values, and create analysis-ready datasets.
Create publication-ready tables and plots using dplyr, tidyr, ggplot2, and gtsummary.
Run regression models for epidemiology: logistic regression (OR) and log-binomial/Poisson regression (RR), and interpret results with 95% CIs and p-values.
Export results to Word/Excel and build a reproducible workflow with scripts and organized folders.

Course content

9 sections • 37 lectures • 2h 10m total length

Introduction to the RStudio Crash Course (1:41)
Welcome to the RStudio Crash Course for Biostatistics and Epidemiology. Learn what you’ll study and how this course will support your research and publications.

Setting Up RStudio and Required Packages (3:25)
In this lecture, you will learn how to set up RStudio and install the required packages for this course. By the end of this lecture, you will be able to open RStudio, install essential packages, and load them for use in your analysis.
Reading Excel Data in R for Data Analysis (3:10)
In this lecture, you will learn how to import Excel data into R using the readxl package. By the end of this lecture, you will be able to read datasets and prepare them for data analysis in R.
Trial Box Plot in R Using ggplot2 (2:37)
In this lecture, you will learn how to create a box plot using ggplot2 in R for data visualization. By the end of this lecture, you will be able to visualize data distribution and compare groups.
Trial Summary Tables in R Using gtsummary (2:03)
In this lecture, you will learn how to generate summary tables in R using the gtsummary package. By the end of this lecture, you will be able to create descriptive statistics tables for research and reporting.
Keeping and Dropping Variables (3:10)
In this lecture, you will learn how to select and remove variables using dplyr in R for data manipulation. By the end of this lecture, you will be able to clean and manage your dataset efficiently.
Creating New Variables in R (2:43)
In this lecture, you will learn how to create new variables using the mutate() function in dplyr. By the end of this lecture, you will be able to transform and recode variables for analysis.
Combining Multiple Steps Using the Pipe Operator (%>%) (3:57)
In this lecture, you will learn how to use the pipe operator (%>%) in R to combine multiple data manipulation steps. By the end of this lecture, you will be able to write cleaner and more efficient tidyverse workflows.
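A minimal sketch of the workflow in this block (import, drop a variable, create new variables, chain the steps), under stated assumptions: the toy data and column names (`id`, `age`, `sex`, `hb`, `notes`) are hypothetical, the data are read from an inline CSV with base `read.csv()` rather than from Excel with readxl, and the steps are chained with base R's native `|>` pipe (R 4.1 or later) using `subset()` and `transform()` in place of dplyr's `select()` and `mutate()` with `%>%`, so it runs without extra packages.

```r
# Hypothetical toy data; in the course the data come from an Excel file
raw <- read.csv(text = c(
  "id,age,sex,hb,notes",
  "1,24,female,10.5,none",
  "2,31,male,13.8,none",
  "3,19,female,11.9,follow-up",
  "4,45,male,12.6,none"
))

clean <- raw |>
  # Keep/drop: remove a variable not needed for analysis (dplyr: select(-notes))
  subset(select = -notes) |>
  # Create new variables (dplyr: mutate())
  transform(
    age_group = ifelse(age < 30, "<30", "30+"),
    # Assumed anemia cut-offs for illustration: Hb < 12 g/dL (female), < 13 g/dL (male)
    anemia = ifelse((sex == "female" & hb < 12) | (sex == "male" & hb < 13), 1L, 0L)
  )

print(clean)
```

With dplyr loaded, the same chain would read `raw %>% select(-notes) %>% mutate(...)`.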

Excel Data Import in RStudio (readxl) | Public Health Research (2:31)
Learn how to import Excel (.xlsx) files into RStudio using readxl, select sheets, and create an analysis-ready dataset for public health, epidemiology, and biostatistics.
Prepare Factor Variables in RStudio (3:26)
This lecture shows how to prepare factor variables for biostatistics and epidemiology analysis in RStudio.
Prepare Numeric Variables in RStudio | as.numeric (1:27)
In this lecture, you will learn how to prepare numeric variables in RStudio for real research analysis. We will convert variables to numeric correctly, handle common problems such as text characters, factors, commas, and special symbols, and set values that cannot be converted to missing (NA). You will also check ranges, outliers, and summary statistics so your dataset is ready for regression and publication tables.
Short Descriptive Summary with gtsummary (N, %, Median IQR) (1:06)
You will practice making short descriptive summaries using gtsummary, including sample size (N), percentages for categorical variables, and median (IQR) for continuous variables. The goal is to quickly describe your dataset for a thesis, manuscript, or report using a standard table format widely used in public health research.
Set Reference Category for Categorical Variables (Factor Baseline) in RStudio (5:01)
In this lecture, you will learn why a reference category (baseline) is important for categorical variables in regression. We will discuss how the reference group affects interpretation of Odds Ratio (OR) and Risk Ratio (RR), how to choose a meaningful baseline (e.g., “No exposure” or “Normal”), and how to avoid common reporting mistakes.
Variable Labels in RStudio (Var Lab) for Clean, Readable Output (2:44)
In this lecture, you will learn why variable labels are important for research analysis and publication. We will discuss how labels make tables and plots easier to understand, how to label variables consistently (exposure, outcome, covariates), and how good labeling reduces mistakes when preparing thesis, manuscript, and supplementary materials.
Saving Data in .RData Format | The cleaned dataset was saved in .RData format (1:23)
Learn how and why to save your cleaned dataset in .RData format to preserve variables, labels, and analysis-ready structure.
Opening and Loading .RData Files in RStudio (1:16)
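The preparation steps in this block can be sketched in base R. Everything here is illustrative: the variable names and raw values are hypothetical, numeric cleanup uses a simple `gsub()` rule that keeps only digits and decimal points, and the label is stored as a plain `label` attribute (dedicated labelling packages offer richer alternatives not used here).

```r
# Hypothetical raw values, as they might arrive from a spreadsheet
df <- data.frame(
  income = c("1,200", "850", "2,300 Tk", "unknown"),
  food_security = c("secure", "insecure", "insecure", "secure")
)

# 1) Numeric variable: strip commas, currency text and symbols, then convert.
#    Values with no digits left (e.g. "unknown") become NA.
income_txt <- gsub("[^0-9.]", "", df$income)
income_txt[income_txt == ""] <- NA
df$income <- as.numeric(income_txt)

# 2) Factor variable: default levels are alphabetical ("insecure", "secure"),
#    so set a meaningful reference category explicitly
df$food_security <- relevel(factor(df$food_security), ref = "secure")

# 3) Variable label stored as a simple attribute (one convention among several)
attr(df$food_security, "label") <- "Household food security status"

# 4) Save the analysis-ready data as .RData, then load it back
path <- file.path(tempdir(), "clean_data.RData")
save(df, file = path)
rm(df)
load(path)  # restores the object under its saved name, `df`

summary(df$income)
levels(df$food_security)
```

Because `secure` is now the first level, regression models will report the `insecure` group relative to it.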

Univariate Analysis in RStudio using gtsummary (2:40)
Learn how to perform univariate analysis in RStudio using gtsummary to summarize numeric and categorical variables clearly.
Bivariate Analysis in RStudio using gtsummary (1:24)
Learn how to perform bivariate analysis in RStudio using gtsummary to compare variables and calculate p-values.
Reporting n (%) and Mean ± SD in RStudio using gtsummary (3:12)
Learn how to summarize categorical data as n (%) and continuous data as mean ± SD in RStudio using gtsummary.
Setting Decimal Points for n (%) and Mean ± SD (1:58)
Learn how to report n (%), mean ± SD, and p-values with appropriate decimal places for epidemiology and biostatistics tables. Data are presented as n (%) for categorical variables and mean ± SD for continuous variables.
P-value Reporting: Chi-square & t-test in RStudio (5:05)
Learn how to report and interpret p-values from Chi-square and t-tests for epidemiology and biostatistics research tables. Group differences were assessed using the Chi-square test for categorical variables and the t-test for continuous variables.
Exporting Tables to MS Word for Manuscripts (3:51)
Learn how to export descriptive and bivariate analysis tables into MS Word format for thesis and journal submission.
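The descriptive and bivariate outputs in this block can be approximated in base R. This sketch does not reproduce gtsummary's tables: it uses simulated data with hypothetical variable names, formats n (%) and mean ± SD with `sprintf()`, and takes p-values from `chisq.test()` and `t.test()`.

```r
set.seed(42)
n <- 200
# Simulated data with hypothetical variable names
d <- data.frame(
  group  = factor(sample(c("Food secure", "Food insecure"), n, replace = TRUE)),
  anemia = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.6, 0.4))),
  age    = round(rnorm(n, mean = 30, sd = 6), 1)
)

# Categorical variable: n (%) within each group (column percentages, 1 decimal)
tab <- table(d$anemia, d$group)
pct <- prop.table(tab, margin = 2) * 100
n_pct <- matrix(sprintf("%d (%.1f%%)", as.vector(tab), as.vector(pct)),
                nrow = nrow(tab), dimnames = dimnames(tab))
print(n_pct, quote = FALSE)

# Continuous variable: mean ± SD within each group (1 decimal)
mean_sd <- tapply(d$age, d$group,
                  function(x) sprintf("%.1f \u00b1 %.1f", mean(x), sd(x)))
print(mean_sd)

# Group differences: Chi-square test (categorical), t-test (continuous)
p_chi <- chisq.test(tab)$p.value               # 2x2 tables get Yates' correction by default
p_t   <- t.test(age ~ group, data = d)$p.value # Welch t-test by default
cat("Chi-square p =", format.pval(p_chi, digits = 3), "\n")
cat("t-test p =", format.pval(p_t, digits = 3), "\n")
```

For the classic Student t-test with pooled variance, pass `var.equal = TRUE` to `t.test()`.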

Bar Diagram of a Binary Variable by Categorical Groups (5:07)
Learn how to visualize a binary outcome across categories using grouped bar diagrams in RStudio with ggplot2.
Modifying Bar Diagrams: Binary Variable by Categorical Groups (5:27)
Learn how to modify and customize grouped bar diagrams for binary variables to create clear, publication-ready figures.
Bar Diagram of a Binary Variable by Two Categorical Groups (10:00)
Learn how to compare a binary outcome across two categorical grouping variables in a single bar diagram.
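A grouped bar diagram of a binary outcome plots the percentage with the outcome within each category. The course draws these with ggplot2; this sketch swaps in base R's `barplot()` so it needs no extra packages, uses simulated data with hypothetical variables (`wealth`, `residence`, and `anemia` coded 0/1), and writes the figure to a temporary PDF.

```r
set.seed(1)
n <- 300
# Simulated data with hypothetical variables
d <- data.frame(
  wealth = factor(sample(c("Poor", "Middle", "Rich"), n, replace = TRUE),
                  levels = c("Poor", "Middle", "Rich")),
  residence = factor(sample(c("Urban", "Rural"), n, replace = TRUE)),
  anemia = rbinom(n, 1, 0.4)  # binary outcome coded 0/1
)

# Percentage with anemia (mean of a 0/1 variable x 100) in each residence x wealth cell
pct <- with(d, tapply(anemia, list(residence, wealth), mean)) * 100
round(pct, 1)

# Grouped bars: one cluster per wealth group, one bar per residence category
pdf(file.path(tempdir(), "anemia_bar.pdf"))
barplot(pct, beside = TRUE, legend.text = rownames(pct),
        ylim = c(0, 100), ylab = "Anemia (%)", xlab = "Wealth index")
dev.off()
```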

To assess the association between food insecurity and anemia (3:40)
In this lecture, you will learn how to clearly define a research objective and apply it in a real-world health study. We focus on assessing the association between food insecurity and anemia using survey data. You will understand how research questions guide variable selection, statistical analysis, and model choice. This lecture is especially useful for students working with DHS data or other public health and epidemiological datasets.

Simple Logistic Regression in RStudio (gtsummary) (2:57)
Learn how to perform simple logistic regression and present crude odds ratios (OR) using gtsummary.
Interpreting Odds Ratios in Logistic Regression (2:44)
In this lecture, we interpret odds ratios from logistic regression models, focusing on the role of the log transformation: model coefficients are on the log-odds scale, and exponentiating a coefficient gives the odds ratio. You will learn to move between these scales in simple and multiple logistic regression using real-world examples in RStudio.
Multiple Logistic Regression in RStudio (gtsummary) (2:30)
Learn how to perform multiple logistic regression to estimate adjusted odds ratios (AOR) using gtsummary.
Interpret Adjusted Odds Ratio from Multiple Logistic Regression in R (2:52)
In this lecture, you will learn how to interpret adjusted odds ratios from multiple logistic regression analysis in RStudio. We will explain how to report the odds ratio, its 95% confidence interval, and the p-value in a clear and practical way. You will also learn how to write interpretation sentences for research papers, thesis results, and public health data analysis.
After completing this lecture, you will be able to explain adjusted odds ratios correctly and present logistic regression findings in a professional format.
Publication-Ready Table: Unadjusted Odds Ratios (Simple Logistic Regression) (9:21)
Learn how to create a publication-ready table of unadjusted odds ratios using simple logistic regression for all independent variables.
Publication-Ready Table: Adjusted Odds Ratios (Multiple Logistic Regression) (4:29)
Students will be able to prepare clean, publication-ready tables presenting adjusted odds ratios for epidemiology and public health research.
Final Publication-Ready Table: Unadjusted & Adjusted Odds Ratios (4:33)
Learn how to present unadjusted and adjusted odds ratios together in a single, publication-ready logistic regression table.
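The odds ratios in these tables are exponentiated logistic-regression coefficients. A minimal base-R sketch, assuming simulated data with hypothetical variables (`food_insecure`, `age`, `anemia`): it fits the crude and adjusted models with `glm(family = binomial)` and reports Wald 95% CIs from `confint.default()`. The course builds the formatted tables with gtsummary, whose default confidence intervals may be computed differently, so treat this as an illustration of the underlying calculation.

```r
set.seed(2024)
n <- 1000
# Simulated data (hypothetical): food insecurity raises the odds of anemia
d <- data.frame(
  food_insecure = rbinom(n, 1, 0.35),
  age = round(runif(n, 15, 49))
)
d$anemia <- rbinom(n, 1, plogis(-1 + 0.6 * d$food_insecure + 0.02 * (d$age - 30)))

# Simple (crude) and multiple (adjusted) logistic regression
crude <- glm(anemia ~ food_insecure, family = binomial, data = d)
adj   <- glm(anemia ~ food_insecure + age, family = binomial, data = d)

# OR = exp(coefficient); Wald 95% CI = exp(estimate +/- 1.96 * SE)
or_row <- function(fit, term) {
  est <- coef(fit)[[term]]
  ci  <- confint.default(fit)[term, ]
  p   <- summary(fit)$coefficients[term, "Pr(>|z|)"]
  round(c(OR = exp(est), lower = exp(ci[[1]]), upper = exp(ci[[2]]), p = p), 3)
}

res <- rbind(crude = or_row(crude, "food_insecure"),
             adjusted = or_row(adj, "food_insecure"))
print(res)
```

With a single binary exposure, the crude OR from `glm()` equals the cross-product ratio (ad/bc) of the 2×2 table, which is a quick way to check the model.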

Simple Log-Binomial Regression to Estimate Risk Ratio in R (4:09)
In this lecture, you will learn how to estimate the risk ratio using simple log-binomial regression in RStudio. We will discuss when the risk ratio is useful, how it differs from the odds ratio, and how to interpret the regression output clearly.
After completing this lecture, you will be able to run a simple log-binomial regression model in R, estimate the risk ratio, and write a clear interpretation for research reports, thesis results, and public health data analysis.
Multiple Log-Binomial / Modified Poisson Regression to Estimate Adjusted RR (4:06)
In this lecture, you will learn how to estimate adjusted risk ratios using multiple log-binomial regression and modified Poisson regression (Poisson regression with robust standard errors) in RStudio. We will explain why adjusted risk ratios are useful for binary outcomes and why they can be easier to interpret than odds ratios in many public health and epidemiological studies, particularly when the outcome is common, because the odds ratio then lies noticeably further from 1 than the risk ratio.
After completing this lecture, you will be able to run adjusted risk ratio models in R, interpret the results, and report the risk ratio, its 95% confidence interval, and the p-value clearly in research papers, thesis work, and data analysis reports.
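Both models in this block use a log link, so exponentiated coefficients are risk ratios. A base-R sketch under stated assumptions: the data are simulated with hypothetical variables, the simple model is a log-binomial `glm()`, and the adjusted model is a modified Poisson regression whose robust (HC0 sandwich) variance is computed by hand; with the sandwich package installed, `sandwich::vcovHC(mp, type = "HC0")` should give the same matrix. Log-binomial models can fail to converge when fitted risks approach 1, which is one common reason for using the modified Poisson approach instead.

```r
set.seed(7)
n <- 2000
# Simulated data (hypothetical): baseline risk 20%, exposure x1.5, female x1.2
d <- data.frame(
  food_insecure = rbinom(n, 1, 0.4),
  female = rbinom(n, 1, 0.5)
)
d$anemia <- rbinom(n, 1, 0.20 * 1.5^d$food_insecure * 1.2^d$female)

# Simple log-binomial model: exp(coefficient) is the crude risk ratio
lb <- glm(anemia ~ food_insecure, family = binomial(link = "log"), data = d)
rr_lb <- exp(coef(lb)[["food_insecure"]])

# Modified Poisson: Poisson GLM with log link for the point estimates ...
mp <- glm(anemia ~ food_insecure + female, family = poisson(link = "log"), data = d)

# ... and a robust (HC0 sandwich) variance: bread %*% meat %*% bread
X <- model.matrix(mp)
r <- d$anemia - fitted(mp)        # response residuals, y - mu
bread <- vcov(mp)                 # (X'WX)^-1, since Poisson dispersion is fixed at 1
meat <- crossprod(X * r)          # sum over i of r_i^2 * x_i x_i'
se <- sqrt(diag(bread %*% meat %*% bread))

est <- coef(mp)[["food_insecure"]]
se_fi <- se[["food_insecure"]]
arr <- exp(c(aRR = est, lower = est - 1.96 * se_fi, upper = est + 1.96 * se_fi))
print(round(c(crude_RR = rr_lb, arr), 3))
```

With one binary exposure, the log-binomial risk ratio reproduces the simple ratio of the two group proportions, which makes a handy sanity check.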

Requirements

A laptop/PC with internet access
Install R and RStudio (both free): I’ll guide you step by step

Description

Practical R Programming for Biostatistical Data Analysis is designed for MSc/MPH/PhD students, public health researchers, and anyone who wants to analyze health data efficiently using R and RStudio. If you are planning to publish a paper, write a thesis, or prepare a research report, this course will guide you step by step with practical, real-world examples.

This is a hands-on course focused on applied data analysis rather than theory. You will learn how to work with real datasets and build a complete analysis workflow in R.

You will start from the basics: setting up R and RStudio, understanding the interface, and working with projects, scripts, and packages. Then, you will move into real research workflows: importing data (Excel/CSV), cleaning and recoding variables, handling missing values, and preparing analysis-ready datasets using tidyverse tools.

Next, you will learn how to create clear, publication-ready tables and visualizations. The course focuses on applied epidemiological analysis, including regression modeling and interpretation. You will learn how to estimate and interpret odds ratios (ORs) using logistic regression and risk ratios (RRs) using log-binomial or modified Poisson models, and how to present results with 95% confidence intervals and p-values.

By the end of this course, you will be able to conduct a complete data analysis workflow in RStudio and produce publication-ready outputs with confidence.

Who this course is for

MSc, MPH, and PhD students
Public health and epidemiology researchers
Professionals working with health or survey data
Anyone with basic R knowledge who wants practical data analysis skills

Ideal for MSc/PhD students and researchers who want to learn R and RStudio for epidemiological and biostatistical data analysis and publication.

Course sections

Introduction • 1 lecture • 2min

Getting Started with Data Analysis in R (Essential Basics) • 7 lectures • 21min

Data Preparation and Management in RStudio • 8 lectures • 19min

Descriptive Analysis & Statistical Tests • 6 lectures • 18min

Data Visualization in RStudio | Bar Diagram for Binary Variables using ggplot2 • 3 lectures • 21min

Boxplots for Data Visualization | RStudio with ggplot2 • 2 lectures • 8min

Research Objective in our study • 1 lecture • 4min

Logistic Regression Analysis in R | Odds Ratio Estimation • 7 lectures • 29min

Log-Binomial and Modified Poisson Regression to Estimate Risk Ratio in R • 2 lectures • 8min
