
Welcome to the RStudio Crash Course for Biostatistics and Epidemiology. Learn what you’ll study and how this course will support your research and publications.
In this lecture, you will learn how to set up RStudio and install the required packages for this course. By the end of this lecture, you will be able to open RStudio, install essential packages, and load them for use in your analysis.
In this lecture, you will learn how to import Excel data into R using the readxl package. By the end of this lecture, you will be able to read datasets and prepare them for data analysis in R.
In this lecture, you will learn how to create a box plot using ggplot2 in R. By the end of this lecture, you will be able to visualize the distribution of a continuous variable and compare it across groups.
In this lecture, you will learn how to generate summary tables in R using the gtsummary package. By the end of this lecture, you will be able to create descriptive statistics tables for research and reporting.
In this lecture, you will learn how to select and remove variables using dplyr in R for data manipulation. By the end of this lecture, you will be able to clean and manage your dataset efficiently.
In this lecture, you will learn how to create new variables using the mutate() function in dplyr. By the end of this lecture, you will be able to transform and recode variables for analysis.
In this lecture, you will learn how to use the pipe operator (%>%) in R to combine multiple data manipulation steps. By the end of this lecture, you will be able to write cleaner and more efficient tidyverse workflows.
Learn how to import Excel (.xlsx) files into RStudio using readxl, select sheets, and create an analysis-ready dataset for public health, epidemiology, and biostatistics.
This lecture shows how to prepare factor variables for biostatistics and epidemiology analysis in RStudio.
In this lecture, you will learn how to prepare numeric variables in RStudio for research analysis. We will convert variables to numeric correctly, handle common problems such as text entries, factor-coded values, commas, and special symbols, and set values that cannot be converted to missing (NA). You will also check ranges, outliers, and summary statistics so that your dataset is ready for regression models and publication tables.
You will practice producing short descriptive summaries with gtsummary, including the sample size (N), percentages for categorical variables, and mean (SD) or median (IQR) for continuous variables. The goal is to describe your dataset quickly for a thesis, manuscript, or report using a standard table format widely used in public health research.
In this lecture, you will learn why a reference category (baseline) is important for categorical variables in regression. We will discuss how the reference group affects the interpretation of odds ratios (OR) and risk ratios (RR), how to choose a meaningful baseline (e.g., “No exposure” or “Normal”), and how to avoid common reporting mistakes.
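As a sketch of why the baseline matters, assume a binary outcome and a three-level exposure with hypothetical levels “None” (reference), “Moderate”, and “Severe”. With dummy (indicator) coding, the reference level gets no coefficient of its own, so each odds ratio compares one level against the baseline:

```latex
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 D_{\text{Moderate}} + \beta_2 D_{\text{Severe}}

\mathrm{OR}_{\text{Moderate vs. None}} = e^{\beta_1},
\qquad
\mathrm{OR}_{\text{Severe vs. None}} = e^{\beta_2}
```

Switching the baseline to “Moderate” would change every reported OR even though the fitted probabilities stay the same. In R, default treatment coding uses the first level of a factor as the reference, and `relevel()` can be used to set a different one.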
In this lecture, you will learn why variable labels are important for research analysis and publication. We will discuss how labels make tables and plots easier to understand, how to label variables consistently (exposure, outcome, covariates), and how good labeling reduces mistakes when preparing a thesis, manuscript, or supplementary materials.
Learn how and why to save your cleaned dataset in .RData format to preserve variables, labels, and analysis-ready structure.
Learn how to perform univariate analysis in RStudio using gtsummary to summarize numeric and categorical variables clearly.
Learn how to perform bivariate analysis in RStudio using gtsummary to compare variables across groups and calculate p-values.
Learn how to summarize categorical data as n (%) and continuous data as mean ± SD in RStudio using gtsummary.
Learn how to report n (%), mean ± SD, and p-values with appropriate decimal places in epidemiology and biostatistics tables. A typical methods sentence: “Data are presented as n (%) for categorical variables and mean ± SD for continuous variables.”
Learn how to report and interpret p-values from chi-square tests and t-tests in epidemiology and biostatistics research tables. A typical methods sentence: “Group differences were assessed using the chi-square test for categorical variables and the t-test for continuous variables.”
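For reference, the two test statistics behind these p-values can be written as follows. Here O and E are the observed and expected cell counts of the contingency table, and the t statistic is shown in its Welch (unequal-variance) form, which is the default in R's `t.test()`; the pooled-variance Student form is an alternative.

```latex
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N},
\qquad
df = (r - 1)(c - 1)

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}
```

If the table is built with gtsummary, note that `add_p()` defaults to a rank-sum test for continuous variables, so a t-test typically has to be requested through its `test` argument.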
Learn how to export descriptive and bivariate analysis tables into MS Word format for thesis and journal submission.
Learn how to visualize a binary outcome across categories using grouped bar diagrams in RStudio with ggplot2.
Learn how to modify and customize grouped bar diagrams for binary variables to create clear, publication-ready figures.
Bar diagrams were used to compare a binary outcome across two categorical grouping variables.
Visualizing the distribution of a single continuous variable.
Visualizing group differences across two categorical factors.
In this lecture, you will learn how to define a clear research objective and apply it in a real-world health study. We focus on assessing the association between food insecurity and anemia using survey data. You will see how the research question guides variable selection, statistical analysis, and model choice. This lecture is especially useful for students working with Demographic and Health Surveys (DHS) data, other public health datasets, or epidemiological data.
Learn how to perform simple logistic regression and present crude (unadjusted) odds ratios (OR) using gtsummary.
In this lecture, we will focus on interpreting odds ratios from logistic regression models, particularly how the log transformation affects interpretation: model coefficients are estimated on the log-odds scale, and exponentiating them gives odds ratios. Using real-world examples in RStudio, you will learn to move between these scales in both simple and multiple logistic regression.
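A worked sketch of the scale change: logistic regression coefficients are on the log-odds scale, and exponentiating a coefficient gives the odds ratio for a one-unit increase in that predictor. The coefficient values below are hypothetical, chosen only to show the arithmetic.

```latex
\log\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\mathrm{OR} = \frac{\text{odds}(x + 1)}{\text{odds}(x)} = e^{\beta_1}

\beta_1 = 0.693 \;\Rightarrow\; \mathrm{OR} = e^{0.693} \approx 2.00
\qquad
\beta_1 = -0.693 \;\Rightarrow\; \mathrm{OR} = e^{-0.693} \approx 0.50
\qquad
\beta_1 = 0 \;\Rightarrow\; \mathrm{OR} = 1
```

Coefficients of equal size but opposite sign give reciprocal odds ratios (2.00 and 0.50), which is why an OR of 0.50 represents an association as strong as an OR of 2.00, just in the opposite direction.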
Learn how to perform multiple logistic regression to estimate adjusted odds ratios (AOR) using gtsummary.
In this lecture, you will learn how to interpret adjusted odds ratios from multiple logistic regression in RStudio. We will explain how to report the odds ratio, its 95% confidence interval, and the p-value clearly and practically. You will also learn how to write interpretation sentences for research papers, thesis results, and public health data analysis.
After completing this lecture, you will be able to explain adjusted odds ratios correctly and present logistic regression findings in a professional format.
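In a multiple logistic model, each adjusted odds ratio (AOR) is the exponentiated coefficient for that variable with the other covariates held fixed, and a Wald-type 95% confidence interval is obtained by exponentiating the interval computed on the log-odds scale. The estimate and standard error below are hypothetical.

```latex
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k,
\qquad
\mathrm{AOR}_1 = e^{\hat\beta_1}

95\%\ \mathrm{CI} = \exp\!\left(\hat\beta_1 \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta_1)\right)

\hat\beta_1 = 0.693,\ \widehat{\mathrm{SE}} = 0.20
\;\Rightarrow\;
\mathrm{AOR} \approx 2.00,\quad 95\%\ \mathrm{CI} \approx (1.35,\ 2.96)
```

With these hypothetical numbers, a results sentence might read: “After adjustment, the odds of the outcome were about twice as high in the exposed group (AOR = 2.00; 95% CI: 1.35 to 2.96).” The interval is asymmetric around the AOR because it is symmetric on the log scale. Software may report profile-likelihood intervals instead of Wald intervals, which can differ slightly.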
Learn how to create a publication-ready table of unadjusted odds ratios by fitting a simple logistic regression for each independent variable.
Students will be able to prepare clean, publication-ready tables presenting adjusted odds ratios for epidemiology and public health research.
Learn how to present unadjusted and adjusted odds ratios together in a single, publication-ready logistic regression table.
In this lecture, you will learn how to estimate a risk ratio using simple log-binomial regression in RStudio. We will discuss when the risk ratio is useful, how it differs from the odds ratio, and how to interpret the regression output clearly.
After completing this lecture, you will be able to run a simple log-binomial regression model in R, estimate the risk ratio, and write a clear interpretation for research reports, thesis results, and public health data analysis.
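The difference between the two measures can be seen with a small worked example. Assume hypothetical risks of 0.40 in the exposed group and 0.20 in the unexposed group. A simple log-binomial model links the log of the risk (not the log odds) to the exposure, so its exponentiated coefficient is the risk ratio:

```latex
\log P(Y = 1 \mid x) = \beta_0 + \beta_1 x,
\qquad
\mathrm{RR} = e^{\beta_1} = \frac{p_1}{p_0}

\mathrm{RR} = \frac{0.40}{0.20} = 2.00,
\qquad
\mathrm{OR} = \frac{0.40 / 0.60}{0.20 / 0.80} = \frac{0.667}{0.25} \approx 2.67
```

When the outcome is common, as in this example, the OR lies farther from 1 than the RR; when the outcome is rare, the two measures are close.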
In this lecture, you will learn how to estimate adjusted risk ratios using multiple log-binomial regression and modified Poisson regression in RStudio. We will explain why adjusted risk ratios are useful for binary outcomes and why they can be easier to interpret than odds ratios in many public health and epidemiological studies.
After completing this lecture, you will be able to run adjusted risk ratio models in R, interpret the results, and report risk ratios, 95% confidence intervals, and p-values clearly in research papers, thesis work, and data analysis reports.
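Modified Poisson regression fits a Poisson model with a log link to the 0/1 outcome and replaces the model-based standard errors with robust (sandwich) standard errors, because the Poisson variance assumption does not hold for binary data. A sketch of the model with covariates x_1, ..., x_k:

```latex
\log E(Y \mid x_1, \dots, x_k) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
Y \in \{0, 1\}

\mathrm{RR}_j^{\text{adjusted}} = e^{\hat\beta_j},
\qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat\beta_j \pm 1.96\,\widehat{\mathrm{SE}}_{\text{robust}}(\hat\beta_j)\right)
```

This approach is often used when a multiple log-binomial model fails to converge, which can happen when predicted risks approach 1.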
Practical R Programming for Biostatistical Data Analysis is designed for MSc/MPH/PhD students, public health researchers, and anyone who wants to analyze health data efficiently using R and RStudio. If you are planning to publish a paper, write a thesis, or prepare a research report, this course will guide you step by step with practical, real-world examples.
This is a hands-on course focused on applied data analysis rather than theory. You will learn how to work with real datasets and build a complete analysis workflow in R.
You will start from the basics: setting up R and RStudio, understanding the interface, and working with projects, scripts, and packages. Then, you will move into real research workflows—importing data (Excel/CSV), cleaning and recoding variables, handling missing values, and preparing analysis-ready datasets using tidyverse tools.
Next, you will learn how to create clear, publication-ready tables and visualizations. The course focuses on applied epidemiological analysis, including regression modeling and interpretation. You will learn how to estimate and interpret odds ratios (OR) using logistic regression and risk ratios (RR) using log-binomial or modified Poisson models, and how to present results with 95% confidence intervals and p-values.
By the end of this course, you will be able to conduct a complete data analysis workflow in RStudio and produce publication-ready outputs with confidence.
Who this course is for
MSc, MPH, and PhD students
Public health and epidemiology researchers
Professionals working with health or survey data
Anyone with basic R knowledge who wants practical data analysis skills