Demographic and Health Survey Data Analysis in R and RStudio

Rating: 4.8 (14 reviews)

Practical Survey Data Analysis Using RStudio & R with DHS Data for Public Health Research

Created by Md Ahshanul Haque

Last updated 1/2026

Language: English

Captions: English [Auto]

What you'll learn

Import, understand, and manage Demographic and Health Survey (DHS) data in R and RStudio
Perform descriptive and exploratory analysis of DHS survey data using R
Apply survey design concepts and conduct survey-weighted analyses in R
Analyze key public health topics (childhood malnutrition, maternal health, child feeding, women’s empowerment) using DHS data
Produce clean, reproducible, and publication-ready tables and results from DHS data

Course content

14 sections • 85 lectures • 6h 20m total length

Welcome to RStudio for DHS Data Analysis (2:33)
Welcome to the course! In this lecture, I will briefly introduce myself, explain the purpose of the course, and guide you on how to get the most benefit from the learning materials.
Course Overview: RStudio, Statistics & Publication Workflow (6:23)
This lecture provides a clear overview of the course structure, topics covered, and learning outcomes. You will understand how the course progresses from data preparation to statistical analysis, visualization, and publication-ready reporting using RStudio.
Important Note About R Version Updates (0:23)
R and its packages are updated frequently, so you may sometimes see errors when running the code shown in this course. Don’t worry; this is normal. I regularly update the code to keep it compatible with the latest R versions. If you run into any issues or errors, feel free to message me anytime. I’m always here to help you continue learning smoothly.

Importing Tanzania DHS Data from Stata (.dta) Files (2:28)
Learn how to import Tanzania Demographic and Health Survey (DHS) data from Stata (.dta) files into R and RStudio, and check the dataset structure for analysis.
Importing Tanzania DHS Data from SPSS (.sav) Files (1:39)
Learn how to import SPSS (.sav) files into RStudio while correctly preserving variable labels, value labels, and missing values.
Reading DHS Data from Other Countries (Peru, Ethiopia) (3:01)
This lecture demonstrates how to import DHS datasets from other countries, such as Peru and Ethiopia, and highlights key similarities and differences across country datasets.
Understanding DHS Variables Using Codebooks (6:11)
This lecture explains how to use DHS codebooks and recode manuals to correctly interpret variables, categories, and survey indicators for analysis.
Selecting Variables Using the select() Function (Examples) (6:08)
This lecture demonstrates how to select relevant variables from DHS datasets using the select() function from the dplyr package, with practical examples for analysis.
Creating a Short Summary Table Using the gtsummary Package (1:23)
This lecture shows how to create a short descriptive summary table using the gtsummary package in R for DHS data analysis.

Key DHS Variables for Stunting, Wasting, and Underweight (2:45)
This lecture explains the definitions of stunting, wasting, and underweight, and identifies the key variables required to construct these nutritional indicators from Demographic and Health Survey (DHS) data. Although the examples use a single country's dataset, the same R code and workflow apply to DHS datasets from other countries, such as Bangladesh DHS (BDHS), Nepal DHS (NDHS), Ethiopia DHS (EDHS), and many more.
Choosing DHS Variables for Nutritional Indicators and Covariates | Codebook (6:14)
Learn how to select the outcome variables and covariates needed to create nutritional indicators from DHS data.
Preparing Factor (Categorical) Variables and Value Labels (4:34)
This lecture demonstrates how to convert DHS variables into factor (categorical) variables and apply meaningful value labels for clear and accurate analysis.
Preparing Numerical (Continuous) Variables from the DHS Dataset (4:00)
This lecture demonstrates how to prepare numerical (continuous) variables from DHS data, including checking ranges, handling missing values, and preparing variables for analysis.
Identifying Unusual or Extreme Values and Correcting Decimal Points (4:14)
This lecture demonstrates how to identify unusual or out-of-range values and correct decimal point issues commonly found in DHS datasets before analysis.
Replacing Extreme and Unusual Values with Missing and Generating New Variables (7:36)
This lecture demonstrates how to identify extreme or unrealistic values in DHS data, replace them with missing values, and generate new variables for analysis, using examples such as HAZ, WAZ, WHZ, BMI, and ANC indicators.
Generating Nutritional Indicators: Stunting, Wasting, and Underweight (6:12)
This lecture demonstrates how to generate the standard child nutritional indicators (stunting, wasting, and underweight) from DHS data using WHO cut-off points in R.
Finalizing the Variable List for Analysis | Nutritional Indicators and Covariates (2:57)
This lecture explains how to finalize the list of outcome variables, covariates, and nutritional indicators to create an analysis-ready DHS dataset.
Short Summary of Updated Variables | Review of Finalized and Updated Variables (2:51)
This lecture provides a brief summary of all updated outcome variables, covariates, and nutritional indicators prepared for analysis.
Modifying Value Labels for Factor Variables (7:58)
This lecture demonstrates how to modify and update value labels for factor (categorical) variables to improve clarity, interpretation, and presentation of results.
Renaming Variables Based on Variable Names and Meaning (6:04)
This lecture demonstrates how to rename variables based on their names, meaning, and identity to improve readability, interpretation, and analysis of DHS datasets.
Adding Variable Labels to the Current Dataset (10:06)
This lecture demonstrates how to add meaningful variable labels to the current DHS dataset to improve clarity, interpretation, and reporting of analysis results.
Saving the Dataset in RData Format (3:16)
This lecture demonstrates how to save the finalized and analysis-ready DHS dataset in RData format for efficient reuse, reproducibility, and future analysis.
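
The course carries out these cleaning and classification steps in R. As a language-neutral sketch of the underlying logic (not the course's code), the Python below assumes DHS-style z-scores stored as integers multiplied by 100 (for example, the children's recode variables hw70, hw71, and hw72 for HAZ, WAZ, and WHZ), with codes of 9996 and above reserved for flagged or implausible measurements. It also assumes the WHO 2006 plausibility ranges for treating extreme z-scores as missing, and the WHO cut-off of below -2 SD. Check your survey's recode manual before relying on these codes; the example child record is hypothetical.

```python
from typing import Optional

# Assumption: DHS-style z-scores are stored as integers x 100 (e.g. -215 means -2.15 SD)
# and codes of 9996 or higher mark flagged or implausible measurements.
FLAG_CODE_START = 9996

# Assumed WHO 2006 plausibility ranges (in SD); values outside are treated as missing.
PLAUSIBLE_RANGE = {"haz": (-6.0, 6.0), "waz": (-6.0, 5.0), "whz": (-5.0, 5.0)}


def clean_zscore(raw: Optional[int], index: str) -> Optional[float]:
    """Return the z-score in SD units, or None if missing, flagged, or out of range."""
    if raw is None or raw >= FLAG_CODE_START:
        return None
    z = raw / 100  # restore the implied decimal point
    low, high = PLAUSIBLE_RANGE[index]
    if z < low or z > high:
        return None  # extreme value replaced with missing
    return z


def below_minus_two_sd(z: Optional[float]) -> Optional[int]:
    """WHO cut-off: 1 if z < -2 SD, 0 otherwise, None if the z-score is missing."""
    if z is None:
        return None
    return 1 if z < -2 else 0


# Hypothetical stored values for one child (HAZ, WAZ, WHZ)
child = {"haz": -215, "waz": -130, "whz": 9998}
z_scores = {name: clean_zscore(raw, name) for name, raw in child.items()}

stunted = below_minus_two_sd(z_scores["haz"])      # HAZ < -2 SD
underweight = below_minus_two_sd(z_scores["waz"])  # WAZ < -2 SD
wasted = below_minus_two_sd(z_scores["whz"])       # WHZ < -2 SD
print(z_scores, stunted, underweight, wasted)
```

Returning None rather than 0 for flagged cases keeps them out of both the numerator and the denominator when prevalences are computed later.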

Univariate Descriptive Statistics (Unweighted Analysis) | gtsummary (5:44)
This lecture introduces univariate descriptive statistics to summarize individual variables without applying sampling weights. You will learn how to calculate frequencies, percentages, means, and standard deviations for both categorical and continuous variables. This lecture builds the foundation for understanding data distributions before moving to bivariate analysis and statistical testing.
Formatting Categorical Variables as n (%) (2:26)
In this lecture, you will learn how to summarize and present categorical variables using counts (n) and percentages (%) without applying sampling weights. You will practice formatting variables such as sex, education, residence, wealth index, and nutritional status into clear and interpretable tables, which are commonly used in public health research, theses, and journal articles.
Formatting Continuous Variables as Mean ± Standard Deviation (1:30)
This lecture explains how to summarize continuous variables using mean ± standard deviation (SD) without applying sampling weights. You will learn how to format variables such as age, BMI, and z-scores (HAZ, WAZ, WHZ) into clear, publication-ready tables commonly used in epidemiology and biostatistics research.
Setting Decimal Places for Statistical Results (2:28)
This lecture explains how to set and control decimal places for percentages, means, and standard deviations in unweighted analysis. You will learn how to present results consistently (e.g., 1 or 2 decimal places) to improve clarity, readability, and compliance with academic and journal reporting standards.
Bivariate Analysis Using Column Percentages (Unweighted) (1:28)
This lecture explains how to perform bivariate analysis using column percentages without applying sampling weights. You will learn how to create cross-tabulation tables and interpret column-wise percentages to compare outcomes across exposure groups, a common approach in public health and epidemiology research.
Adding Overall Findings to Bivariate Tables (5:02)
This lecture explains how to add an overall (total) column or row to bivariate analysis tables. You will learn how to present overall frequencies and percentages alongside group-specific column percentages, improving interpretation and making tables more informative for reports, theses, and journal articles.
Adding p-Values Using t-Test and Chi-Square Test (Unweighted) (3:28)
This lecture explains how to add p-values to bivariate analysis tables using appropriate statistical tests. You will learn when to apply the t-test for continuous variables and the Chi-square test for categorical variables, all without adjusting for sampling weights. The lecture also covers how to interpret p-values and present them correctly in academic tables.
Exporting Publication-Ready Tables to MS Word (3:44)
This lecture demonstrates how to export descriptive and bivariate analysis tables as publication-ready Microsoft Word (DOCX) files. You will learn how to format tables with proper alignment, decimal places, overall columns, and p-values so they meet journal, thesis, and report submission standards, all based on unweighted analysis.
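
For a 2×2 cross-tabulation, the column percentages and Pearson chi-square test covered in these lectures reduce to a few lines of arithmetic. The Python sketch below (hypothetical counts; not the course's gtsummary code) computes both, using the fact that with one degree of freedom the upper-tail p-value equals erfc(sqrt(χ²/2)). It applies no continuity correction; R's chisq.test() applies Yates' correction to 2×2 tables by default, so its p-value will be somewhat larger. The chi-square approximation is also only reliable when expected counts are reasonably large (commonly at least 5 per cell).

```python
import math


def column_percentages(table):
    """Percent within each column; rows = outcome categories, columns = groups."""
    n_cols = len(table[0])
    col_totals = [sum(row[c] for row in table) for c in range(n_cols)]
    return [[100 * row[c] / col_totals[c] for c in range(n_cols)] for row in table]


def pearson_chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = stunted (yes, no); columns = residence (urban, rural)
counts = [[30, 70],
          [170, 230]]

pct = column_percentages(counts)
chi2, p = pearson_chi_square_2x2(counts)
print(f"Stunted: urban {pct[0][0]:.1f}%, rural {pct[0][1]:.1f}%")
print(f"Chi-square = {chi2:.2f}, p = {p:.3f}")
```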

Setting Up Survey Design (svyset / svydesign) (3:42)
Learn how to define survey weights, primary sampling units (PSU), and strata to create a valid survey design object for analysis.
Univariate Descriptive Statistics (Survey-Weighted) (2:23)
Compute weighted frequencies, percentages, means, and standard errors to summarize individual variables using survey weights.
Checking Findings Against Published Reports (1:04)
This lecture explains how to validate survey-weighted results by comparing them with published reports such as DHS, NFHS, and BDHS final reports. You will learn how to cross-check key indicators, percentages, and trends, identify acceptable differences, and interpret discrepancies caused by weighting, sub-population selection, or variable definitions.
Table Formatting – n (%) and Mean ± SD (Survey-Weighted) (4:15)
This lecture demonstrates how to format survey-weighted tables using n (%) for categorical variables and mean ± standard deviation (SD) for continuous variables. You will learn how to present results clearly and consistently, following common reporting standards used in DHS/NFHS reports, theses, and peer-reviewed journals.
Survey-Weighted Bivariate Analysis (svy) (2:49)
This lecture introduces bivariate analysis using survey-weighted methods (svy). You will learn how to examine associations between outcomes and explanatory variables while accounting for sampling weights, clusters (PSU), and strata. The lecture covers weighted cross-tabulations, column percentages, and correct interpretation of results for complex survey data such as DHS, NFHS, and BDHS.
Survey-Weighted Chi-Square Test and p-Value (svy) (5:15)
This lecture explains how to perform the survey-weighted Chi-square test (Rao–Scott adjusted Chi-square) to assess associations between categorical variables. You will learn how to obtain design-adjusted p-values, interpret results correctly, and report them in tables when working with complex survey data such as DHS, NFHS, and BDHS.
Exporting Survey-Weighted Tables to MS Word (Publication-Ready) (3:33)
This lecture demonstrates how to export survey-weighted descriptive and bivariate analysis tables to Microsoft Word (DOCX) format. You will learn how to produce publication-ready tables with n (%), mean ± SD, overall columns, and survey-adjusted p-values, formatted correctly for theses, reports, and peer-reviewed journals using complex survey data.
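
Survey weights change the point estimates, not just the standard errors. The Python sketch below shows only the weighted point estimate: a weighted percentage is sum(w·y) / sum(w) over non-missing cases. It assumes a DHS-style integer weight with six implied decimals (such as v005, divided by 1,000,000); that division does not change the percentage but does affect weighted counts. Standard errors, confidence intervals, and Rao-Scott tests also require the PSU and strata, which the survey design object in R handles and this sketch deliberately does not. All values are hypothetical.

```python
# Assumption: DHS-style weights are integers with six implied decimals (e.g. v005).
WEIGHT_SCALE = 1_000_000


def weighted_count(values, raw_weights):
    """Sum of scaled weights over non-missing cases (a weighted n)."""
    return sum(w / WEIGHT_SCALE for y, w in zip(values, raw_weights) if y is not None)


def weighted_percentage(values, raw_weights):
    """Weighted % of cases coded 1: 100 * sum(w * y) / sum(w), skipping missing values."""
    numerator = 0.0
    denominator = 0.0
    for y, w_raw in zip(values, raw_weights):
        if y is None:
            continue  # missing outcome: excluded from numerator and denominator
        w = w_raw / WEIGHT_SCALE
        numerator += w * y
        denominator += w
    return 100 * numerator / denominator


# Hypothetical data: 1 = stunted, 0 = not stunted, None = missing
stunted = [1, 0, 0, None, 1]
raw_weights = [2_000_000, 1_000_000, 500_000, 1_500_000, 500_000]

non_missing = [y for y in stunted if y is not None]
unweighted = 100 * sum(non_missing) / len(non_missing)
weighted = weighted_percentage(stunted, raw_weights)
print(f"Unweighted: {unweighted:.1f}%, weighted: {weighted:.1f}%")
print(f"Weighted n: {weighted_count(stunted, raw_weights):.2f}")
```

The gap between the two percentages in this toy example is why the lecture on checking findings against published reports matters: official DHS indicators are weighted estimates.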

Bar Diagram for a Single Categorical Variable (6:38)
Create simple bar charts to display the distribution of a categorical variable such as sex, residence, education, or wealth index.
Bar Diagram for a Single Categorical Variable – Customization & Labeling (5:58)
This lecture demonstrates how to create a bar diagram for a single categorical variable using ggplot2 and then modify and label the chart for clarity and presentation. You will learn how to adjust bar width, colors, axis labels, titles, and add value labels (counts or percentages) to produce clean, publication-ready visualizations.
Bar Diagram for Two Categorical Variables (Grouped Bar Chart) (9:06)
This lecture explains how to create bar diagrams for two categorical variables using grouped (side-by-side) bar charts in ggplot2. You will learn how to compare distributions across groups (e.g., outcome by residence or education), apply clear labeling, legends, and basic customization to produce interpretable and presentation-ready visualizations.

Bar Diagram for a Binary Variable by One Categorical Variable (8:16)
This lecture demonstrates how to create a bar diagram for a binary variable (e.g., Yes/No, 0/1) grouped by a single categorical variable using ggplot2 in RStudio. You will learn how to display group-wise percentages, add clear labels and legends, and customize the chart to clearly compare outcomes across categories such as residence, education, or wealth index.
Bar Diagram for a Binary Variable by Two Categorical Variables (11:54)
This lecture explains how to create bar diagrams for a binary outcome variable (e.g., Yes/No, 0/1) stratified by two categorical variables using ggplot2 in RStudio. You will learn how to visualize group-wise percentages, use grouped or faceted bar charts, apply clear labels and legends, and customize the plot to compare outcomes across multiple categories such as residence and wealth index or education and region.
Removing Background and Using Clean Themes in ggplot2 (2:19)
This lecture shows how to remove the background from bar diagrams in ggplot2 to create clean, professional-looking plots. You will learn how to use minimalist themes, remove grid lines and panel backgrounds, and adjust text and axis elements so figures are suitable for MS Word, PowerPoint, and journal publication.
Bar Diagram for Multiple Binary Variables (13:22)
This lecture demonstrates how to create bar diagrams for multiple binary variables (e.g., stunting, wasting, underweight, ANC ≥4) using ggplot2 in RStudio. You will learn how to reshape data, calculate percentages, and display multiple binary outcomes in a single or faceted bar chart with clear labels and a clean layout suitable for reports, theses, and publications.
Exporting Figures to a Folder in RStudio (Publication-Ready) (5:07)
This lecture demonstrates how to export figures (bar charts, box plots, regression plots) from RStudio into a designated folder on your computer. You will learn how to set the working directory, choose appropriate file formats (PNG, JPEG, PDF), control resolution (DPI), and ensure figures are publication-ready for use in Microsoft Word, PowerPoint, and academic journals.

Box Plot for a Single Continuous Variable (2:43)
Create a basic box plot for a single continuous variable such as age, BMI, or z-scores (HAZ, WAZ, WHZ) using ggplot2.
Box Plot for a Single Continuous Variable – Labeling & Customization (5:38)
This lecture demonstrates how to create a box plot for a single continuous variable (such as age, BMI, or z-scores) using ggplot2 in RStudio, with a strong focus on proper labeling and customization. You will learn how to add clear axis labels, titles, units, and optional annotations to improve interpretability and make figures suitable for reports, theses, and journal publications.
Box Plot for a Single Continuous Variable by One Categorical Variable (3:02)
This lecture explains how to create a box plot for a continuous variable (e.g., age, BMI, HAZ) grouped by one categorical variable (such as sex, residence, education, or wealth index) using ggplot2 in RStudio. You will learn how to compare medians, variability, and outliers across groups, apply clear labels and themes, and produce publication-ready visualizations.
Box Plot for a Single Continuous Variable by Two Categorical Variables (2:27)
This lecture demonstrates how to create a box plot for a single continuous variable (e.g., age, BMI, HAZ, WAZ) stratified by two categorical variables using ggplot2 in RStudio. You will learn how to use grouped and faceted box plots to compare distributions across multiple groups, interpret differences in medians and variability, and apply clear labeling and themes to produce publication-ready visualizations.
Modifying and Labeling Box Plots for Publication-Ready Graphics (8:04)
This lecture demonstrates how to modify and label box plots to create publication-ready figures using ggplot2 in RStudio. You will learn how to adjust axis titles, units, font sizes, colors, themes, and remove unnecessary elements to meet journal, thesis, and report formatting standards. The lecture also covers best practices for clarity, consistency, and professional presentation.

Creating Density Plots to Explore Data Distribution (3:31)
This lecture demonstrates how to create and interpret density plots in R to visualize the distribution of continuous variables and compare patterns across groups.
Creating Histograms to Explore Data Distribution in RStudio (1:53)
Create histograms in RStudio to explore the BMI distribution: drop missing (NA) values, map BMI with aes(x = BMI), and set the bar fill to sky blue and the outline color to black.
Creating Histograms with after_stat() in ggplot2 (0:56)
This lecture demonstrates how to create histograms using after_stat() in ggplot2 to correctly scale counts, density, and proportions for data distribution analysis.
Histogram with Density Plot Overlay in RStudio (1:48)
This lecture demonstrates how to overlay a density plot on a histogram to visualize data distribution more clearly and compare observed patterns with a smooth density curve.
Histogram with Normal Curve Overlay in RStudio (4:45)
This lecture demonstrates how to overlay a normal distribution curve on a histogram to assess data normality and understand the distribution of continuous variables in R.
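
A count histogram and a density curve sit on different vertical scales: a bar with count c over a bin of width h corresponds to density c / (n·h). That is why a density or normal curve is overlaid either on a density-scale histogram (which after_stat(density) requests in ggplot2) or after multiplying the curve by n·h. The Python sketch below illustrates both conversions with hypothetical BMI values and bin counts; it shows the scaling only and is not the course's ggplot2 code.

```python
import math
import statistics


def counts_to_density(counts, binwidth):
    """Convert histogram bin counts to densities: count / (n * binwidth)."""
    n = sum(counts)
    return [c / (n * binwidth) for c in counts]


def normal_curve_on_count_scale(data, binwidth, xs):
    """Normal curve fitted to data (mean, sample SD), scaled by n * binwidth to match counts."""
    n = len(data)
    mu = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD (n - 1 denominator), like R's sd()
    scale = n * binwidth
    return [
        scale * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for x in xs
    ]


# Hypothetical BMI values (kg/m^2) and hypothetical bin counts for a bin width of 2
bmi = [18.5, 21.0, 22.4, 23.1, 24.8, 26.0, 27.3, 30.2]
density = counts_to_density([2, 4, 2], binwidth=2)
curve_at_mean = normal_curve_on_count_scale(bmi, binwidth=2, xs=[statistics.mean(bmi)])
print(density, curve_at_mean)
```

On the density scale the bar areas sum to 1, which is what lets a probability density curve be drawn on the same axis without rescaling.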

Defining the Outcome Variable for Logistic Regression (1:54)
This lecture explains how to identify an appropriate binary outcome variable (e.g., stunting yes/no, disease status yes/no), define it clearly, and code it correctly (0/1) for logistic regression in RStudio. You will also learn how a well-defined outcome aligns with your research objective and ensures valid interpretation of odds ratios.
Simple Logistic Regression (Unadjusted Odds Ratio) (5:51)
This lecture introduces simple (univariable) logistic regression, where one explanatory variable is analyzed at a time against a binary outcome. You will learn how to fit the model in RStudio, estimate unadjusted odds ratios (ORs) with 95% confidence intervals, and present results using the gtsummary package. This approach is commonly used for screening variables before multiple logistic regression.
Multiple Logistic Regression (Adjusted Odds Ratios) (2:38)
This lecture introduces multiple (multivariable) logistic regression, where several exposure and confounding variables are included in a single model. You will learn how to fit the model in RStudio, interpret adjusted odds ratios (AORs) with 95% confidence intervals, and present results using the gtsummary package. This approach is essential for controlling confounding in epidemiology and public health research.
Changing Reference Categories for Categorical Independent Variables (5:13)
This lecture explains how to choose and change the reference category for categorical independent variables (e.g., education, wealth index, residence) in logistic regression models. You will learn how reference categories affect odds ratio interpretation, and how to update them in RStudio so results are clearly presented in gtsummary tables for theses and journal articles.
Recap – Variable Modification for Logistic Regression (8:57)
This recap lecture reviews key steps in modifying outcome and independent variables for logistic regression, including recoding categories, setting reference groups, labeling variables, and handling missing values. The session reinforces best practices to ensure correct model estimation and clear interpretation of results in RStudio using the gtsummary package.
Publication-Ready Table for Unadjusted Odds Ratios (11:52)
This lecture demonstrates how to format unadjusted logistic regression results into a publication-ready table using the gtsummary package in RStudio. You will learn how to display ORs, 95% confidence intervals, and p-values, apply proper labels, and present results in a format commonly required for journal articles, theses, and dissertations.
Publication-Ready Table for Adjusted Odds Ratios (AOR) (4:22)
This lecture demonstrates how to format results from multiple logistic regression into a publication-ready table using gtsummary in RStudio. You will learn how to display AORs with 95% confidence intervals and p-values, apply clear variable labels and reference categories, and prepare tables suitable for journal articles, theses, and dissertations.
Final – Publication-Ready Table (Unadjusted & Adjusted Odds Ratios) (2:29)
This lecture demonstrates how to combine unadjusted and adjusted logistic regression results into one clean, journal-ready table using gtsummary in RStudio. You will learn how to align ORs and AORs side-by-side, include 95% confidence intervals and p-values, add appropriate footnotes for covariate adjustment, and prepare the final table output suitable for manuscripts, theses, and dissertations.
Saving Publication-Ready Tables in MS Word (2:55)
This lecture demonstrates how to export publication-ready tables (including unadjusted and adjusted odds ratios) from RStudio using the gtsummary package into MS Word format. You will learn how to preserve table formatting, alignment, decimal places, confidence intervals, and footnotes so tables are ready for journal submission, theses, and dissertations.
Factors Associated with Stunting, Underweight, and Wasting (9:12)
This final lecture applies logistic regression methods to examine determinants of child undernutrition (stunting, underweight, and wasting). You will learn how to run separate models for each outcome, interpret unadjusted and adjusted odds ratios, and present results in clear, publication-ready tables using gtsummary in RStudio. The lecture emphasizes correct interpretation, comparison across outcomes, and table formats commonly used in peer-reviewed journals, theses, and policy reports.
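
For a single binary exposure, the unadjusted odds ratio from simple logistic regression equals the cross-product ratio ad/bc of the 2×2 table, and the Wald 95% confidence interval is exp(log OR ± 1.96·SE) with SE = sqrt(1/a + 1/b + 1/c + 1/d). The Python sketch below computes this with hypothetical counts to show what an unadjusted OR represents. The confidence limits R reports may differ slightly, because confint() on a glm fit uses profile likelihood rather than the Wald formula. Adjusted odds ratios require fitting the multivariable model (in R, glm() with family = binomial) and cannot be obtained this way.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """
    Unadjusted odds ratio with a Wald 95% CI from a 2x2 table:

                    outcome = 1   outcome = 0
        exposed          a             b
        reference        c             d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("A zero cell makes the Wald OR and CI undefined.")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: exposed = rural, reference = urban; outcome = stunted
or_, lo, hi = odds_ratio_wald(a=70, b=230, c=30, d=170)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Changing the reference category (for example, making rural the reference) swaps the rows of the table, which inverts the OR and both confidence limits.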

Requirements

Basic familiarity with R and RStudio is helpful but not mandatory
A computer with R and RStudio installed
Interest in data analysis, public health, or survey data

Description

This course provides a practical, step-by-step guide to Demographic and Health Survey (DHS) data analysis using R and RStudio for public health, epidemiology, and health research. It is designed for MSc, MPH, and PhD students, as well as researchers and data analysts, who want to analyze survey data and produce publication-ready tables and figures.

You will learn how to work with DHS/NFHS-type survey datasets in RStudio, from data preparation and variable modification through descriptive, univariate, and bivariate analysis. The course covers both unweighted and survey-weighted (svy) analysis, including correct handling of sampling weights, clusters, and strata.

Key topics include descriptive statistics, chi-square tests, t-tests, bar diagrams, box plots, and logistic regression for binary outcomes such as stunting, underweight, and wasting. You will learn how to estimate unadjusted and adjusted odds ratios, change reference categories, interpret results correctly, and generate publication-ready tables using the gtsummary package.

The course also emphasizes data visualization with ggplot2, showing how to create clean, professional graphs suitable for theses, reports, and journal articles. You will learn how to export tables and figures to Microsoft Word while preserving formatting.

This is a hands-on, applied course that uses real DHS-style data examples from countries such as Bangladesh, India, Nepal, Ethiopia, Nigeria, Kenya, and Tanzania. Advanced topics such as GEE, multilevel models, and longitudinal analysis will be added in future updates.

By the end of this course, you will be able to analyze survey data confidently in RStudio and produce results ready for academic publication and policy research.

Who this course is for:

Graduate students (MPH, MSc, PhD) working with Demographic and Health Survey (DHS) data
Professionals and researchers who want to produce publication-ready analyses from DHS data
Data analysts and statisticians interested in analyzing survey data using R

Course sections

Introduction to RStudio for DHS Data Analysis • 3 lectures • 9 min

Reading and Understanding DHS Data and Codebooks (Multi-Country) • 6 lectures • 21 min

Preparing Nutritional Indicators from DHS Data: Stunting, Wasting, & Underweight • 13 lectures • 1 hr 9 min

Descriptive, Univariate & Bivariate Analysis with Statistical Tests (Unweighted) • 8 lectures • 26 min

Descriptive, Univariate & Bivariate Analysis | Survey-Weighted | p-value • 7 lectures • 23 min

Bar Diagrams for Categorical Variables in RStudio (ggplot2) • 3 lectures • 22 min

Bar Diagrams for Binary Variables in RStudio (ggplot2) • 5 lectures • 41 min

Box Plots for Continuous Variables in RStudio (ggplot2) • 5 lectures • 22 min

Histogram, Density Plot, and Normal Curve in RStudio | R Programming • 5 lectures • 13 min

Logistic Regression in RStudio Using the gtsummary Package • 10 lectures • 55 min
