
Welcome to the course! In this lecture, I will briefly introduce myself, explain the purpose of the course, and guide you on how to get the most benefit from the learning materials.
This lecture provides a clear overview of the course structure, topics covered, and learning outcomes. You will understand how the course progresses from data preparation to statistical analysis, visualization, and publication-ready reporting using RStudio.
R is updated frequently, so you may occasionally encounter errors when running the code shown in this course. This is normal. I regularly update the code to keep it compatible with recent R versions. If you run into any issues or errors, feel free to message me at any time; I am happy to help so you can continue learning smoothly.
Learn how to import Tanzania Demographic and Health Survey (DHS) data from Stata (.dta) files into R and RStudio, and check the dataset structure for analysis.
Learn how to import SPSS (.sav) files into RStudio while correctly preserving variable labels, value labels, and missing values.
This lecture demonstrates how to import DHS datasets from other countries, such as Peru and Ethiopia, and highlights key similarities and differences across country datasets.
This lecture explains how to use DHS codebooks and recode manuals to correctly interpret variables, categories, and survey indicators for analysis.
This lecture demonstrates how to select relevant variables from DHS datasets using the select() function in R, with practical examples for analysis.
This lecture shows how to create a short descriptive summary table using the gtsummary package in R for DHS data analysis.
This lecture explains the definitions of stunting, wasting, and underweight, and identifies the key variables required to construct these nutritional indicators using Demographic and Health Survey (DHS) data. Although examples are shown using one country dataset, the same R code and workflow are fully applicable to DHS datasets from other countries, such as Bangladesh DHS (BDHS), Nepal DHS (NDHS), Ethiopia DHS (EDHS), and many more.
Learn how to select outcome variables and covariates needed to create nutritional indicators from DHS data.
This lecture demonstrates how to convert DHS variables into factor (categorical) variables and apply meaningful value labels for clear and accurate analysis.
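A minimal sketch of this kind of conversion in base R. The column name `v025` and its codes (1 = Urban, 2 = Rural) are assumptions used for illustration; confirm the actual codes in the codebook for your survey before labelling.

```r
# Hypothetical numeric codes as they might look right after import
dhs <- data.frame(v025 = c(1, 2, 2, 1, 2))

# Convert the codes to a factor and attach readable value labels
dhs$residence <- factor(dhs$v025,
                        levels = c(1, 2),
                        labels = c("Urban", "Rural"))

# Quick check of the labelled categories
table(dhs$residence)
```

Listing `levels` explicitly means any unexpected code becomes `NA` instead of silently forming its own category, which makes data-entry problems easier to spot.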
This lecture demonstrates how to prepare numerical (continuous) variables from DHS data, including checking ranges, handling missing values, and preparing variables for analysis.
This lecture demonstrates how to identify unusual or out-of-range values and correct decimal point issues commonly found in DHS datasets before analysis.
This lecture demonstrates how to identify extreme or unrealistic values in DHS data, replace them with missing values, and generate new variables for analysis, using examples such as HAZ, WAZ, WHZ, BMI, and ANC indicators.
This lecture demonstrates how to generate standard child nutritional indicators—stunting, wasting, and underweight—using DHS data and WHO cut-off points in R.
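The preparation steps from the last few lectures can be sketched together in base R. This is an illustration under stated assumptions, not the course's own code: it assumes the z-scores are stored as integers with two implied decimals in columns named `hw70` (height-for-age), `hw71` (weight-for-age), and `hw72` (weight-for-height), and that values of 9996 and above are flag codes. The helper `to_z()` is hypothetical. Check the recode manual for the names and flag codes in your dataset.

```r
# Toy data: z-scores stored as integers with two implied decimals (assumed layout)
kids <- data.frame(
  hw70 = c(-250, -120,  35, 9998, -310),  # height-for-age
  hw71 = c(-180, -210,  10, 9999,  -90),  # weight-for-age
  hw72 = c( -60, -230, 120, 9996,   15)   # weight-for-height
)

# Hypothetical helper: turn flag codes into NA, then rescale to SD units
to_z <- function(x) ifelse(x >= 9996, NA, x / 100)

kids$haz <- to_z(kids$hw70)
kids$waz <- to_z(kids$hw71)
kids$whz <- to_z(kids$hw72)

# WHO cut-off: a z-score below -2 SD defines each indicator (1 = yes, 0 = no)
kids$stunting    <- as.integer(kids$haz < -2)
kids$underweight <- as.integer(kids$waz < -2)
kids$wasting     <- as.integer(kids$whz < -2)

kids[, c("haz", "waz", "whz", "stunting", "underweight", "wasting")]
```

Children with a flagged z-score end up as `NA` on the indicator, so they drop out of the denominator instead of being counted as "not stunted".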
This lecture explains how to finalize the list of outcome variables, covariates, and nutritional indicators to create an analysis-ready DHS dataset.
This lecture provides a brief summary of all updated outcome variables, covariates, and nutritional indicators prepared for analysis.
This lecture demonstrates how to modify and update value labels for factor (categorical) variables to improve clarity, interpretation, and presentation of results.
This lecture demonstrates how to rename variables so that their names reflect their meaning, improving the readability, interpretation, and analysis of DHS datasets.
This lecture demonstrates how to add meaningful variable labels to the current DHS dataset to improve clarity, interpretation, and reporting of analysis results.
This lecture demonstrates how to save the finalized and analysis-ready DHS dataset in RData format for efficient reuse, reproducibility, and future analysis.
This lecture introduces univariate descriptive statistics to summarize individual variables without applying sampling weights. You will learn how to calculate frequencies, percentages, means, and standard deviations for both categorical and continuous variables. This lecture builds the foundation for understanding data distributions before moving to bivariate analysis and statistical testing.
In this lecture, you will learn how to summarize and present categorical variables using counts (n) and percentages (%) without applying sampling weights. You will practice formatting variables such as sex, education, residence, wealth index, and nutritional status into clear and interpretable tables, which are commonly used in public health research, theses, and journal articles.
This lecture explains how to summarize continuous variables using mean ± standard deviation (SD) without applying sampling weights. You will learn how to format variables such as age, BMI, and z-scores (HAZ, WAZ, WHZ) into clear, publication-ready tables commonly used in epidemiology and biostatistics research.
This lecture explains how to set and control decimal places for percentages, means, and standard deviations in unweighted analysis. You will learn how to present results consistently (e.g., 1 or 2 decimal places) to improve clarity, readability, and compliance with academic and journal reporting standards.
This lecture explains how to perform bivariate analysis using column percentages without applying sampling weights. You will learn how to create cross-tabulation tables and interpret column-wise percentages to compare outcomes across exposure groups, a common approach in public health and epidemiology research.
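As a rough illustration of column percentages, the base R sketch below cross-tabulates a made-up outcome against a made-up exposure. The data and variable names are invented for the example.

```r
# Toy data: 4 urban and 6 rural children
dat <- data.frame(
  residence = c(rep("Urban", 4), rep("Rural", 6)),
  stunting  = c("No", "No", "No", "Yes",
                "No", "No", "No", "Yes", "Yes", "Yes")
)

# Outcome in rows, exposure in columns
tab <- table(Stunting = dat$stunting, Residence = dat$residence)

# margin = 2 divides each cell by its column total, so every column sums to 100%
col_pct <- round(100 * prop.table(tab, margin = 2), 1)
col_pct
```

With the exposure in the columns, each column answers the question "among children in this group, what percentage have the outcome?", which is the comparison a column-percentage table is meant to support.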
This lecture explains how to add an overall (total) column or row to bivariate analysis tables. You will learn how to present overall frequencies and percentages alongside group-specific column percentages, improving interpretation and making tables more informative for reports, theses, and journal articles.
This lecture explains how to add p-values to bivariate analysis tables using appropriate statistical tests. You will learn when to apply the t-test for continuous variables and the Chi-square test for categorical variables, all without adjusting for sampling weights. The lecture also covers how to interpret p-values and present them correctly in academic tables.
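The two tests mentioned here can be sketched in base R on simulated data. The dataset and variable names below are invented, and the tests ignore the survey design, in line with the unweighted analysis in this part of the course.

```r
set.seed(123)

# Simulated data: 50 urban and 50 rural children
dat <- data.frame(
  residence = factor(rep(c("Urban", "Rural"), each = 50)),
  stunting  = factor(c(rep(c("Yes", "No"), c(10, 40)),   # urban: 10 of 50 stunted
                       rep(c("Yes", "No"), c(25, 25)))), # rural: 25 of 50 stunted
  haz       = c(rnorm(50, mean = -1.0, sd = 1),
                rnorm(50, mean = -1.6, sd = 1))
)

# Categorical outcome by categorical exposure: Chi-square test
chi <- chisq.test(table(dat$stunting, dat$residence))

# Continuous outcome by a two-level exposure: two-sample t-test
tt <- t.test(haz ~ residence, data = dat)

chi$p.value
tt$p.value
```

Note that `t.test()` uses the Welch (unequal variances) version by default, and `chisq.test()` applies a continuity correction to 2 x 2 tables by default, so state which variant you used when reporting p-values.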
This lecture demonstrates how to export descriptive and bivariate analysis tables as publication-ready Microsoft Word (DOCX) files. You will learn how to format tables with proper alignment, decimal places, overall columns, and p-values so they meet journal, thesis, and report submission standards, all based on unweighted analysis.
Learn how to define survey weights, primary sampling units (PSU), and strata to create a valid survey design object for analysis.
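A minimal sketch of a survey design object using the survey package, on simulated data. The column names are assumptions: in DHS recode files the sample weight is commonly stored in `v005` (scaled by 1,000,000), the PSU in `v021`, and the strata in `v022` or `v023`, but you should confirm this in the recode manual for your survey.

```r
library(survey)

set.seed(1)

# Simulated data: 4 strata, 5 PSUs per stratum, 10 children per PSU
dhs <- data.frame(
  v021    = rep(1:20, each = 10),                       # PSU / cluster (assumed name)
  v022    = rep(1:4,  each = 50),                       # strata (assumed name)
  v005    = sample(c(500000, 1000000, 1500000), 200, replace = TRUE),
  stunted = rbinom(200, 1, 0.3)
)

# DHS weights are stored without a decimal point; rescale before use
dhs$wt <- dhs$v005 / 1e6

# Declare clusters, strata, and weights in one design object
design <- svydesign(ids = ~v021, strata = ~v022, weights = ~wt,
                    data = dhs, nest = TRUE)

# Weighted prevalence with a design-based standard error
est <- svymean(~stunted, design)
est
```

The point estimate equals a simple weighted mean; what the design object adds is a standard error that accounts for clustering and stratification.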
Compute weighted frequencies, percentages, means, and standard errors to summarize individual variables using survey weights.
This lecture explains how to validate survey-weighted results by comparing them with published reports such as DHS, NFHS, and BDHS final reports. You will learn how to cross-check key indicators, percentages, and trends, identify acceptable differences, and interpret discrepancies caused by weighting, sub-population selection, or variable definitions.
This lecture demonstrates how to format survey-weighted tables using n (%) for categorical variables and mean ± standard deviation (SD) for continuous variables. You will learn how to present results clearly and consistently, following common reporting standards used in DHS/NFHS reports, theses, and peer-reviewed journals.
This lecture introduces bivariate analysis using survey-weighted methods (svy). You will learn how to examine associations between outcomes and explanatory variables while accounting for sampling weights, clusters (PSU), and strata. The lecture covers weighted cross-tabulations, column percentages, and correct interpretation of results for complex survey data such as DHS, NFHS, and BDHS.
This lecture explains how to perform the survey-weighted Chi-square test (Rao–Scott adjusted Chi-square) to assess associations between categorical variables. You will learn how to obtain design-adjusted p-values, interpret results correctly, and report them in tables when working with complex survey data such as DHS, NFHS, and BDHS.
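One way to run this test is `svychisq()` from the survey package, sketched below on simulated data. The dataset, the variable names, and the design are invented for illustration.

```r
library(survey)

set.seed(42)

# Simulated data: 4 strata, 10 PSUs per stratum, 10 children per PSU
dat <- data.frame(
  psu       = rep(1:40, each = 10),
  strata    = rep(1:4,  each = 100),
  wt        = runif(400, 0.5, 1.5),
  residence = factor(rep(c("Urban", "Rural"), times = 200))
)

# Outcome simulated with a higher stunting probability in rural areas
p_stunt <- ifelse(dat$residence == "Rural", 0.40, 0.25)
dat$stunting <- factor(ifelse(runif(400) < p_stunt, "Yes", "No"))

des <- svydesign(ids = ~psu, strata = ~strata, weights = ~wt,
                 data = dat, nest = TRUE)

# statistic = "F" requests the Rao-Scott second-order corrected test
rs <- svychisq(~stunting + residence, design = des, statistic = "F")
rs$p.value
```

Setting `statistic = "Chisq"` gives the first-order Rao-Scott correction instead. Whichever you choose, report it in the table footnote so readers know the p-value is design-adjusted.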
This lecture demonstrates how to export survey-weighted descriptive and bivariate analysis tables to Microsoft Word (DOCX) format. You will learn how to produce publication-ready tables with n (%), mean ± SD, overall columns, and survey-adjusted p-values, formatted correctly for theses, reports, and peer-reviewed journals using complex survey data.
Create simple bar charts to display the distribution of a categorical variable such as sex, residence, education, or wealth index.
This lecture demonstrates how to create a bar diagram for a single categorical variable using ggplot2 and then modify and label the chart for clarity and presentation. You will learn how to adjust bar width, colors, axis labels, titles, and add value labels (counts or percentages) to produce clean, publication-ready visualizations.
This lecture explains how to create bar diagrams for two categorical variables using grouped (side-by-side) bar charts in ggplot2. You will learn how to compare distributions across groups (e.g., outcome by residence or education), apply clear labeling, legends, and basic customization to produce interpretable and presentation-ready visualizations.
This lecture demonstrates how to create a bar diagram for a binary variable (e.g., Yes/No, 0/1) grouped by a single categorical variable using ggplot2 in RStudio. You will learn how to display group-wise percentages, add clear labels and legends, and customize the chart to clearly compare outcomes across categories such as residence, education, or wealth index.
This lecture explains how to create bar diagrams for a binary outcome variable (e.g., Yes/No, 0/1) stratified by two categorical variables using ggplot2 in RStudio. You will learn how to visualize group-wise percentages, use grouped or faceted bar charts, apply clear labels and legends, and customize the plot to compare outcomes across multiple categories such as residence and wealth index or education and region.
This lecture shows how to remove the background from bar diagrams in ggplot2 to create clean, professional-looking plots. You will learn how to use minimalist themes, remove grid lines and panel backgrounds, and adjust text and axis elements so figures are suitable for MS Word, PowerPoint, and journal publication.
This lecture demonstrates how to create bar diagrams for multiple binary variables (e.g., stunting, wasting, underweight, ANC ≥4) using ggplot2 in RStudio. You will learn how to reshape data, calculate percentages, and display multiple binary outcomes in a single or faceted bar chart with clear labels and a clean layout suitable for reports, theses, and publications.
This lecture demonstrates how to export figures (bar charts, box plots, regression plots) from RStudio into a designated folder on your computer. You will learn how to set the working directory, choose appropriate file formats (PNG, JPEG, PDF), control resolution (DPI), and ensure figures are publication-ready for use in Microsoft Word, PowerPoint, and academic journals.
Create a basic box plot for a single continuous variable such as age, BMI, or z-scores (HAZ, WAZ, WHZ) using ggplot2.
This lecture demonstrates how to create a box plot for a single continuous variable (such as age, BMI, or z-scores) using ggplot2 in RStudio, with a strong focus on proper labeling and customization. You will learn how to add clear axis labels, titles, units, and optional annotations to improve interpretability and make figures suitable for reports, theses, and journal publications.
This lecture explains how to create a box plot for a continuous variable (e.g., age, BMI, HAZ) grouped by one categorical variable (such as sex, residence, education, or wealth index) using ggplot2 in RStudio. You will learn how to compare medians, variability, and outliers across groups, apply clear labels and themes, and produce publication-ready visualizations.
This lecture demonstrates how to create a box plot for a single continuous variable (e.g., age, BMI, HAZ, WAZ) stratified by two categorical variables using ggplot2 in RStudio. You will learn how to use grouped and faceted box plots to compare distributions across multiple groups, interpret differences in medians and variability, and apply clear labeling and themes to produce publication-ready visualizations.
This lecture demonstrates how to modify and label box plots to create publication-ready figures using ggplot2 in RStudio. You will learn how to adjust axis titles, units, font sizes, colors, themes, and remove unnecessary elements to meet journal, thesis, and report formatting standards. The lecture also covers best practices for clarity, consistency, and professional presentation.
This lecture demonstrates how to create and interpret density plots in R to visualize the distribution of continuous variables and compare patterns across groups.
This lecture demonstrates how to create histograms using after_stat() in ggplot2 to correctly scale counts, density, and proportions for data distribution analysis.
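A short sketch of the idea with ggplot2, using simulated z-scores. The data and labels are invented for illustration.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(haz = rnorm(500, mean = -1.2, sd = 1.1))

# after_stat(density) rescales bar heights so the total bar area equals 1;
# after_stat(count) would keep the raw counts instead
p <- ggplot(dat, aes(x = haz)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 30, fill = "grey70", colour = "white") +
  labs(x = "Height-for-age z-score (HAZ)", y = "Density")

p
```

Putting the histogram on the density scale is what later allows a density curve or a normal curve to be drawn on the same axes.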
This lecture demonstrates how to overlay a density plot on a histogram to visualize data distribution more clearly and compare observed patterns with a smooth density curve.
This lecture demonstrates how to overlay a normal distribution curve on a histogram to assess data normality and understand the distribution of continuous variables in R.
This lecture explains how to identify an appropriate binary outcome variable (e.g., stunting yes/no, disease status yes/no), define it clearly, and code it correctly (0/1) for logistic regression in RStudio. You will also learn how a well-defined outcome aligns with your research objective and ensures valid interpretation of odds ratios.
This lecture introduces simple (univariable) logistic regression, in which one explanatory variable at a time is modeled against a binary outcome. You will learn how to fit the model in RStudio, estimate unadjusted odds ratios (ORs) with 95% confidence intervals, and present results using the gtsummary package. This approach is commonly used to screen variables before multiple logistic regression.
This lecture introduces multiple (multivariable) logistic regression, where several exposure and confounding variables are included in a single model. You will learn how to fit the model in RStudio, interpret adjusted odds ratios (AORs) with 95% confidence intervals, and present results using the gtsummary package. This approach is essential for controlling confounding in epidemiology and public health research.
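To make the difference between unadjusted and adjusted odds ratios concrete, here is a base R sketch on simulated data. The variables, the effect sizes, and the helper `or_table()` are invented for illustration, and the intervals are simple Wald intervals computed by hand; this is not the course's gtsummary workflow.

```r
set.seed(2024)
n <- 500

# Simulated covariates; the first factor level is the reference category
dat <- data.frame(
  residence = factor(sample(c("Urban", "Rural"), n, replace = TRUE),
                     levels = c("Urban", "Rural")),
  wealth    = factor(sample(c("Rich", "Middle", "Poor"), n, replace = TRUE),
                     levels = c("Rich", "Middle", "Poor"))
)

# Simulated binary outcome (0/1) from a known logistic model
lp <- -1.5 + 0.5 * (dat$residence == "Rural") +
      0.3 * (dat$wealth == "Middle") + 0.7 * (dat$wealth == "Poor")
dat$stunting <- rbinom(n, 1, plogis(lp))

# Unadjusted model (one covariate) and adjusted model (several covariates)
crude    <- glm(stunting ~ residence,          data = dat, family = binomial)
adjusted <- glm(stunting ~ residence + wealth, data = dat, family = binomial)

# Hypothetical helper: odds ratios with Wald 95% CIs, intercept dropped
or_table <- function(fit) {
  est <- coef(summary(fit))
  data.frame(
    OR    = exp(est[, "Estimate"]),
    lower = exp(est[, "Estimate"] - 1.96 * est[, "Std. Error"]),
    upper = exp(est[, "Estimate"] + 1.96 * est[, "Std. Error"]),
    p     = est[, "Pr(>|z|)"]
  )[-1, ]
}

or_table(crude)     # unadjusted OR for residence
or_table(adjusted)  # adjusted ORs (AORs)
```

In the gtsummary package, `tbl_regression(adjusted, exponentiate = TRUE)` produces the same kind of odds ratio table in a formatted layout.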
This lecture explains how to choose and change the reference category for categorical independent variables (e.g., education, wealth index, residence) in logistic regression models. You will learn how reference categories affect odds ratio interpretation, and how to update them in RStudio so results are clearly presented in gtsummary tables for theses and journal articles.
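Changing the reference category can be sketched with base R's `relevel()`. The education categories below are invented for the example.

```r
# By default, factor levels are sorted alphabetically,
# so "None" is the first level and therefore the reference
edu <- factor(c("None", "Primary", "Secondary", "Primary", "None"))
levels(edu)

# Move "Secondary" to the front so it becomes the reference category
edu2 <- relevel(edu, ref = "Secondary")
levels(edu2)
```

Releveling changes only the order of the levels, not the data values. A model fitted with `edu2` would report odds ratios for "None" and "Primary" relative to "Secondary".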
This recap lecture reviews key steps in modifying outcome and independent variables for logistic regression, including recoding categories, setting reference groups, labeling variables, and handling missing values. The session reinforces best practices to ensure correct model estimation and clear interpretation of results in RStudio using the gtsummary package.
This lecture demonstrates how to format unadjusted logistic regression results into a publication-ready table using the gtsummary package in RStudio. You will learn how to display ORs, 95% confidence intervals, and p-values, apply proper labels, and present results in a format commonly required for journal articles, theses, and dissertations.
This lecture demonstrates how to format results from multiple logistic regression into a publication-ready table using gtsummary in RStudio. You will learn how to display AORs with 95% confidence intervals and p-values, apply clear variable labels and reference categories, and prepare tables suitable for journal articles, theses, and dissertations.
This lecture demonstrates how to combine unadjusted and adjusted logistic regression results into one clean, journal-ready table using gtsummary in RStudio. You will learn how to align ORs and AORs side-by-side, include 95% confidence intervals and p-values, add appropriate footnotes for covariate adjustment, and prepare the final table output suitable for manuscripts, theses, and dissertations.
This lecture demonstrates how to export publication-ready tables (including unadjusted and adjusted odds ratios) from RStudio using the gtsummary package into MS Word format. You will learn how to preserve table formatting, alignment, decimal places, confidence intervals, and footnotes so tables are ready for journal submission, theses, and dissertations.
This final lecture applies logistic regression methods to examine determinants of child undernutrition—stunting, underweight, and wasting. You will learn how to run separate models for each outcome, interpret unadjusted and adjusted odds ratios, and present results in clear, publication-ready tables using gtsummary in RStudio. The lecture emphasizes correct interpretation, comparison across outcomes, and table formats commonly used in peer-reviewed journals, theses, and policy reports.
This course provides a practical, step-by-step guide to Demographic and Health Survey (DHS) data analysis using R and RStudio for public health, epidemiology, and health research. It is designed for MSc, MPH, and PhD students, researchers, and data analysts who want to analyze survey data and produce publication-ready tables and figures.
You will learn how to work with DHS/NFHS-type survey datasets using RStudio, starting from data preparation and variable modification to descriptive, univariate, and bivariate analysis. The course covers both unweighted and survey-weighted (svy) analysis, including correct handling of sampling weights, clusters, and strata.
Key topics include descriptive statistics, chi-square tests, t-tests, bar diagrams, box plots, and logistic regression for binary outcomes such as stunting, underweight, and wasting. You will learn how to estimate unadjusted and adjusted odds ratios, change reference categories, interpret results correctly, and generate publication-ready tables using the gtsummary package.
The course also emphasizes data visualization with ggplot2, showing how to create clean, professional graphs suitable for theses, reports, and journal articles. You will learn how to export tables and figures to Microsoft Word while preserving formatting.
This is a hands-on, applied course, using real DHS-style data examples from countries such as Bangladesh, India, Nepal, Ethiopia, Nigeria, Kenya, and Tanzania. Advanced topics such as GEE, multilevel models, and longitudinal analysis will be added progressively.
By the end of this course, you will be confident in analyzing survey data in RStudio and producing results ready for academic publication and policy research.