Reproducible Analytical Pipelines (RAP) using R

Automating the production of statistical reports using DataOps principles.
Free tutorial
Rating: 3.7 out of 5 (82 ratings)
7,332 students
Use DevOps to improve the production time and quality of your statistical reports.
Build a reproducible analytical pipeline enshrining business knowledge in an R package.
Automate the production of a periodic report.

Requirements

  • You should be familiar with R and the RStudio Integrated Development Environment.
  • You should be familiar with git and GitHub.
  • You should be familiar with writing functions in R.

Description

By the end of my course, students will be able to identify suitable Reproducible Analytical Pipelines (RAP) opportunities in their organisation. From their chosen report they will derive the minimal tidy data set required to produce all the figures, tables and statistics therein. They will confidently use basic git functionality for version control, providing an audit trail of their progress. They will collaborate on GitHub using a standard workflow that relies on pull requests for peer review, ensuring quality assurance throughout the project. They will build an R package that serves as a single place to enshrine and encapsulate the business knowledge. The package will have all the hallmarks of reproducibility and quality assurance through the students' prudent application of Open Source software development tools and principles, including functional programming, unit testing, continuous integration and dependency management. The outcome will be a software package that shortens the production time of the statistical report while improving the quality of its statistics. This will free up the students' time to do more interesting things.

DISCLAIMER: The views and opinions expressed in this course are those of the author and do not reflect the official policy or position of GDS or the UK Government. 

Who this course is for:

  • Anyone who has produced the same report or publication more than once.
  • Anyone who is frustrated by, or bored with, manually processing data.
  • Anyone keen to automate their workflow for the regular analysis of the same kind of data input.

Course content

10 sections • 42 lectures • 6h 53m total length
  • Introduction to Reproducible Analytical Pipelines (RAP)
    02:53
  • Why RAP? + ACTIVITY
    06:59
  • Identifying RAP battles in your organisation
    6 questions
  • RAP is Open Source + ACTIVITY
    05:15
  • Open Source Trivia
    3 questions
  • Evaluating RAP + ACTIVITY
    01:00
  • RAP business benefit
    4 questions
  • Finding a RAP buddy + ACTIVITY
    08:25
  • Finding a buddy to collaborate with
    2 questions

Instructor

Data Scientist
Matthew Gregory
  • 3.7 Instructor Rating
  • 82 Reviews
  • 7,332 Students
  • 1 Course

A Data Scientist in the Public Sector working at the Government Digital Service (GDS). I am dedicated to building data science capability across Government and society; building capability is transformation.

Prior to life as a Civil Servant, I completed a PhD in genetic engineering at Oxford and qualified as a teacher on the Teach First graduate scheme.

Disclaimer: my views and opinions are my own and do not reflect Government or GDS policy.