
The video presentation for this lecture kept failing, so we provide a transcript here.
This lecture introduces our chosen RAP battle.
We make up a toy example for demonstrative purposes, keeping things simple. We start off with the basic RAP principles and then iterate, adding some of the more complicated features later.
Selecting our own hypothetical report also ensures that we capture the essence of the task while avoiding some of the complexity. It also ensures that everyone can access the underlying data.
Registers are lists of information. Each register is the most reliable list of its kind. For example, the Foreign and Commonwealth Office’s (FCO’s) country register is the most accurate and up-to-date list of countries available.
Each register is looked after by one person, known as the ‘custodian’. The custodian is from the organisation responsible for the information in the register. They make sure the register is kept up to date.
For our RAP demo, we imagine we work within the registers team and need to produce a periodic report on how many registers there are at any given time.
Fortunately, there is a register for that! The register register. How meta.
This register has enough variables for us to produce a simple report summarising it. The data is also available in a variety of formats, and even through an API. This lets us practise working with different imagined "data store" scenarios, i.e. what format do you receive your data in? In a real setting we often can't control this.
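The report's headline statistic is how many registers exist at a given time, and the data may arrive in more than one format. Below is a minimal Python sketch, assuming a hypothetical extract with `register`, `start-date` and `end-date` fields (the real register register schema may differ, and the sample rows are made up). Each loader normalises its format into the same tidy records, and one function computes the count, so switching from CSV to JSON doesn't change the analysis. The course itself builds this in R; the Python here is only illustrative.

```python
import csv
import io
import json
from datetime import date

# Hypothetical sample extracts: the field names and rows are assumptions
# for illustration, not the real register register data.
CSV_TEXT = """register,start-date,end-date
country,2016-01-01,
local-authority-eng,2016-06-01,
old-register,2016-03-01,2017-01-01
"""

JSON_TEXT = """[
  {"register": "country", "start-date": "2016-01-01"},
  {"register": "local-authority-eng", "start-date": "2016-06-01"},
  {"register": "old-register", "start-date": "2016-03-01", "end-date": "2017-01-01"}
]"""


def _parse_date(value):
    """Return a date from an ISO 8601 string, or None if missing or empty."""
    return date.fromisoformat(value) if value else None


def _tidy(rows):
    """Normalise raw dicts from any source into (name, start, end) tuples."""
    return [
        (row["register"], _parse_date(row.get("start-date")), _parse_date(row.get("end-date")))
        for row in rows
    ]


def load_csv(text):
    """Load records from a CSV extract (empty end-date means still live)."""
    return _tidy(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Load records from a JSON extract (absent end-date means still live)."""
    return _tidy(json.loads(text))


def count_live_registers(records, on):
    """Count registers that started on or before `on` and had not yet ended."""
    return sum(
        1
        for _, start, end in records
        if start is not None and start <= on and (end is None or end > on)
    )


if __name__ == "__main__":
    records = load_csv(CSV_TEXT)
    print(count_live_registers(records, date(2016, 7, 1)))  # 3
    print(count_live_registers(records, date(2017, 6, 1)))  # 2
```

Keeping the format-specific loading separate from the counting means that, as new data comes in, only the small loader for that format has to change.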
To keep things concrete, we sketch out an imagined report so you know what we are working towards.
We did this as a Google Doc, which is available in the resources for this lecture.
Let's look at it now. The details aren't important; the overall structure, the figures, statistics and tables are, as these reflect the number of functions we will need to write for our RAP package.
Review the register register RAP demo report in the resources and think about what data you would need and how many functions you would need to produce the report. (What would need updating periodically? What would change as new data comes in?)
We have sketched out the RAP demo for this course. However, now is a good time to reflect on your own RAP battle.
What will your output look like? Do you need to sketch it out like we did or does it already exist?
Perhaps you could scale it down to one page of your report, all of the figures, or just one chapter?
RAP is a continuum, not all or nothing; try implementing it at a manageable scale initially as you learn a bunch of new techniques.
By the end of this course, students will be able to identify suitable Reproducible Analytical Pipeline (RAP) opportunities in their organisation. From their chosen report, they will derive the minimal tidy data set required to produce all the figures, tables and statistics therein. They will confidently use basic git functionality for version control, providing an audit trail of their progress. They will collaborate on GitHub using a standard workflow that relies on pull requests for peer review, ensuring quality assurance throughout the project. They will build an R package, providing a single corpus that enshrines and encapsulates the business knowledge. The package will have all the hallmarks of reproducibility and quality assurance through the students' prudent application of Open Source software development tools and principles, including functional programming, unit testing, continuous integration and dependency management. The outcome will be a software package that reduces the production time of the statistical report while improving the quality of the statistics. This will free up students' time to do more interesting things.
DISCLAIMER: The views and opinions expressed in this course are those of the author and do not reflect the official policy or position of GDS or the UK Government.