
Welcome to this course, which introduces the fundamentals of biostatistics using the Python programming language.
In this first video, I introduce myself. My name is Juan and I am a Specialist Surgeon and a Researcher at the University of Cape Town. I run the Klopper Research Group. One of our main aims is to empower you to do your own statistical analysis.
This course will get your foot in the door. Python is an extremely easy and friendly tool for data analysis, and it has become a standard in the field. Not only is it free of charge, but it will also serve you well for the rest of your career.
This course assumes a basic knowledge of statistics and data analysis, the kind that you would have after an introductory course in statistics or after having read many journal articles.
The aim, then, is to show you how easy it is to import data, to describe and visualize it, and to perform the most common statistical tests, such as Student's t test, analysis of variance (ANOVA), the chi-squared test for independence, and many more.
The software that you will need is freely downloadable and is available for Windows, macOS, and Linux.
You owe this to yourself. In no time at all, you will be doing your own data analysis.
I want you to be successful, not only in finishing this course, but also in your career as a researcher. In this video I talk about the only way to achieve this goal: sit down at your computer and do it. Practice doing your own analysis. Watch how I do it, but then go out and write your own code. Generate your own beautiful graphs and answer research questions by doing the statistical tests yourself!
A quick search of the internet will show you just how popular Python is. It is free, easy to learn, and can do so many things. Apart from being a standard tool for data analysts, you can use it to write fully-fledged programs, create games, design websites, and so much more.
The same language is used for all of this. So, by investing a small amount of time in learning how to do biostatistics using Python, you will be well equipped to do so much more should you ever wish to.
There really is no excuse not to learn how to program. Python makes this both fun and easy. Make no mistake, though: it is a serious and powerful tool. Once you know how to use it, it will be your partner for the rest of your research career.
Introduction to installing Python.
This video shows you how to find a Python distribution online and how to install it.
This video showcases the Jupyter notebook. It is the coding environment that is used in this course.
Jupyter notebooks allow for the creation of documents similar to Word documents, which include headings, formatted text, and code.
Jupyter notebooks run inside of your default browser.
This video will show you how to save a notebook and how to shut down Jupyter.
Instead of installing Python and Jupyter notebooks on your local machine, you can use Google Colaboratory. Simply log in to Google Drive with your Google account. From here you can create a new file, but instead of choosing a Google Doc or a Google Sheet, scroll down to where it says More and then choose Colaboratory. Google's version of a Jupyter notebook will open.
This video will show you some of the important differences between a Google Colaboratory notebook and a Jupyter notebook.
This video shows a completed notebook. It was created as a Google Colaboratory notebook and is the same as the one used in the last section of this course, which demonstrates a research project from start to finish.
The aim is to show you how a mix of code and text cells can create a beautiful, functional research document.
This video will help you take your first steps in writing Python code. If you have ever used a handheld calculator or used the calculator app on your phone, you will have no problem mastering this content. Learning how to do calculations helps build an intuitive understanding of programming.
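As a taste of what this looks like, here is a minimal sketch of Python used as a calculator. The expressions are illustrative and may differ from those in the video; the weight and height used for the body mass index are made-up values.

```python
# Python as a calculator: the basic arithmetic operators
addition = 3 + 4          # 7
subtraction = 10 - 2.5    # 7.5
multiplication = 6 * 7    # 42
division = 7 / 2          # 3.5 (true division always returns a float)
floor_division = 7 // 2   # 3 (whole-number part of the quotient)
remainder = 7 % 2         # 1 (the modulus)
power = 2 ** 10           # 1024 (exponentiation uses **, not ^)

# Operator precedence follows the usual mathematical rules:
# the exponent is evaluated before the division
bmi = 70 / 1.75 ** 2      # weight (kg) divided by height (m) squared
print(round(bmi, 1))      # 22.9
```

Note that `^` in Python is not a power operator, which is a common surprise for calculator users.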
Introduction to collections.
In this video you will learn how to create lists of items such as numbers and how to store them in a variable for later use. This section on collections will be very helpful later when you start to analyze data, as the actual values are usually stored in lists or other collections.
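A minimal sketch of a list stored in a variable, using hypothetical patient ages, shows the most common operations: counting, indexing, slicing, and appending.

```python
# A list stores several values, in order, inside square brackets
ages = [45, 62, 37, 51, 29]    # hypothetical patient ages

print(len(ages))      # number of elements: 5
print(ages[0])        # indexing starts at 0, so this is the first element: 45
print(ages[-1])       # negative indices count from the end: 29
print(ages[1:3])      # a slice from index 1 up to (not including) 3: [62, 37]

ages.append(58)               # add a new value to the end of the list
print(sum(ages) / len(ages))  # mean age: 282 / 6 = 47.0
```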
In this video you will learn how to create collections of values that follow a pattern. This is good knowledge for when you wish to create your own simulated data to work with (as we will do later in the course).
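Here is a short sketch of three common ways to generate values that follow a pattern: Python's built-in `range` and NumPy's `arange` and `linspace`. It assumes NumPy is installed; the video may use a different selection of functions.

```python
import numpy as np

# range(start, stop, step) stops before the stop value
evens = list(range(0, 11, 2))        # 0, 2, 4, 6, 8, 10

# NumPy's arange works like range but also accepts decimal steps
doses = np.arange(0.5, 2.5, 0.5)     # 0.5, 1.0, 1.5, 2.0

# linspace returns a fixed number of evenly spaced values,
# and it includes the stop value
times = np.linspace(0, 60, num=5)    # 0.0, 15.0, 30.0, 45.0, 60.0
```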
In this video you will learn how to store values as dictionaries. Dictionaries are unique in that every element in the dictionary can be given a name (key). Each element then becomes a key-value pair. It is then easy to extract the actual data by using the keys. This will be of great use later as we create plots and graphs.
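A minimal sketch of a dictionary describing one hypothetical patient shows how values are stored and retrieved by key.

```python
# Each element of a dictionary is a key-value pair
patient = {
    "id": "P001",                   # hypothetical example values
    "age": 54,
    "systolic_bp": [132, 128, 135], # a value can itself be a collection
}

print(patient["age"])     # access a value through its key: 54
patient["smoker"] = False # add a new key-value pair

# Loop over all key-value pairs
for key, value in patient.items():
    print(key, value)
```

Plotly, for example, accepts dictionaries for settings such as marker styling, which is one reason this structure appears so often in plotting code.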
Introduction to working with data.
This video will help you understand external data such as spreadsheets. You will learn how to import this common container of data into a Python environment using the Pandas library.
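Importing a comma-separated values (CSV) file with pandas can be sketched as follows. So that the example runs without an external file, a tiny hypothetical dataset is held in a string and wrapped in `io.StringIO`; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# In practice you would pass a file path, e.g. pd.read_csv("data.csv")
# (the filename is hypothetical). Here the CSV content is held in a string
# so that the example runs on its own.
csv_text = """id,age,group,sbp
1,54,A,132
2,61,B,141
3,47,A,125
"""

df = pd.read_csv(io.StringIO(csv_text))  # the first row becomes the column names
print(df)
```

Excel workbooks can be read with `pd.read_excel` instead, which needs an extra package such as openpyxl for `.xlsx` files.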
This video will show you how to access and explore the data that you have imported.
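A few of the pandas attributes and methods typically used to explore a freshly imported DataFrame are sketched below. The small DataFrame is a hypothetical stand-in for imported data, and the video may use other methods as well.

```python
import pandas as pd

# A small hypothetical dataset standing in for imported data
df = pd.DataFrame({
    "age": [54, 61, 47, 39],
    "group": ["A", "B", "A", "B"],
    "sbp": [132, 141, 125, 118],
})

print(df.head())     # the first rows (up to 5 by default)
print(df.shape)      # (rows, columns): (4, 3)
print(df.dtypes)     # the data type of each column
print(df["age"])     # a single column, returned as a Series
print(df.iloc[0])    # the first row, selected by position

# Rows where group is "A", keeping only two of the columns
print(df.loc[df["group"] == "A", ["age", "sbp"]])
```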
The next two video lectures will help you generate your own data to use as you learn.
This video will show you how to create random data.
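One way to create random data is NumPy's random generator, sketched below. The seed, means, and sample sizes are arbitrary choices; the video may use NumPy's older `np.random` functions instead, which produce different values for the same seed.

```python
import numpy as np

# A seeded generator makes the "random" values reproducible
rng = np.random.default_rng(seed=42)

# Normally distributed values with mean 170 and standard deviation 10
heights = rng.normal(loc=170, scale=10, size=100)

# Whole numbers from 1 to 10 (the high value itself is excluded)
scores = rng.integers(low=1, high=11, size=100)

# Random category labels
groups = rng.choice(["Control", "Treatment"], size=100)
```

Re-creating the generator with the same seed reproduces exactly the same values, which makes simulated analyses repeatable.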
This video will show you how to create a DataFrame and populate it with random data.
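A DataFrame can be populated with random data by passing a dictionary of column names and simulated arrays. The column names, ranges, and seed below are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=12)
n = 50  # number of simulated participants

df = pd.DataFrame({
    "Age": rng.integers(low=18, high=81, size=n),            # 18 to 80 years
    "Group": rng.choice(["Control", "Treatment"], size=n),
    "SBP": rng.normal(loc=125, scale=15, size=n).round(0),   # rounded to whole numbers
})
print(df.head())
```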
This video shows how to identify common errors in datasets and how to correct them using Pandas.
This video shows more techniques of finding errors in datasets and how to clean them using Pandas.
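The kind of cleaning described in these two videos can be sketched with a few pandas methods: inspecting category labels, standardizing text, converting numbers stored as text, and handling missing values. The data and the specific errors are hypothetical, and the videos may use other techniques.

```python
import pandas as pd

# Hypothetical data containing typical data-entry errors
df = pd.DataFrame({
    "Group": ["Control", "control ", "TREATMENT", "Treatment", None],
    "Age": ["45", "52", "sixty", "38", "41"],
})

# Counting the unique labels reveals inconsistent spelling and spacing
print(df["Group"].value_counts(dropna=False))

# Standardize the text labels: strip whitespace and fix capitalization
df["Group"] = df["Group"].str.strip().str.capitalize()

# Convert to numbers; entries that cannot be converted become NaN
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

print(df.isna().sum())  # count missing values per column
clean = df.dropna()     # drop rows that contain any missing value
```

Dropping rows is only one option; depending on the study, missing values might instead be corrected at the source or imputed.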
Introduction to descriptive statistics.
Descriptive statistics is the first step in biostatistics: your first exploration in trying to understand the data. It takes row upon row and column upon column of data and expresses them in a way that we can understand.
In this video you will learn how to condense the information in a set of data point values into a single value that is representative of the set. Now, that is a very fancy way of talking about the mean, median, and mode of a set of data point values.
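The three measures can be sketched with Python's built-in `statistics` module and with a pandas Series. The data point values are hypothetical.

```python
import statistics
import pandas as pd

values = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical data point values

print(statistics.mean(values))    # 56 / 9, about 6.22
print(statistics.median(values))  # middle value of the sorted data: 6
print(statistics.mode(values))    # most frequent value: 8

# The same statistics for a pandas Series
s = pd.Series(values)
print(s.mean(), s.median(), s.mode().tolist())
```

`Series.mode()` returns a Series because a dataset can have more than one mode.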
This video will show you how to calculate values that express the spread in your data. These statistics include the range, variance, standard deviation, and quartiles.
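A minimal sketch of these measures of spread with pandas follows, using a hypothetical set of values. One detail worth knowing: pandas and NumPy use different default denominators for the variance.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data

data_range = s.max() - s.min()  # 9 - 2 = 7
variance = s.var()              # sample variance (ddof=1): 32 / 7, about 4.57
std_dev = s.std()               # sample standard deviation, the square root of the variance

# NumPy defaults to ddof=0, the population variance: 32 / 8 = 4.0
pop_variance = np.var(s.to_numpy())

# 25th, 50th, and 75th percentiles (linear interpolation by default): 4.0, 4.5, 5.5
quartiles = s.quantile([0.25, 0.5, 0.75])
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]  # interquartile range: 1.5
```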
Introduction to data visualization. After descriptive statistics, visualizing data in the form of plots and graphs is the most powerful way of learning what the data is trying to tell you.
This course shows you how to use the Plotly library in Python. It creates visually stunning plots, ready for publication. Plotly has the added benefit that the plots are interactive. You can upload them to a website where readers can actually manipulate them. Plotly graphs and plots give your biostatistical analysis the edge.
This video will introduce you to the world of data visualization through one of the easiest plots to create and understand, the scatter plot. A simple scatter plot takes a pair of numerical values from each subject and plots each pair as a point on an xy coordinate plane.
This video continues the journey into scatter plots. It shows you how to add a third and even a fourth variable to a two-dimensional plot.
This video will show you how to create box-and-whisker plots. They are a mainstay of biostatistics and help us understand the spread of a numerical variable.
This video will show you how to create histograms. These plots are ideal to show the distribution of numerical data point values.
This video will help you to create dot plots.
This video will help you to create bar charts. These plots are ideal for showing the counts of the categories of a categorical variable.
Plotly Express is a module similar to the Graph Objects module that we have been working with. Plotly Express has two main advantages. First, it is very simple and quick to use, which makes it ideal for the initial exploration of your data. Second, it is still very powerful: all of its plotting functions have multiple arguments that can be used to customize the plots as required. Plotly Express also works very well with pandas DataFrames, giving direct access to the columns of a DataFrame.
In part 1 of this section on Plotly Express, we become familiar with the syntax and learn how to reference pandas DataFrames. You can download the notebook, CSV file, and cascading style sheet under resources.
In part 2 we continue our look into the commonly used plots in Plotly Express. Remember that you can find the files under the resources section.
Introduction to this section on testing the assumptions for the use of parametric tests. Although parametric tests such as Student's t test, analysis of variance, and correlation are ubiquitous in biostatistics, they actually make some assumptions about the data. Fortunately, there are ways to test whether these assumptions are met. Checking them helps ensure that your analysis is an accurate reflection of your research.
This video will introduce you to the assumptions that must be met before using common parametric tests.
This video will show you which statistical tests you can use to test for the assumption of normality.
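The Shapiro-Wilk test is one widely used test for normality; a sketch with SciPy on simulated data follows. The video may present other tests as well, such as the D'Agostino-Pearson test (`scipy.stats.normaltest`). The 0.05 threshold is the common convention, not a requirement.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
sample = rng.normal(loc=100, scale=15, size=60)  # simulated data

# Shapiro-Wilk test: the null hypothesis is that the data come from a normal distribution
statistic, p_value = stats.shapiro(sample)
print(f"W = {statistic:.3f}, p = {p_value:.3f}")

# A small p value (commonly < 0.05) is evidence against normality
if p_value < 0.05:
    print("Evidence against normality")
else:
    print("No evidence against normality")
```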
This video will show you how to make sure that the variances of the different groups that you are comparing are not too dissimilar.
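Levene's test is a common choice for checking equality of variances; a SciPy sketch with three simulated groups follows, one of them deliberately more spread out. Note that SciPy's default `center="median"` gives the Brown-Forsythe variant of the test; the video may use different settings or a different test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=8)
group_a = rng.normal(loc=50, scale=5, size=40)
group_b = rng.normal(loc=52, scale=5, size=40)
group_c = rng.normal(loc=49, scale=12, size=40)  # deliberately more spread out

# Levene's test: the null hypothesis is that all groups have equal variances
statistic, p_value = stats.levene(group_a, group_b, group_c)
print(f"Levene statistic = {statistic:.2f}, p = {p_value:.4f}")
```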
This video shows you how to look for outliers in your data.
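One common approach is Tukey's fences: values more than 1.5 times the interquartile range below the first quartile or above the third quartile are flagged. This is the same rule commonly used for box-plot whiskers. The sketch below uses hypothetical values, and the video may use other methods.

```python
import pandas as pd

s = pd.Series([12, 14, 15, 15, 16, 17, 18, 19, 45])  # hypothetical values

q1, q3 = s.quantile(0.25), s.quantile(0.75)  # 15.0 and 18.0
iqr = q3 - q1                                # 3.0
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # fences at 10.5 and 22.5

outliers = s[(s < lower) | (s > upper)]
print(outliers)  # only the value 45 lies outside the fences
```

Flagged values should be investigated rather than deleted automatically; an outlier may be a data-entry error or a genuine observation.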
Introduction to the correlation between numerical variables.
This video will show you how to test for the dependence of one variable on another.
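A sketch of two correlation tests in SciPy on simulated data in which `y` depends linearly on `x`: Pearson's r for linear correlation and Spearman's rho as a rank-based alternative. The video may focus on only one of these.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=21)
x = rng.normal(loc=0, scale=1, size=50)
y = 2 * x + rng.normal(loc=0, scale=1, size=50)  # y depends linearly on x, plus noise

r, p_value = stats.pearsonr(x, y)   # Pearson's correlation coefficient
rho, p_rho = stats.spearmanr(x, y)  # Spearman's rank correlation coefficient
print(f"r = {r:.2f} (p = {p_value:.2g}), rho = {rho:.2f} (p = {p_rho:.2g})")
```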
This video will show you how to test the relationship of more than one variable with a single outcome variable.
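Assuming this video covers multiple linear regression, the idea can be sketched with NumPy's least-squares solver on simulated data with known coefficients. This is a swapped-in technique for illustration only: it returns the coefficients but no standard errors or p values, which a library such as statsmodels (for example `statsmodels.formula.api.ols`) would report and which the video may use instead.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 100
age = rng.uniform(30, 70, size=n)
bmi = rng.uniform(18, 35, size=n)
# Simulated outcome with known coefficients (0.5 for age, 1.2 for BMI) plus noise
sbp = 90 + 0.5 * age + 1.2 * bmi + rng.normal(0, 5, size=n)

# Design matrix: a column of ones (the intercept) followed by the predictors
X = np.column_stack([np.ones(n), age, bmi])
coefficients, residuals, rank, singular_values = np.linalg.lstsq(X, sbp, rcond=None)

intercept, b_age, b_bmi = coefficients
print(f"SBP = {intercept:.1f} + {b_age:.2f} x age + {b_bmi:.2f} x BMI")
```

Because the outcome was simulated, the estimated coefficients should land close to the true values of 0.5 and 1.2.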
Student's t test and its relatives are among the most common tests used in biostatistics. From humble beginnings at a brewery, where William Gosset developed it, Student's t test has conquered all.
This video will show you how to use common t tests such as Student's t test. It also takes a look at one-way analysis of variance.
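A sketch of the common t tests and a one-way ANOVA with SciPy, using simulated groups. The group names and values are hypothetical, and the video may present these tests in a different order or with other options.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=10)
control = rng.normal(loc=120, scale=10, size=30)
treatment = rng.normal(loc=112, scale=10, size=30)
placebo = rng.normal(loc=119, scale=10, size=30)

# Student's t test for two independent groups (assumes equal variances)
t_stat, p_t = stats.ttest_ind(control, treatment, equal_var=True)

# Welch's t test, which does not assume equal variances
t_welch, p_welch = stats.ttest_ind(control, treatment, equal_var=False)

# Paired t test, e.g. for before-and-after measurements on the same subjects
before = rng.normal(loc=130, scale=12, size=20)
after = before - rng.normal(loc=5, scale=4, size=20)
t_paired, p_paired = stats.ttest_rel(before, after)

# One-way analysis of variance across three groups
f_stat, p_anova = stats.f_oneway(control, treatment, placebo)
print(f"t = {t_stat:.2f}, p = {p_t:.4f}; F = {f_stat:.2f}, p = {p_anova:.4f}")
```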
This course empowers you to do your own biostatistical analysis. Whether you are a healthcare professional, a scientist, or just someone interested in supercharging their research career, the time to learn how to use a modern computer language to do your own analysis has arrived.
Python is becoming the de facto standard in data analysis. It is a free, powerful programming language. With a minimum of effort, you will soon be able to do all your own analysis, create beautiful plots, and deliver your reports or publish your research with confidence and pride.