Data Science Statistics for Absolute Beginners

Name: Data Science Statistics for Absolute Beginners
Rating: 4.4 (6 reviews)

A beginner's approach to the technicalities of Statistics for Data Science

Created by Samuel Adesola

Last updated 11/2022

English

What you'll learn

Students will have full knowledge of the core statistics needed for Data Science
Students will be able to decide and construct different visualizations and graphical representations used in Statistics
Identify Measures of Central Tendency and carry out the related calculations
Identify Measures of Dispersion and carry out the related calculations
Define the right operations to be performed on a set of data and carry out the mathematics behind those operations

Course content

10 sections • 71 lectures • 4h 57m total length

Welcome to Statistics for Data Science (1:29)
Learn foundational statistics for data science by exploring graphical representations, performing statistical calculations, and identifying measures of central tendency and dispersion to work with data.
The Subject of Statistics (1:39)
Define statistics as the collection, organization, analysis, presentation, and interpretation of data for a predetermined purpose. Explain raw data, data from a sample, and contrast discrete data with continuous data.
Section Quiz

Section Introduction (0:26)
Explore frequency distribution concepts, including grouped frequency distribution, and learn to construct a distribution table from a given dataset.
Ungrouped Frequency Distribution Table (5:47)
Create an ungrouped frequency distribution table from discrete data, using tally and frequency columns to show how often each score occurs, then discuss grouping for large ranges.
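As a rough illustration of the tally-and-frequency idea described above, the following Python sketch builds an ungrouped frequency table. The `scores` list is made up for this example and does not come from the course.

```python
from collections import Counter

# Hypothetical discrete scores, invented purely for illustration.
scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]

# Count how often each score occurs, then list scores in ascending order.
frequency = Counter(scores)
table = sorted(frequency.items())

print("Score  Tally  Frequency")
for score, count in table:
    # One tally stroke per occurrence of the score.
    print(f"{score:<6} {'|' * count:<6} {count}")
```

The frequencies always add up to the number of observations, which is a quick check on a hand-built table.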
Grouped Frequency Distribution Class Range, Interval and Width (4:33)
Grouped Frequency Distribution Class Limit and Class boundary (5:18)
Learn to determine class limits and class boundaries for grouped data, using 0.5 adjustments to set lower and upper bounds, compute class width, and distinguish equal versus unequal intervals.
Grouped Frequency Distribution Class mark (2:27)
Calculate the class mark, the center of a class interval, by averaging endpoints, and apply class marks to build and interpret a grouped frequency distribution.
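The class boundary and class mark calculations can be sketched in Python as follows. The class limits 10 to 14 and the helper names are assumptions chosen for illustration, and the 0.5 adjustment presumes data recorded to the nearest whole number.

```python
def class_boundaries(lower_limit, upper_limit, adjustment=0.5):
    """Return (lower boundary, upper boundary) for a class given its limits."""
    return lower_limit - adjustment, upper_limit + adjustment


def class_mark(lower_limit, upper_limit):
    """Return the class mark: the average of the two class limits."""
    return (lower_limit + upper_limit) / 2


# Hypothetical class with limits 10 and 14.
lower, upper = class_boundaries(10, 14)
width = upper - lower   # class width measured between the boundaries
mark = class_mark(10, 14)

print(lower, upper, width, mark)
```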
Grouped Frequency Distribution Table (3:29)
Build a grouped frequency distribution using class marks instead of class intervals, recording tally and frequency to complete the table, and explore equal and unequal intervals for data grouping.
Section Quiz
Unequal Class Intervals (6:16)
Cumulative Frequency Distribution 1 (4:05)
Cumulative Frequency Distribution 2 (4:43)
Construct a cumulative frequency distribution using thresholds 9.5, 14.5, 19.5, 24.5, 29.5, with the final entry equal to the total sample, and create the graphical representation.
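A minimal Python sketch of a cumulative frequency table using the thresholds listed above. The class frequencies are hypothetical, since the lecture's data is not given here.

```python
from itertools import accumulate

# Upper class boundaries (thresholds) named in the lecture description.
upper_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5]
# Hypothetical class frequencies, invented for illustration.
frequencies = [2, 5, 8, 4, 1]

# Running total of the frequencies up to each threshold.
cumulative = list(accumulate(frequencies))

for boundary, total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {total}")
```

The last cumulative entry equals the total sample size, matching the check mentioned in the description.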
Frequency Distribution Practice Test

Section Introduction (0:20)
Explore graphical representations of data, starting with pictographs and progressing to the cumulative frequency curve, also called an ogive.
Pictographs (7:00)
Explore pictographs, a pictorial data representation using pictures to convey information clearly alongside frequency tables. Learn to choose symbols, such as a 1000-hour stick-figure scale, while noting misleading effects.
Simple Bar Chart (6:27)
Learn how a bar chart uses equal-width bars with heights proportional to data, plotting frequency on the vertical axis and items on the horizontal axis, with an example in which the Coca-Cola bar reaches five.
Compound Bar Chart (7:00)
Explore compound bar charts with a stacked bar chart representing male, female, and clergy students across four years. Learn how to draw axes, scale totals, and interpret the combined bars.
Introduction to pie chart (2:42)
Explore how pie charts use circular sectors with angles proportional to data magnitudes, converting values into degrees that sum to 360 for visualization.
Section Quiz
Pie Chart Sector Calculations (5:53)
Calculate pie chart sector angles by converting category amounts into percentages and then into degrees, using rent, food, transport, and savings as examples.
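The sector calculation can be sketched in Python as below. The category names follow the description, but the amounts are invented for illustration.

```python
# Hypothetical monthly amounts; only the category names come from the lecture.
expenses = {"rent": 300, "food": 400, "transport": 100, "savings": 200}

total = sum(expenses.values())

# Each category's share of the total, as a percentage and as a sector angle.
percentages = {name: amount * 100 / total for name, amount in expenses.items()}
angles = {name: amount * 360 / total for name, amount in expenses.items()}

for name in expenses:
    print(f"{name}: {percentages[name]:.1f}% -> {angles[name]:.1f} degrees")
```

The angles add up to 360 degrees and the percentages to 100, which gives a quick sanity check before drawing the chart.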
Visualizing Pie Chart (1:44)
Learn to create a pie chart in Excel from a table of expenditures (food, rent, transport, clothes, savings), insert the chart, and display the data as percentages representing angles.
Line Graph (3:11)
A line graph visualizes data that changes over time, allowing us to see trends in monthly earnings from January to December using Excel formatting options.
The concept of Histogram (3:45)
Explore how histograms display frequency distributions, where the area of each bar, not its height, is proportional to frequency, and learn about frequency density for unequal class intervals.
Histogram Example 1 (4:39)
Plot a histogram for grouped data using a shoe-size distribution from 39 to 44 and their frequencies, illustrating how to place shoe size on the horizontal axis and frequency on the vertical axis to build the histogram.
Histogram Example 2 (11:35)
Explore plotting a frequency histogram for unequal class intervals by calculating frequency density and class width from height data. Understand class boundaries and class marks in grouped data.
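For unequal class intervals, frequency density is the frequency divided by the class width. The Python sketch below uses hypothetical height classes (boundaries in cm) that are not taken from the course.

```python
# Hypothetical height classes: (lower boundary, upper boundary, frequency).
classes = [(149.5, 159.5, 10), (159.5, 164.5, 15), (164.5, 184.5, 20)]

densities = []
for lower, upper, frequency in classes:
    width = upper - lower
    # Bar height for a histogram with unequal class widths.
    densities.append(frequency / width)

print(densities)
```

Multiplying each density by its class width recovers the frequency, which is why the bar area, not the bar height, represents frequency.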
Frequency Polygon (1:13)
Learn to build a frequency polygon from a histogram by plotting class midpoints (class marks) and connecting them with straight lines to reveal the data's distribution.
The Cumulative Frequency Curve or Ogive (6:37)
Learn how to construct and interpret the cumulative frequency curve (ogive) from a frequency distribution, including creating the cumulative table, using upper class boundaries, and plotting the curve.
Section Test
Graphical Representation of Data Practice Test (0:01)

Section Introduction (0:30)
Central Tendency (0:28)
Explore measures of central tendency by computing the arithmetic mean for two groups, then examine grouped data through median, deciles, quartiles, percentiles, and interquartile range, plus the cumulative frequency distribution.
The concept of Central Tendency (2:31)
Explore central tendency as a single representative value for a data set using the arithmetic mean, median, and mode. Note outliers and include the geometric mean as another measure.

The Arithmetic Mean (1:19)
Explore the arithmetic mean as a measure of central tendency, using all data points to compute the average, with the mean denoted by x-bar and sensitive to extreme values.
Mean from Assumed Mean (6:29)
Learn to compute the mean from an assumed mean using grouped data and frequencies. Apply the formula x-bar = a + (Σ f_i d_i) / (Σ f_i), where a is the assumed mean and d_i is the deviation of each value from a, to simplify mean calculations.
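The assumed-mean calculation can be sketched in Python as follows. The class marks, frequencies, and the choice of assumed mean are hypothetical values used only to show the arithmetic.

```python
# Hypothetical class marks and frequencies for a grouped distribution.
class_marks = [12, 17, 22, 27, 32]
frequencies = [2, 5, 8, 4, 1]
assumed_mean = 22  # any convenient value; here the middle class mark

# d_i = x_i - a: the deviation of each class mark from the assumed mean.
deviations = [x - assumed_mean for x in class_marks]
sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
n = sum(frequencies)

mean = assumed_mean + sum_fd / n

# Cross-check against the direct formula: sum of f*x over sum of f.
direct_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

print(mean, direct_mean)
```

Both routes give the same answer; the assumed mean only keeps the intermediate numbers small when working by hand.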
An Example on calculating Mean 1 (2:09)
Explore four formulas for calculating the mean and work through a data set example. Compute the mean by summing the data and dividing by the count to obtain 5.25.
The Sigma Notation (4:55)
Learn how to use sigma notation to define the mean, perform summations, and compute the simple or frequency-weighted mean by dividing sums by total frequencies.
An Example on calculating Mean 2 (2:13)
Learn to compute the mean from a frequency table by summing x times f and dividing by the total frequency, with a worked example.
An Example on calculating Mean 3 (6:38)
Mean of Grouped Frequency Distribution (8:14)
Section Quiz

The Median (2:13)
Define the median as the middle value of ordered numbers; for odd sets, it's the middle value, while for even sets, it's the average of the two middle numbers.
An Example on calculating Median (4:12)
Arrange the data in order. For an odd number of values, use the middle value; for an even number, average the two middle values to get the median, as shown with a set of 11 numbers and with the middle pair 104 and 107.
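The odd and even cases can be sketched in Python as below. The sample lists are made up for illustration; only the middle pair 104 and 107 echoes the description.

```python
def median(values):
    """Return the median: the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 5]))             # odd number of values
print(median([112, 98, 107, 104]))   # even number; middle pair is 104 and 107
```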
Geometrical Determination of the median (3:15)
Learn methods to determine the median: geometrically by dividing data with a vertical line, and via a cumulative frequency curve using a parallel line and a perpendicular to the axis.
The Median Class (0:47)
Estimating the Median using Histogram and Cumulative Frequency Curve (6:25)
Estimate the median from a histogram and a cumulative frequency curve for a grouped distribution with unequal class intervals, using frequency density and class mark.
Introduction to Quartiles and Percentiles (3:39)
Explore the basics of quartiles and percentiles, defining Q1, Q2 (the median), and Q3, and connect these to P25, P50, and P75 on the cumulative frequency curve.
Estimating the median using the interpolation formulae (7:27)
Explore estimating the median from grouped data with the interpolation formula, using the histogram and core terms: lower boundary, cumulative frequency before the median class, median class frequency, and class width.
Median Interpolation formulae example (11:31)
Construct a grouped frequency distribution, identify the median class from cumulative frequencies, and estimate the median using interpolation with lower and upper boundaries, class width, N/2, FC, f_m, and L.
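A hedged Python sketch of the interpolation estimate, using the commonly taught form median = L + ((N/2 - F) / f_m) * c. The function name, the data, and the assumption of equal class widths are illustrative choices, not taken from the course.

```python
def grouped_median(lower_boundaries, frequencies, class_width):
    """Estimate the median of grouped data by linear interpolation.

    Assumes equal class widths and classes listed in ascending order.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequencies must not all be zero")
    cumulative_before = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative_before + f >= n / 2:
            # This is the median class: L = lower, f_m = f, c = class_width.
            return lower + (n / 2 - cumulative_before) / f * class_width
        cumulative_before += f


# Hypothetical grouped data with class boundaries 4.5, 9.5, ..., 29.5.
lower_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5]
frequencies = [2, 5, 8, 4, 1]
estimate = grouped_median(lower_boundaries, frequencies, class_width=5)
print(estimate)
```

Here N/2 is 10, so the median class is 14.5 to 19.5 with F = 7 and f_m = 8.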
Section Quiz

Introduction to The Mode (2:13)
Identify the mode as the value with the highest frequency in a distribution. Estimate the mode for grouped frequency distributions using the geometric method with a histogram.
Finding The Mode (3:05)
Construct a frequency distribution table to identify the mode as the value with the highest frequency. The example identifies 13 as the mode and links frequency to the mean and standard deviation.
Estimating the mode using the interpolation formulae (3:41)
Estimate the mode for grouped data using the interpolation formula: identify the modal class as the class with the highest frequency, then use its lower boundary, the adjacent frequencies, and the modal class width.
Mode interpolation formulae example (5:01)
Estimate the mode from a grouped frequency distribution using the interpolation formula, identifying the modal class and its boundaries to compute an estimate of about 75.2 kg.
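One common textbook form of the mode interpolation formula is mode = L + (d1 / (d1 + d2)) * c, where d1 and d2 are the differences between the modal class frequency and the frequencies of the classes before and after it. Whether the course uses exactly this form is an assumption, and the data below is hypothetical (it is not the 75.2 kg example).

```python
def grouped_mode(lower_boundaries, frequencies, class_width):
    """Estimate the mode of grouped data by interpolation within the modal class.

    Assumes equal class widths and a single, clearly highest frequency.
    """
    modal_index = max(range(len(frequencies)), key=frequencies.__getitem__)
    f_modal = frequencies[modal_index]
    # Treat a missing neighbouring class as having frequency zero.
    f_before = frequencies[modal_index - 1] if modal_index > 0 else 0
    f_after = frequencies[modal_index + 1] if modal_index < len(frequencies) - 1 else 0
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    return lower_boundaries[modal_index] + d1 / (d1 + d2) * class_width


# Hypothetical grouped data with class boundaries 4.5, 9.5, ..., 29.5.
lower_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5]
frequencies = [2, 5, 8, 4, 1]
estimate = grouped_mode(lower_boundaries, frequencies, class_width=5)
print(round(estimate, 2))
```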
Section Quiz

Advantages and Disadvantages of the Mean (4:28)
Explore the mean, median, and mode as central-tendency measures, highlighting the mean's advantages, precise definition, outlier sensitivity, and issues with missing data.
Advantages and Disadvantages of the Median (2:39)
Compare the median's advantages and disadvantages, highlighting its computability with missing data and its inability to account for extreme values, and the requirement to arrange data before calculation.
Advantages and Disadvantages of the mode (3:50)
Identify the mode’s advantages, such as easy determination by inspection and insensitivity to outliers, and note its disadvantages like non-uniqueness and poorer representation than the mean or median.
Measures of Central Tendency Practice Exercise (0:01)

Section Introduction (0:26)
Explore measures of dispersion and the coefficient of variation, and learn how to choose the right measure in different situations. Examine the range, types of deviations, quartiles, and percentiles.
Introduction to Measures of Dispersion (0:42)
Explore measures of dispersion and the coefficient of variation, including quartiles, interquartile ranges, semi-interquartile ranges, and the mean deviation and standard deviation of grouped distributions, derived from the ogive.
The Concept of Measures of Dispersion (4:27)
Explore how data spread relates to central tendency by examining measures of dispersion, including range, interquartile range, variance, and standard deviation.
The Range (3:25)
Calculate the range as the highest minus the lowest value. Reveal that two datasets may share the same range but differ in dispersion, showing range is not an efficient measure.
Mean Deviation (3:21)
Explore mean deviation, the dispersion measure defined as the average of the absolute deviations from the mean, and apply it to frequency distributions and grouped data with the associated formulas.
Mean Deviation of Ungrouped Frequency Distribution (3:16)
Compute the mean and the mean deviation (MD) for ungrouped data using absolute deviations from the mean. Demonstrate with the data 3, 5, 8, 11, 12, 21, yielding a mean of 10 and an MD of 4.67.
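The figures quoted for this example can be reproduced with a short Python sketch, using the data given in the description:

```python
# Data set quoted in the lecture description.
data = [3, 5, 8, 11, 12, 21]

mean = sum(data) / len(data)

# Mean deviation: the average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

print(mean, round(mean_deviation, 2))
```

The absolute deviations are 7, 5, 2, 1, 2, and 11, which sum to 28; dividing by 6 gives roughly 4.67.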
Mean Deviation of Grouped Frequency Distribution (8:32)
Compute the mean from a grouped frequency distribution by listing ages, class marks, and frequencies, then determine the mean deviation using |x minus mean| and frequency weights.
Section Quiz
Interquartile and The Semi Interquartile range (4:49)
Interquartile and The Semi Interquartile range Example 1 (7:03)
Compute the interquartile range and semi-interquartile range by sorting the data in ascending order, identifying Q1 and Q3, and then deriving the two measures, with a worked example.
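Several conventions exist for locating quartiles in raw data, and the one the course uses is not stated here. The Python sketch below assumes the "median of each half" convention, with a made-up data set, purely to illustrate how the two measures follow from Q1 and Q3.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values; for an odd count the middle value
    is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)


# Hypothetical data, invented for illustration.
data = [12, 2, 8, 4, 15, 5, 10, 7]
q1, q3 = quartiles(data)

interquartile_range = q3 - q1
semi_interquartile_range = interquartile_range / 2

print(q1, q3, interquartile_range, semi_interquartile_range)
```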
Interquartile and The Semi Interquartile range of Grouped Distribution (9:39)
Construct a cumulative frequency table for a grouped distribution and determine Q1, Q3, and the interquartile range. Compute the semi interquartile range and apply it to a coffee cups example.
Disadvantages of the Range, Mean Deviation and Interquartile Range (4:07)
Analyze the disadvantages of the range, mean deviation, and interquartile range as dispersion measures, contrasting their limitations with standard deviation and variance.

Variance and Standard Deviation (2:35)
Explain dispersion by defining variance as the square of the standard deviation, with the standard deviation denoted by s, and discuss grouped and ungrouped data calculations and standardization.
Standard Deviation Example Part 1 (7:09)
Compute the mean from the frequency distribution of shoe sizes and then determine the standard deviation for grouped data using the sums Σfx and Σf.
Standard Deviation Example Part 2 (4:49)
Learn to compute standard deviation and variance for grouped frequency distributions using class marks and an assumed mean, applying two formulas for standard deviation.
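Two commonly taught forms of the grouped variance are Σf(x - x-bar)² / Σf and Σfx² / Σf - x-bar². Whether these are the two formulas the course has in mind is an assumption. The Python sketch below, with hypothetical class marks and frequencies, shows that they agree.

```python
import math

# Hypothetical class marks and frequencies for a grouped distribution.
class_marks = [12, 17, 22, 27, 32]
frequencies = [2, 5, 8, 4, 1]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Form 1: frequency-weighted average of squared deviations from the mean.
variance_1 = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks)) / n

# Form 2: mean of the squares minus the square of the mean.
variance_2 = sum(f * x ** 2 for f, x in zip(frequencies, class_marks)) / n - mean ** 2

standard_deviation = math.sqrt(variance_1)

print(mean, variance_1, variance_2, round(standard_deviation, 3))
```

Both forms divide by the total frequency, which treats the data as a whole population; a sample-based version would divide by the total frequency minus one.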
Standard Deviation for Grouped Distribution Example 1 (7:17)
Learn how to compute the mean using an assumed mean and the standard deviation for a grouped distribution of donor masses, using class marks and frequencies.
Standard Deviation for Grouped Distribution Example 2 (9:03)
Demonstrates calculating standard deviation for a grouped distribution using the second formula, including class marks, class boundaries, and frequencies to compute the mean and dispersion.
Coefficient of Variation (8:17)
Explore the coefficient of variation, defined as the standard deviation divided by the mean, and see its calculation on a small data set to compare dispersion.
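A minimal Python sketch of the coefficient of variation on a small, made-up data set. It assumes the population standard deviation (dividing by n); the course may use the sample version instead.

```python
from statistics import mean, pstdev

# Hypothetical small data set, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(data)
std_dev = pstdev(data)  # population standard deviation (divides by n)

# Coefficient of variation: standard deviation relative to the mean.
coefficient_of_variation = std_dev / average

print(average, std_dev, f"{coefficient_of_variation:.0%}")
```

Because it is a ratio, the coefficient of variation has no units, which makes it suitable for comparing the spread of data sets measured on different scales.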
Full Example on Measure of Dispersion (6:08)
Explore a full example of measure of dispersion using a grouped frequency distribution to compute the mean, variance, standard deviation, and the coefficient of variation.
Measures of Dispersion Practice exercise (0:01)
Final Remark (0:34)
Finish this course and gain a first-hand look at statistics for data science. Prepare to explore classes and use Google searches to find material for your data science career.

Requirements

This course is meant for absolute beginners in the field of Statistics and Data Science, so no prerequisites are required to understand the concepts explained in this course.
This course entails various mathematical operations in Statistics and can be taken by both beginners and experts wishing to refresh their statistics knowledge.

Description

This course will take you from the basics of Statistics to a higher-level view of statistical computations. This course is for you if you are just getting started with Data Science or Machine Learning and you need to understand the nitty-gritty of all the statistical calculations being used in these fields.

We will start with the big picture of what Statistics is, then cover graphical representations of data, the measures of central tendency, the measures of dispersion, and lastly the coefficient of variation.

Note that this course will not be talking about inferential statistics, though some use cases of the concepts will be discussed where necessary.

In this course I will be taking you through all the nitty-gritty and foundational concepts of statistics that you will need to establish a career in data science and machine learning. This course will prepare you with statistical calculations, graphical representations of data, and how to interpret these graphs.

At the end of this course, you will know how to represent data graphically in different forms, how to determine the measures of central tendency and dispersion for a given set of data, and how to know which operation to perform when given a set of data.

This is the right course for you if you are new to data science and you want to understand the principles behind different formulas or calculations you will come across in your data science journey.

Who this course is for:

Beginners in the field of Data Science and Machine Learning wishing to know the core part of the Statistics behind some data science practices
Experts wishing to revise some foundations of statistics for Data Science and Machine Learning


Course content

Welcome • 2 lectures • 3min

Frequency Distribution Table • 9 lectures • 37min

Graphical Representations of Data • 14 lectures • 1hr 2min

Measures of Central Tendency • 3 lectures • 3min

The Mean • 7 lectures • 32min

The Median • 8 lectures • 39min

The Mode • 4 lectures • 14min

Pros and Cons of Mean, Median and Mode • 4 lectures • 11min

Measures of Dispersion: Basics • 11 lectures • 50min

Variance and Standard Deviation • 9 lectures • 46min
