
Learn foundational statistics for data science by exploring graphical representations, performing statistical calculations, and identifying measures of central tendency and dispersion to work with data.
Define statistics as the collection, organization, analysis, presentation, and interpretation of data for a predetermined purpose. Explain raw data and sample data, and contrast discrete data with continuous data.
Explore frequency distribution concepts, including grouped frequency distribution, and learn to construct a distribution table from a given dataset.
Create an ungrouped frequency distribution table from discrete data, using tally and frequency columns to show how often each score occurs, then discuss grouping for large ranges.
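Here is a minimal Python sketch of the ungrouped table described above. It counts how often each score occurs and prints a simple tally column. The scores and the name `frequency_table` are hypothetical and are used only for illustration.

```python
from collections import Counter

def frequency_table(scores):
    """Return (score, tally, frequency) rows for each distinct score, in ascending order."""
    counts = Counter(scores)
    # One "|" per occurrence stands in for the hand-drawn tally marks.
    return [(score, "|" * counts[score], counts[score]) for score in sorted(counts)]

scores = [2, 3, 3, 5, 2, 4, 3, 5, 5, 3]  # hypothetical discrete data
for score, tally, freq in frequency_table(scores):
    print(f"{score:>5}  {tally:<6} {freq}")
```

The frequencies always add up to the number of observations, which makes a quick check on a hand-built table.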
Learn to determine class limits and class boundaries for grouped data, using 0.5 adjustments to set lower and upper bounds, compute class width, and distinguish equal versus unequal intervals.
Calculate the class mark, the center of a class interval, by averaging endpoints, and apply class marks to build and interpret a grouped frequency distribution.
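The boundary, width, and class-mark rules above can be checked with a short Python sketch. It assumes the data were recorded to whole numbers, which is what justifies the 0.5 adjustment. The class limits and the helper `describe_classes` are hypothetical.

```python
def describe_classes(limits):
    """For whole-number class limits such as (10, 14), return boundaries, width and class mark."""
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - 0.5   # half a unit below the lower limit
        upper_boundary = upper + 0.5   # half a unit above the upper limit
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,  # class width
            "mark": (lower + upper) / 2,               # class mark (midpoint)
        })
    return rows

rows = describe_classes([(10, 14), (15, 19), (20, 24)])  # hypothetical classes
for row in rows:
    print(row)
```

Here every class has the same width, so these are equal intervals. Classes with different widths would be unequal intervals.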
Build a grouped frequency distribution using class marks instead of class intervals, recording tally and frequency to complete the table, and explore equal and unequal intervals for data grouping.
Construct a cumulative frequency distribution using thresholds 9.5, 14.5, 19.5, 24.5, 29.5, with the final entry equal to the total sample, and create the graphical representation.
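Here is a Python sketch of the cumulative column, using the upper class boundaries listed above. Only the boundaries come from the lecture; the frequencies are made up for illustration.

```python
from itertools import accumulate

# Upper class boundaries from the lecture; the frequencies are hypothetical.
upper_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5]
frequencies = [3, 7, 12, 5, 3]

# Running total: each entry counts every observation below that boundary.
cumulative = list(accumulate(frequencies))

for boundary, cf in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {cf}")

# The last cumulative frequency equals the total sample size.
print("total:", cumulative[-1])
```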
Explore graphical representations of data, starting with pictographs and progressing to the cumulative frequency curve, also called an ogive.
Explore pictographs, which represent data with pictures so that information is conveyed clearly alongside frequency tables. Learn to choose symbols, such as a stick figure standing for 1000 hours, and note how pictographs can be misleading.
Learn how a bar chart uses bars of equal width with heights proportional to the data, plotting frequency on the vertical axis and the items on the horizontal axis. In the example, the bar for Coca-Cola reaches a height of five.
Explore compound bar charts, using a stacked bar chart to represent male, female, and clergy students across four years. Learn how to draw the axes, scale the totals, and interpret the combined bars.
Explore how pie charts use circular sectors with angles proportional to data magnitudes, converting values into degrees that sum to 360 for visualization.
Calculate pie chart sector angles by converting category amounts into percentages and then into degrees, using rent, food, transport, and savings as examples.
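The percentage-then-degrees conversion can be sketched in Python as follows. The amounts for rent, food, transport, and savings are hypothetical. Each angle is the category's share of the total multiplied by 360.

```python
def sector_angles(amounts):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(amounts.values())
    return {name: (value * 100 / total, value * 360 / total) for name, value in amounts.items()}

budget = {"rent": 400, "food": 300, "transport": 200, "savings": 100}  # hypothetical amounts
for name, (pct, deg) in sector_angles(budget).items():
    print(f"{name:<10} {pct:5.1f}%  {deg:6.1f} degrees")
```

The angles always sum to 360 degrees, which is a useful check before drawing the chart.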
Learn to create a pie chart in Excel from a table of expenditures (food, rent, transport, clothes, savings), insert the chart, and display the data as percentages representing angles.
A line graph visualizes data that changes over time. The example plots monthly earnings from January to December to show the trend, using Excel's formatting options.
Explore how histograms display frequency distributions, where the area of each bar, not its height, is proportional to frequency, and learn about frequency density for unequal class intervals.
Plot a histogram for grouped data using a shoe-size distribution from 39 to 44 and its frequencies, showing frequency on the vertical axis and shoe size on the horizontal axis.
Explore plotting a frequency histogram for unequal class intervals by calculating frequency density and class width from height data. Understand class boundaries and class marks in grouped data.
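Here is a minimal sketch of frequency density for unequal classes. It assumes the common convention frequency density = frequency / class width, so each bar's area equals its frequency; some texts instead divide by a chosen standard width. The height classes and frequencies are hypothetical.

```python
def frequency_density(classes):
    """classes: list of ((lower_boundary, upper_boundary), frequency) pairs."""
    result = []
    for (lower, upper), freq in classes:
        width = upper - lower
        result.append((lower, upper, width, freq / width))  # bar height for the histogram
    return result

heights = [((150, 155), 10), ((155, 165), 30), ((165, 170), 8)]  # hypothetical heights in cm
for lower, upper, width, density in frequency_density(heights):
    print(f"{lower}-{upper}: width {width}, density {density}")
```

The widest class has the largest frequency but not three times the bar height, because its area (density times width) carries the frequency.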
Learn to build a frequency polygon from a histogram by plotting class midpoints (class marks) and connecting them with straight lines to reveal the data's distribution.
Learn how to construct and interpret the cumulative frequency curve (ogive) from a frequency distribution, including creating the cumulative table, using upper class boundaries, and plotting the curve.
Explore measures of central tendency by computing the arithmetic mean for two groups, then examine grouped data through the median, deciles, quartiles, percentiles, and interquartile range, along with the cumulative frequency distribution.
Explore central tendency as a single representative value for a data set using the arithmetic mean, median, and mode. Note outliers and include the geometric mean as another measure.
Explore the arithmetic mean as a measure of central tendency, using all data points to compute the average, with the mean denoted by x-bar and sensitive to extreme values.
Learn to compute the mean from an assumed mean using grouped data and frequencies. Apply the formula x̄ = A + Σf_i d_i / Σf_i, where A is the assumed mean and d_i = x_i - A, to simplify mean calculations.
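Here is a short Python sketch of the assumed-mean formula. The class marks and frequencies are hypothetical, and so is the helper name `mean_from_assumed_mean`. Any choice of A gives the same mean; picking a class mark near the middle just keeps the deviations small.

```python
def mean_from_assumed_mean(marks, freqs, assumed):
    """mean = A + sum(f * d) / sum(f), with d = x - A for each class mark x."""
    deviations = [x - assumed for x in marks]               # d_i
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))  # sum of f_i * d_i
    return assumed + sum_fd / sum(freqs)

marks = [12, 17, 22, 27]   # hypothetical class marks
freqs = [4, 6, 8, 2]       # hypothetical frequencies
print(mean_from_assumed_mean(marks, freqs, assumed=22))
```

Calling it with `assumed=12` instead of 22 returns the same result, which is a handy check on hand calculations.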
Explore four formulas for calculating the mean and work through a data set example. Compute the mean by summing the data and dividing by the count to obtain 5.25.
Learn how to use sigma notation to define the mean, perform summations, and compute the simple or frequency-weighted mean by dividing sums by total frequencies.
Learn to compute the mean from a frequency table by summing x times f and dividing by the total frequency, with a worked example.
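Here are the simple and frequency-weighted mean formulas side by side, as a Python sketch with made-up data. Expanding the frequency table back into raw values gives the same answer.

```python
def simple_mean(values):
    """Sum of the data divided by the number of values."""
    return sum(values) / len(values)

def frequency_mean(values, freqs):
    """Sum of x*f divided by the total frequency sum(f)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

print(simple_mean([4, 6, 8]))                      # hypothetical raw data
print(frequency_mean([1, 2, 3, 4], [2, 3, 4, 1]))  # hypothetical frequency table
```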
Define the median as the middle value of ordered numbers; for odd sets, it's the middle value, while for even sets, it's the average of the two middle numbers.
Arrange the data in order. For an odd number of values, the median is the middle value; for an even number, it is the average of the two middle values. This is shown with an 11-value data set and with a data set whose two middle values are 104 and 107.
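Here is a Python sketch of the odd/even median rule. Both data sets are hypothetical; the even one was chosen so that its two middle values are 104 and 107.

```python
def median(values):
    ordered = sorted(values)           # step 1: arrange in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd count: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values

print(median([7, 3, 9, 1, 5]))        # hypothetical odd-sized data
print(median([110, 98, 107, 104]))    # hypothetical even-sized data
```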
Learn methods to determine the median. Geometrically, a vertical line divides the histogram's area in half. On a cumulative frequency curve, draw a line parallel to the horizontal axis from N/2 to the curve, then drop a perpendicular to the horizontal axis.
Estimate the median from a histogram and a cumulative frequency curve for a grouped distribution with unequal class intervals, using frequency density and class mark.
Explore the basics of quartiles and percentiles, defining Q1, Q2 (the median), and Q3, and connect these to P25, P50, and P75 on the cumulative frequency curve.
Explore estimating the median from grouped data with the interpolation formula, using the histogram and core terms—lower boundary, cumulative frequency before the median class, median class frequency, and class width.
Construct a grouped frequency distribution, identify the median class from cumulative frequencies, and estimate the median using interpolation with lower and upper boundaries, class width, N/2, FC, f_m, and L.
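Here is a Python sketch of the interpolation formula Median = L + ((N/2 - Fc) / f_m) * c. In it, L is the lower boundary of the median class, Fc the cumulative frequency before that class, f_m its frequency, and c its width. The class boundaries, the frequencies, and the helper name `grouped_median` are hypothetical.

```python
def grouped_median(boundaries, freqs):
    """Median = L + ((N/2 - Fc) / f_m) * c for a grouped frequency distribution."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum_before = 0                                  # Fc: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= half:                  # median class: cumulative frequency first reaches N/2
            return lower + (half - cum_before) / f * (upper - lower)
        cum_before += f

boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5)]
freqs = [3, 7, 12, 5, 3]   # hypothetical frequencies
print(grouped_median(boundaries, freqs))
```

With N = 30, N/2 = 15 first falls in the 14.5 to 19.5 class, so L = 14.5, Fc = 10, f_m = 12, and c = 5.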
Identify the mode as the value with the highest frequency in a distribution. Estimate the mode for grouped frequency distributions using the geometric method with a histogram.
Construct a frequency distribution table to identify the mode as the value with the highest frequency. In the example, 13 is the mode; the lecture also links frequency to the mean and standard deviation.
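For ungrouped data, Python's standard `statistics.multimode` (Python 3.8+) returns every value that shares the highest frequency. This also makes it visible when the mode is not unique. The data are hypothetical.

```python
from statistics import multimode

data = [10, 13, 12, 13, 15, 13, 12]   # hypothetical scores
print(multimode(data))                # every value tied for the highest frequency

# A data set can have more than one mode, which is one of the mode's drawbacks.
print(multimode([1, 1, 2, 2, 3]))
```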
Estimate the mode for grouped data using the interpolation formula. First identify the modal class as the class with the highest frequency, then use its lower boundary, the frequencies of the adjacent classes, and the modal class width.
Estimate the mode from a grouped frequency distribution using the interpolation formula, identifying the modal class and its boundaries to compute an estimate of about 75.2 kg.
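Here is a sketch of the grouped-mode interpolation, Mode = L + d1 / (d1 + d2) * c. In it, d1 and d2 are the differences between the modal frequency and the frequencies of the classes just before and just after it. The sketch assumes equal class widths and treats a missing adjacent class as having frequency 0. The data and the helper `grouped_mode` are hypothetical, not the 75.2 kg example.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + d1 / (d1 + d2) * c, with d1 = f_m - f_before and d2 = f_m - f_after."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])    # index of the modal class
    lower, upper = boundaries[m]
    f_before = freqs[m - 1] if m > 0 else 0               # no class before: treat as 0
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0   # no class after: treat as 0
    d1 = freqs[m] - f_before
    d2 = freqs[m] - f_after
    return lower + d1 / (d1 + d2) * (upper - lower)

boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5)]
freqs = [3, 7, 12, 8, 3]   # hypothetical frequencies
print(grouped_mode(boundaries, freqs))
```

The estimate is pulled toward the adjacent class with the larger frequency. Here that is the class after the modal class, so the mode lands above the class mark of 17.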
Explore the mean, median, and mode as central-tendency measures, highlighting the mean's advantages, precise definition, outlier sensitivity, and issues with missing data.
Compare the median's advantages and disadvantages, highlighting its computability with missing data and its inability to account for extreme values, and the requirement to arrange data before calculation.
Identify the mode’s advantages, such as easy determination by inspection and insensitivity to outliers, and note its disadvantages like non-uniqueness and poorer representation than the mean or median.
Explore measures of dispersion and the coefficient of variation, and learn how to choose the right measure in different situations. Examine the range, types of deviation, quartiles, and percentiles.
Explore measures of dispersion and the coefficient of variation. Topics include quartiles derived from the ogive, interquartile ranges, semi-interquartile ranges, and the mean deviation and standard deviation of grouped distributions.
Explore how data spread relates to central tendency by examining measures of dispersion, including range, interquartile range, variance, and standard deviation.
Calculate the range as the highest value minus the lowest value. Show that two data sets can share the same range yet differ in dispersion, which makes the range an inefficient measure.
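A hypothetical pair of data sets makes the point about the range concrete. Both have a range of 8, but the population standard deviation (from Python's `statistics.pstdev`) shows that one is far more spread out.

```python
from statistics import pstdev

def value_range(values):
    return max(values) - min(values)   # highest value minus lowest value

a = [1, 5, 5, 5, 9]   # hypothetical data clustered at the centre
b = [1, 1, 5, 9, 9]   # hypothetical data pushed to the extremes

print(value_range(a), value_range(b))   # same range
print(pstdev(a), pstdev(b))             # but different spread
```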
Explore the mean deviation, the dispersion measure defined as the average of the absolute deviations from the mean, and apply it to frequency distributions and grouped data with the associated formulas.
Compute the mean and the mean deviation (MD) for ungrouped data using absolute deviations from the mean. The data 3, 5, 8, 11, 12, 21 give a mean of 10 and an MD of about 4.67.
Compute the mean from a grouped frequency distribution by listing ages, class marks, and frequencies, then determine the mean deviation using |x minus mean| and frequency weights.
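Here is a Python sketch of the mean deviation that handles both cases. Ungrouped data gets a frequency of 1 for every value, and grouped data uses class marks as x with their frequencies. The first call uses the data from the lecture above; the grouped values and the helper name `mean_deviation` are hypothetical.

```python
def mean_deviation(values, freqs=None):
    """Average absolute deviation from the mean: sum(f*|x - mean|) / sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)      # ungrouped data: every value has frequency 1
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / total

print(mean_deviation([3, 5, 8, 11, 12, 21]))            # data from the lecture
print(mean_deviation([12, 17, 22, 27], [4, 6, 8, 2]))   # hypothetical class marks and frequencies
```

For the lecture data the absolute deviations are 7, 5, 2, 1, 2, 11. They sum to 28, and 28 / 6 is about 4.67.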
Compute the interquartile range and semi-interquartile range. Sort the data in ascending order, identify Q1 and Q3, then derive the two measures, with a worked example.
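One way to get Q1 and Q3 in Python is the standard `statistics.quantiles`. Its default `exclusive` method places the quartiles at positions (n+1)/4 and 3(n+1)/4. Quartile conventions differ between textbooks, so this is an assumption that may not match the course's rule exactly. The data are hypothetical.

```python
from statistics import quantiles

data = [13, 1, 9, 5, 11, 3, 7]      # hypothetical data (sorted internally by quantiles)
q1, q2, q3 = quantiles(data, n=4)   # default "exclusive" method, (n+1)-based positions
iqr = q3 - q1                       # interquartile range
siqr = iqr / 2                      # semi-interquartile range
print(q1, q3, iqr, siqr)
```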
Construct a cumulative frequency table for a grouped distribution and determine Q1, Q3, and the interquartile range. Compute the semi interquartile range and apply it to a coffee cups example.
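For grouped data, the quartiles can be interpolated the same way as the median, replacing N/2 with N/4 for Q1 and 3N/4 for Q3. The sketch below uses made-up frequencies and a hypothetical helper, `grouped_quantile`.

```python
def grouped_quantile(boundaries, freqs, p):
    """Interpolate the p-th quantile (p=0.25 for Q1, 0.75 for Q3) from a grouped distribution."""
    target = p * sum(freqs)                         # N/4 for Q1, 3N/4 for Q3
    cum_before = 0                                  # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum_before + f >= target:      # first class whose cumulative frequency reaches the target
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("no class reaches the target; check p and the frequencies")

boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5)]
freqs = [3, 7, 12, 5, 3]   # hypothetical frequencies
q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
iqr = q3 - q1              # interquartile range
siqr = iqr / 2             # semi-interquartile range
print(q1, q3, iqr, siqr)
```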
Analyze the disadvantages of the range, mean deviation, and interquartile range as dispersion measures, contrasting their limitations with standard deviation and variance.
Explain dispersion by defining variance as the square of the standard deviation, with the standard deviation denoted by s, and discuss grouped and ungrouped data calculations and standardization.
Compute the mean from the frequency distribution of shoe sizes, then determine the standard deviation for grouped data using Σfx and Σf.
Learn to compute the standard deviation and variance for grouped frequency distributions using class marks and an assumed mean, applying two formulas for the standard deviation.
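Here is a Python sketch of the two standard-deviation formulas for a grouped distribution. The direct formula uses the actual mean. The shortcut uses an assumed mean A, with d = x - A. Both divide by Σf (the population form), which is an assumption; some texts divide by Σf - 1 for a sample. The class marks and frequencies are hypothetical.

```python
from math import sqrt

def grouped_sd_direct(marks, freqs):
    """sqrt(sum(f*(x - mean)^2) / sum(f)), using the actual mean."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / total
    return sqrt(sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / total)

def grouped_sd_assumed(marks, freqs, assumed):
    """sqrt(sum(f*d^2)/sum(f) - (sum(f*d)/sum(f))^2), with d = x - A."""
    total = sum(freqs)
    sum_fd = sum(f * (x - assumed) for x, f in zip(marks, freqs))
    sum_fd2 = sum(f * (x - assumed) ** 2 for x, f in zip(marks, freqs))
    return sqrt(sum_fd2 / total - (sum_fd / total) ** 2)

marks = [12, 17, 22, 27]   # hypothetical class marks
freqs = [4, 6, 8, 2]       # hypothetical frequencies
sd = grouped_sd_direct(marks, freqs)
print(sd, sd ** 2)                              # standard deviation and variance
print(grouped_sd_assumed(marks, freqs, 22))     # same value via an assumed mean
```

Squaring the standard deviation gives the variance, so one function covers both measures.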
Learn how to compute the mean using an assumed mean and the standard deviation for a grouped distribution of donor masses, using class marks and frequencies.
Demonstrate how to calculate the standard deviation for a grouped distribution using the second formula, including class marks, class boundaries, and frequencies to compute the mean and dispersion.
Explore the coefficient of variation, defined as the standard deviation divided by the mean, and see its calculation on a small data set to compare dispersion.
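Here is a minimal sketch of the coefficient of variation using Python's `statistics` module. It assumes the population standard deviation (`pstdev`); the sample version (`stdev`) would give a slightly larger value. The data set is hypothetical, and the ratio is only meaningful when the mean is positive.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD here)."""
    return pstdev(values) / mean(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical small data set
cv = coefficient_of_variation(data)
print(cv, f"{cv:.0%}")            # as a ratio and as a percentage
```

Because the coefficient of variation has no units, it can compare the spread of data sets measured on different scales.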
Explore a full example of the measures of dispersion, using a grouped frequency distribution to compute the mean, variance, standard deviation, and coefficient of variation.
Finish the course with a first-hand look at statistics for data science, and prepare to explore further classes and use Google searches to find material for your data science career.
This course will take you from the basics of Statistics to a higher-level view of Statistical computations. This course is for you if you are just getting started with Data Science or Machine Learning and need to understand the nitty-gritty of the statistical calculations used in these fields.
We will start with the big picture of what Statistics is, then cover graphical representations of Data, the measures of Central Tendency and Dispersion, and lastly the coefficient of variation.
Note that this course will not cover inferential statistics, though some use cases of the concepts will be discussed where necessary.
In this course I will be taking you through all the nitty-gritty and foundational concepts of statistics that you will need to establish a career in data science and machine learning. This course will prepare you with statistical calculations, graphical representations of data and how to make meaning of these graphs.
At the end of this course, you will know how to represent data graphically in different forms, how to determine the measures of central tendency and dispersion for a given set of data, and how to decide which operation to perform on a given set of data.
This is the right course for you if you are new to data science and you want to understand the principles behind different formulas or calculations you will come across in your data science journey.