
Discover first principles in data science and machine learning, and learn the ideal student background, including math, coding experience, and industry-focused practicality.
Data science explains decisions under uncertainty by turning observable data into informed estimates of unseen factors. It uses exploration to drive hypotheses and improve outcomes, such as pricing a house.
Explore the data analyst role as a gateway to data science, delivering data products and real-time dashboards to decision makers using spreadsheets, SQL, and Tableau.
Explore what spreadsheets are—electronic grids of data used for calculations. The course centers on Microsoft Excel as the widely used, feature-rich business calculator, with context on Google Sheets and Numbers.
Explore how Excel uses operators to perform arithmetic, concatenation, and logical comparisons, and learn the order of operations and parentheses to build powerful formulas.
Learn to compute summary statistics in Excel from scratch and with built-in functions, including mean, variance, standard deviation, covariance, and correlation, while distinguishing population versus sample calculations.
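To make the population versus sample distinction concrete, here is a minimal Python sketch of the same "from scratch" calculations. The numbers and helper names are made up for illustration and are not taken from the course; the Excel function names in the comments are the standard built-ins for each statistic.

```python
import math
import statistics

# Hypothetical values, invented for this illustration only.
incomes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
home_values = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]


def mean(xs):
    """Arithmetic mean (Excel: AVERAGE)."""
    return sum(xs) / len(xs)


def variance(xs, sample=True):
    """Variance: divide by n - 1 for a sample (VAR.S), by n for a population (VAR.P)."""
    m = mean(xs)
    squared_deviations = sum((x - m) ** 2 for x in xs)
    divisor = len(xs) - 1 if sample else len(xs)
    return squared_deviations / divisor


def covariance(xs, ys, sample=True):
    """Covariance of two equal-length lists (COVARIANCE.S or COVARIANCE.P)."""
    mx, my = mean(xs), mean(ys)
    cross_products = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    divisor = len(xs) - 1 if sample else len(xs)
    return cross_products / divisor


def correlation(xs, ys):
    """Pearson correlation (CORREL): covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


population_sd = math.sqrt(variance(incomes, sample=False))  # like STDEV.P
sample_sd = math.sqrt(variance(incomes, sample=True))       # like STDEV.S

print(population_sd, sample_sd)
print(correlation(incomes, home_values))

# Cross-check the hand-written versions against Python's built-ins.
print(statistics.pvariance(incomes), statistics.variance(incomes))
```

The sample standard deviation always comes out slightly larger than the population one for the same data, because the divisor is n - 1 rather than n.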
Explore summary statistics on a data table in Excel, calculating mean, population vs. sample standard deviation, and correlations between income, bedrooms, and housing value, with 3D maps for visualization.
Analyze a California home price data set from 1990, computing summary statistics and correlations among numeric variables, with visualizations and guidance on publishing as web pages or Power BI.
Explore how to visualize Microsoft stock data in Excel using line and candlestick charts, create chart sheets, adjust formats, and compare chart types, noting data layout for meaningful visuals.
Discover how dashboards provide a single, focused view of analysis that delivers quick insights, using Excel to model data, design simple templates, and separate data analysis from visuals.
Explore how to create and analyze pivot tables in Excel to summarize data, compare Titanic survival by sex and class, and visualize findings with charts.
Explore SQL syntax for data science by learning comments, commands, keywords, identifiers, and literals, and practice crafting select statements that fetch all columns from a table with semicolon termination.
Explore SQLite, a free, open-source, local SQL database engine that is fast, self-contained, and cross-platform, with practical setup on Windows, Mac, and Linux.
Master core SQL basics to query a single table using select, from, limit, and where. Understand selecting all columns and filtering results with operators.
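As a rough illustration of those four clauses, the sketch below runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The employees table, its columns, and its rows are hypothetical and are not the course's data set.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute(
    "CREATE TABLE employees (first_name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", 52000),
        ("Ben", "Sales", 61000),
        ("Cara", "IT", 75000),
        ("Dev", "IT", 68000),
        ("Eli", "HR", 48000),
    ],
)

# SELECT * fetches every column; the semicolon terminates the statement.
all_rows = conn.execute("SELECT * FROM employees;").fetchall()

# WHERE filters rows with a comparison operator; LIMIT caps how many come back.
high_paid = conn.execute(
    "SELECT first_name, salary FROM employees WHERE salary > 60000 LIMIT 2;"
).fetchall()

print(len(all_rows))
print(high_paid)
```

Without an order by clause, which two matching rows come back under limit is not guaranteed by SQL, so the sketch only relies on the count and the filter condition.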
Learn to use order by and distinct in basic SQL queries to sort employees by first name and salary, with ascending or descending order and multi-column sorting.
Explore intermediate SQL aggregate functions such as count, sum, average, min, max, and group by to compute department level salaries; learn aliases and rounding for clean results.
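A department-level summary of this kind might look like the sketch below, again using Python's built-in sqlite3 module. The employees table and salary figures are hypothetical; the alias names (headcount, avg_salary, and so on) are chosen only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute(
    "CREATE TABLE employees (first_name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Sales", 52000),
        ("Ben", "Sales", 61000),
        ("Cara", "IT", 75000),
        ("Dev", "IT", 68001),
        ("Fay", "IT", 70000),
        ("Eli", "HR", 48000),
    ],
)

# GROUP BY collapses rows to one per department; AS names each aggregate
# column, and ROUND trims the average to two decimal places.
query = """
SELECT department,
       COUNT(*)              AS headcount,
       ROUND(AVG(salary), 2) AS avg_salary,
       MIN(salary)           AS min_salary,
       MAX(salary)           AS max_salary
FROM employees
GROUP BY department
ORDER BY department;
"""
summary = conn.execute(query).fetchall()

for row in summary:
    print(row)
```

Each output row describes one department, so the six employee rows collapse to three summary rows.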
Explore how to insert, update, and delete data in a database, using insert into values, update set where, and delete from clauses on the dependents table.
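The three data-modification statements can be sketched as follows with Python's built-in sqlite3 module. The table here is a hypothetical one named dependents, with invented columns and rows; the course's actual table definition is not shown in this listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table, invented for this example.
conn.execute(
    "CREATE TABLE dependents ("
    "id INTEGER PRIMARY KEY, first_name TEXT, relationship TEXT)"
)

# INSERT INTO ... VALUES adds new rows.
conn.execute(
    "INSERT INTO dependents (first_name, relationship) VALUES ('Mia', 'Child');"
)
conn.execute(
    "INSERT INTO dependents (first_name, relationship) VALUES ('Noah', 'Spouse');"
)

# UPDATE ... SET ... WHERE changes only the rows that match the condition.
conn.execute(
    "UPDATE dependents SET relationship = 'Partner' WHERE first_name = 'Noah';"
)

# DELETE FROM ... WHERE removes only the matching rows.
conn.execute("DELETE FROM dependents WHERE first_name = 'Mia';")

remaining = conn.execute(
    "SELECT first_name, relationship FROM dependents;"
).fetchall()
print(remaining)  # [('Noah', 'Partner')]
```

Leaving the where clause off an update or delete applies it to every row in the table, which is why both statements above include one.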
Explore how to import CO2 emissions data from Kaggle into Tableau, create geospatial maps and time-series visuals, and build interactive dashboards with filters.
Explore Tableau filters and meta filters that update entire data sources, including continuous vs discrete filters, handling nulls, and customizing dropdown, slider, or checkbox lists for country selections.
Explore how to visualize distributions in Tableau using circle charts, average lines, and box plots to analyze Titanic survival by family size, class, and age.
Master Tableau dual axis visualizations by comparing high and low prices by date and overlaying stock price with volume, using date parts to reveal their relationship.
Learn to build Tableau dashboards and stories by joining GDP and CO2 emissions data, crafting maps and charts, configuring filters and actions, and previewing dashboards across devices.
Learn to save and publish to Tableau Public, sign in, share dashboards, and build a Tableau story for your project portfolio.
Explore the RStudio environment through a hands-on tour of the console, plots pane, and script editor. Learn to run code, manage objects, install packages like ggplot2, and use keyboard shortcuts.
Use R as a calculator for basic operations (addition, subtraction, multiplication, division, powers), with immediate interpreted output, and learn to write comments with # and to manage simple chained calculations.
Install R packages from CRAN or local mirrors, manage dependencies with install.packages, explore the tidyverse, and use c(), library, qplot, ggplot, rnorm, and runif to create and plot data.
Explore how factors transform categorical variables into numeric representations by wrapping a vector in a factor, revealing levels and integers that support regression, classification, and visualization.
Data frames are two-dimensional, tabular structures—like Excel tables—for storing heterogeneous data. Access columns with the dollar sign, flatten lists into frames, and use table counts in exploratory data analysis.
Explore exploratory data analysis on a Kaggle house price dataset using ggplot visualizations, summary statistics, and regression modeling to guide feature engineering and prediction.
Explore essential exploratory data analysis tools, from box plots and Tukey's quartiles to frequency tables, histograms, density plots, and scatter and correlation visuals, using the tidyverse.
Explore the tidyverse website for resources, documentation, and package installation guidance, including ggplot2, and bookmark cheat sheets and the data science roadmap repository.
Learn to use ggplot facets, including wrap and grid, to create subplots by survived and passenger class, facet by two variables, and coerce numeric variables to character or factor.
Explore how to use ggplot position adjustments (identity, dodge, and fill) to tell stories by coercing variables into factors, coloring and stacking survival by family size on the Titanic dataset.
Use dplyr filter to create explicit subsets, compose criteria with and/or and in operators, and handle missing values to examine pclass, age, and cabin.
Employ mutate to add new columns by computing values from existing data or external vectors, such as the absolute distance from the column mean, enhancing data frame flexibility.
Master the pipe syntax to chain data operations in the tidyverse, using group_by and summarize to compute counts and averages across subgroups like sex, passenger class, and family size.
Learn to handle dates and times in R using the lubridate package, converting strings to date objects, formatting dates, and visualizing time-based data with ggplot for exploratory data analysis.
Discover how Markdown helps data scientists share Kaggle-style exploratory data analysis on the web, using headers, lists, links, images, and code blocks in RStudio and R Markdown.
Explore notebooks in Kaggle as interactive computing environments that mix code and markdown, show outputs, and enable iterative analysis and easy sharing.
This is an ambitious course. The goal here is simple: Only teach what you need to know for day 1 of your first data science job. No fluff, nothing out of context, no topics that are not relevant to real world applications. We will cover EVERY core topic and tool required for those new to data science: Python, R, SQL, Useful Math/Stats/Algorithms, Tableau, and Excel in depth. The course will cover skills that align with three different job types:
- Data Analyst
- General Data Scientist
- Machine Learning Engineer
You can expect to learn from first principles the foundational topics and tools used in practice today. We will avoid topics that are not useful or are simply too advanced when starting out. Your journey will be guided by the Data Science Road Map, a collection of the best resources gathered through years of experience by the instructor.
In addition, we will survey every important technology required on the job, including GitHub, Kaggle, and the basics of cloud, web development, and Docker. With over 200 videos, readings, and assignments, you can be sure you will be well prepared to join the data community.
If you are just getting started or want to fill in some of your knowledge gaps, this course is for you!