
Learn to use Pandas, NumPy, and plotting libraries like Seaborn to clean, visualize, and analyze data, and set yourself on the path to becoming a world-class data scientist.
Set up essential data science tools and review Python basics, including English-like syntax, built-in libraries, and the interpreted execution that powers software development, data science, artificial intelligence, and machine learning.
Install Python 3 from python.org on Windows or Mac, add Python 3.9 to PATH, and use Python IDLE to print hello world and explore syntax highlighting and autocompletion.
Explore Jupyter notebook, a web-based, cell-based environment to write and run code, create visualizations, and add narrative text, installed via the anaconda distribution that bundles Python, pandas, and numpy.
Install Anaconda on Windows, Mac, and Ubuntu using the graphical installer, then launch Anaconda Navigator to start Jupyter Notebook and create a Python 3 notebook.
Develop an understanding of the Jupyter notebook interface, using its cell-based structure to write Python code, print hello world, run cells, and view outputs beneath each cell.
Learn to manage directories for Jupyter notebooks across Windows and Mac, using Anaconda Prompt or terminal, change drives, cd into paths, and launch notebooks like hello.ipynb.
Explore Python input and output by using the print function to display messages, capture user input with input, assign it to a variable, and display a welcome message.
Explore Python's primitive data types—integers, floats, strings, and booleans—and built-in structures such as lists, dictionaries, and tuples. Use the type function to identify data types and print results.
Explore dynamic typing and runtime type inference in Python by creating variables, printing values without quotes, and adhering to valid naming rules.
Explore Python arithmetic operators: plus for addition, minus for subtraction, asterisk for multiplication, slash for division, double slash for integer division, and the modulo operator for remainders.
Master Python comparison operators to evaluate expressions as true or false. Learn greater than, less than, greater than or equal to, less than or equal to, equal, and not equal to.
Explore how to use the three logical operators in Python, and, or, and not, to combine conditions, with hands-on demos showing true and false results.
Master Python conditional statements, including if, elif, and else, with top-down evaluation and blocks, illustrated through the Jupyter Notebook examples and using the and operator.
Explore Python loops: use for with range(start, end, step), where end is exclusive, to print the even numbers from 0 to 20, and use while with a condition, plus break to exit otherwise infinite loops.
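As a minimal sketch of the two loop forms described above (the variable names are illustrative and not taken from the lecture):

```python
# range(start, end, step): end is exclusive, so 21 is needed to include 20.
evens = []
for n in range(0, 21, 2):
    evens.append(n)
print(evens)

# "while True" would run forever; break is what ends the loop.
count = 0
while True:
    count += 1
    if count == 5:
        break
print(count)
```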
Explore Python sequences, focusing on lists, dictionaries, and tuples, including indexing, slicing, and iterating with for loops, and using len to count elements.
Learn how dictionaries store key value pairs in Python, access values with square brackets, and iterate over keys and values with loops using sample data like name, age, and country.
Understand how tuples store multiple values like lists, yet remain immutable after creation. Learn to index, slice, and iterate tuples in Python with examples similar to lists.
Explore Python's built-in functions, including len, abs, and max, with a link to a detailed list, and prepare for the next lecture.
Learn to define and call user-defined functions in Python with def, pass parameters, return values, and test them in a Jupyter notebook, ensuring functions are defined before calling.
Explore the Python libraries used in this course and note that installing Anaconda provides all essential data science libraries, with separate modules discussed later.
Learn to link libraries to programs using the import keyword, shorten calls with an alias using as, and import pandas, numpy, matplotlib.pyplot, and seaborn in a notebook.
Explore the pandas library as the foundation of data science, learning how it stores, accesses, cleans, and processes data, and examine series and a data frame as core data structures.
Explore NumPy’s multi-dimensional arrays and its high-level functions, and see how pandas data frames relate as 2D arrays built on NumPy, borrowing many of its functions.
Discover how pandas handles data organization and processing, while numpy powers mathematical computations; pandas is built on top of numpy, letting you access many NumPy functions with less code.
Master data visualization with the matplotlib library, plotting line graphs, bar plots, scatter plots, and pie charts, using pyplot, whose interface mirrors MATLAB's plotting capabilities.
Seaborn is another powerful data visualization library, built on top of Matplotlib, that extends its functionality by providing additional types of graphs with less code.
Explore NumPy arrays as multi-dimensional grids with built-in functionality, created from Python lists, and compare their features and efficiency to Python lists in memory use and speed.
Create numpy arrays from Python lists or tuples, building 1D, 2D, and 3D arrays with numpy. Use the ndim attribute to confirm each array's number of dimensions.
Explore indexing in numpy arrays using square brackets for 2d and 3d structures; practice extracting elements by providing nested indices to access specific rows, columns, and depth.
Learn how the shape attribute exposes a NumPy array's dimensions as a tuple with one entry per dimension, where each entry is the number of elements along that dimension, such as rows first and columns second for a 2D array.
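A small sketch of how ndim and shape relate, assuming a hypothetical array with 2 rows and 3 columns:

```python
import numpy as np

# Hypothetical 2D array: 2 rows, 3 columns.
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.ndim)   # number of dimensions
print(matrix.shape)  # tuple of sizes, one per dimension: (rows, columns)
```

The length of the shape tuple always equals ndim.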
Iterate over NumPy arrays with for loops, from 1D to 3D, using nested loops to access each element and print rows, matrices, and 3D arrays in a Jupyter Notebook.
Learn how to create numpy arrays from Python lists and use numpy zeros to generate an array of ten zeros. Print the result and convert the array to integers.
Create a NumPy array of ten ones with NumPy's ones function, print the result, and cast it to integers with astype.
Create an array with numpy.full by specifying the element count and a fill value, using two arguments to produce 10 elements all equal to 5; pass a float fill value to get an array of floats.
Learn to add a scalar to a numpy array with the plus operator, where every element increments, and compare with the error when applying the same operation to Python lists.
Subtract a scalar from a NumPy array using the minus operator to update each element, while Python lists raise an error; the lesson demonstrates subtracting two from an array.
Multiply a NumPy array by a scalar using the * operator so each element is doubled, while a Python list multiplied by a scalar is concatenated with itself.
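The contrast can be sketched as follows; the sample values are made up for illustration:

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

doubled = arr * 2      # elementwise: every element is multiplied by 2
repeated = values * 2  # list repetition: the list is concatenated with itself

print(doubled)
print(repeated)
```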
Divide a numpy array by a scalar with / for true division and // for floor division, while Python lists raise an error.
Explore raising each array element to a power with the double star operator, showing that numpy arrays support elementwise exponentiation while Python lists raise an error.
Learn how to transpose a numpy array using its transpose method, and note that Python lists have no built-in transpose function, as shown by matrix examples.
Practice element wise addition of two numpy arrays with the plus operator, producing corresponding element sums. Contrast this with Python lists, where plus concatenates rather than adds element wise.
Perform element-wise subtraction using the minus operator on arrays to obtain the difference between corresponding elements, while noting that Python lists cannot be subtracted with minus, producing an error.
Demonstrate element-wise multiplication with the star operator on matrices, producing a product for each corresponding element. Show that using the star operator on two Python lists yields an error.
Practice element-wise division of NumPy arrays, using / for true division and // for floor division, and learn why dividing Python lists with these operators raises errors.
Multiply two arrays using the matmul function to obtain their matrix product, demonstrated in a Jupyter notebook, and learn why the print function helps when a cell contains multiple expressions, since only the last expression's value is displayed automatically.
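To make the difference between the star operator and matmul concrete, here is a short sketch with two made-up 2x2 matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b        # multiplies matching positions
product = np.matmul(a, b)  # rows of a combined with columns of b

# Both results are printed explicitly; a notebook cell would otherwise
# display only the value of its last expression.
print(elementwise)
print(product)
```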
Explore essential NumPy statistics by computing min, max, sum, mean, standard deviation, and median on a matrix in a Jupyter notebook, highlighting practical data manipulation in Python.
Identify the two main pandas data structures—the two-dimensional DataFrame and the one-dimensional Series—and learn how DataFrame rows use an index and columns use labels that can be numbers or strings.
A Pandas series is a one-dimensional data structure with a single column. It shows an index from zero to nine, with the series name and data type of its values.
Explore how a data frame is a collection of series, with each column representing a series, and learn methods to obtain a series from a data frame.
Create a DataFrame from a two-dimensional list with pandas, which auto assigns indices and labels; supply column names with the columns parameter, and create from arrays as well.
Create a pandas data frame from a dictionary by using each key as a column label and lists of values; the resulting frame has column labels equal to dictionary keys.
Load a csv file as a data frame using pandas in a Jupyter notebook, then print the data frame to view its five columns—name, calories, protein, vitamins, and rating.
Learn to set a data frame's index to a string column using pandas' set_index, turning the name column into the index and validating the result.
Discover how many pandas functions do not change a data frame in place. Pass inplace=True to modify the data frame itself, as shown when setting the index.
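A minimal sketch of the difference, using a tiny hypothetical data frame whose column names follow the lecture's description:

```python
import pandas as pd

# Hypothetical sample rows, not the course's CSV file.
df = pd.DataFrame({"name": ["A", "B"], "calories": [70, 120]})

indexed = df.set_index("name")  # returns a new frame; df is unchanged
print(df.index.name)            # None: df still has its default index

df.set_index("name", inplace=True)  # modifies df itself and returns None
print(df.index.name)
```

Reassigning the result, as in df = df.set_index("name"), has the same effect as inplace=True.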
Inspect data frames or series in python using head and tail to view the first or last rows, with optional counts.
Apply the describe function to generate a statistical summary for a data frame, presenting column-wise statistics in Python with NumPy and Pandas.
Learn to slice rows in a data frame with the bracket operator, using inclusive start and exclusive end indices, based on row positions, not labels.
Use the bracket operator to index one or more columns by their labels. Index the first and last columns by labels as lists, leaving the original data frame unchanged.
Apply a boolean list to a data frame's brackets operator to filter rows; set the third element to true while others are false, showing only the third row.
Filter rows in a data frame by applying a condition on the calories column greater than 70, using the bracket operator and a boolean list to select only matching rows.
Filter rows in a data frame by combining conditions with the ampersand (&) for and and the pipe sign (|) for or in pandas, using parentheses to group calories > 70 and protein < 4.
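The following sketch shows both combinations on hypothetical rows; the values are invented, and only the column names and thresholds come from the description above:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "calories": [60, 90, 110, 70],
    "protein": [5, 3, 6, 2],
})

# Parentheses are required because & and | bind tighter than > and <.
both = df[(df["calories"] > 70) & (df["protein"] < 4)]
either = df[(df["calories"] > 70) | (df["protein"] < 4)]

print(both)
print(either)
```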
Learn to filter pandas data frames with loc and iloc by indexing rows and slicing columns, or vice versa, using label-based or position-based selection.
Use iloc to index and slice a data frame by integer positions, not labels. Master zero-based indexing and select first five rows and first three columns with iloc.
Add rows and columns to a data frame using the loc indexer with labels and values, and delete them with the drop function using axis 0 or 1.
Sort a data frame by a column using sort_values; sort ascending by default, numerically for numbers and alphabetically for words, and set ascending to false for descending by calories.
Explore exporting a pandas data frame to a csv file using the to_csv function, including overwriting existing files, setting index=False, and verifying the saved data.
Learn to concatenate data frames in pandas by stacking vertically or joining side by side using axis=1, working with data frames that share the same row labels.
Group a data frame by gender, apply the mean and other aggregates to non-grouped columns, and concatenate the results to create a summary using groupby in pandas.
Real-world data from multiple sources often arrives inconsistent and with missing values; this lecture teaches basic techniques to clean such data and prepare it for practical use and manipulation.
Decision quality depends on data quality, so assess and clean your data. Apply data cleaning techniques to remove inaccuracies and make data fit for operations, decision making, and planning.
Identify anomalies, or outliers, in data and understand how a single anomalous value can indicate data issues or instrument defects, and explore techniques to detect them for accurate results.
Apply median-based anomaly detection by computing absolute differences from the median and comparing to a threshold to identify outliers in a Pandas series, illustrated with 4.5 as an anomaly.
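One way to sketch this technique is shown below. Only the anomalous value 4.5 comes from the description; the other values and the threshold of 2.0 are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical series; 4.5 is the anomaly mentioned above.
s = pd.Series([1.1, 1.0, 0.9, 4.5, 1.2, 1.0])
threshold = 2.0  # assumed cutoff, not taken from the lecture

median = s.median()
abs_diff = (s - median).abs()       # distance of each value from the median
outliers = s[abs_diff > threshold]  # keep only values beyond the threshold

print(outliers)
```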
Detect anomalies by using mean and standard deviation bounds—lower bound as mean minus std, upper bound as mean plus std, or two times std—demonstrated with pandas series in Jupyter Notebook.
Detect anomalies in data using z-score, calculating mean and standard deviation, and flag values beyond 1.5 standard deviations as outliers. Use pandas and numpy to implement this technique.
Detect anomalies with the interquartile range by computing Q1, Q2 (the median), and Q3 via numpy's percentile function, and flag values below Q1 minus 1.5 times IQR or above Q3 plus 1.5 times IQR.
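A minimal sketch of the interquartile range rule, assuming a small made-up series with one extreme value:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one obvious extreme value.
s = pd.Series([10, 12, 11, 13, 12, 11, 40])

q1, q2, q3 = np.percentile(s, [25, 50, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside [lower, upper] are flagged as outliers.
outliers = s[(s < lower) | (s > upper)]
print(lower, upper)
print(outliers)
```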
Identify missing values with pandas isnull and sum to see which columns contain them. Drop rows with missing data or fill with mean, median, or mode using fillna.
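These steps can be sketched on a hypothetical two-column frame; the column names and values are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "city": ["X", "Y", "Z"]})

missing = df.isnull().sum()                    # missing count per column
dropped = df.dropna()                          # remove rows with any missing value
filled = df.fillna({"age": df["age"].mean()})  # or fill with the column mean

print(missing)
print(filled)
```

Swapping mean() for median(), or for mode()[0], gives the other fill strategies mentioned above.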
Master regular expressions in Python to match patterns, clean data, extract digits, and replace text using findall, search, and sub on strings and data series.
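A brief sketch of the three functions on a made-up string, plus one possible way to apply a pattern to a pandas series (the sample text and series are assumptions):

```python
import re
import pandas as pd

text = "Order 66 shipped on day 7"

digits = re.findall(r"\d+", text)    # every run of digits, as strings
first = re.search(r"\d+", text)      # first match object, or None
cleaned = re.sub(r"\d+", "#", text)  # replace each run of digits

# On a series, the .str accessor applies the pattern to every value.
prices = pd.Series(["$10", "$25"])
numeric = prices.str.replace(r"\$", "", regex=True).astype(int)

print(digits, first.group(), cleaned)
print(numeric.tolist())
```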
Learn how to scale features in Pandas using min-max normalization to 0 to 1 and standardization by centering on the mean and dividing by the standard deviation.
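Both scalings can be written directly with pandas arithmetic. This is a sketch on invented values; note that pandas std uses the sample standard deviation (ddof=1) by default:

```python
import pandas as pd

# Hypothetical feature values.
s = pd.Series([10.0, 20.0, 30.0, 40.0])

# Min-max normalization: rescale to the range 0 to 1.
min_max = (s - s.min()) / (s.max() - s.min())

# Standardization: center on the mean, divide by the standard deviation.
standardized = (s - s.mean()) / s.std()

print(min_max.tolist())
print(standardized.tolist())
```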
Learn to create visuals in Jupyter notebook with the Matplotlib library, exploring various graph types and fundamentals of data visualization in Python.
Learn how to set up matplotlib to create graphs in Python, using its MATLAB-like interface and the import syntax import matplotlib.pyplot as plt.
Plot line plots in Matplotlib by supplying x and y lists in a Jupyter notebook, and customize color using a third argument, with color abbreviations documented in Matplotlib.
Set graph titles with the title function and add x and y labels, then plot multiple series on one chart and use a legend.
Plot histograms in Python from a list of values, displaying frequency on the y-axis, and customize color and bar width.
Learn to plot bar charts in Python by using the bar function, compare bar charts with histograms, and adjust x values, heights, and bar width in a Jupyter Notebook.
Learn to create pie charts using the pie function, pass values and labels, adjust wedge distance with explode, and customize colors to distinguish wedges.
Learn to plot scatter plots in Python using the scatter function with x and y data, and customize colors with color, c, and cmap options.
Switch the y axis to a logarithmic scale to handle data with exponential growth, comparing linear and log plots to reveal points that are far apart.
Learn to create a polar plot using Matplotlib by plotting radius r=2 over theta from 0 to 2 pi with 0.01 steps.
Rotate the x axis tick labels by 90 degrees to improve readability when many or long date values crowd the axis, using the rotation parameter of the xticks function.
Create multiple subplots in a single figure by specifying rows, columns, and position. Place two plots side by side and apply tight layout for spacing.
Begin exploring data with exploratory data analysis techniques to analyze the data for a better understanding.
Perform exploratory data analysis in Python to investigate datasets, identify patterns and anomalies, summarize the main characteristics, and visualize results with graphs.
Univariate analysis examines data based on a single variable, offering a concise overview and identifying patterns for a data frame with four feature columns and one output.
Explore univariate continuous data with scatter, strip, distribution plots, histograms, and box plots in Python using Seaborn; visualize petal length by variety and the five-number summary.
Perform univariate analysis on the variety column using the seaborn countplot function to show category counts, then visualize proportions with a pie chart from value_counts.
Perform bivariate analysis of two continuous features using scatterplots and pandas corr. Use heatmaps to visualize correlations on the Titanic dataset, noting low correlation between fare and age.
Follow a two-step solution to bivariate analysis of categorical pclass and survived with a bar plot, by grouping on pclass and summing survival to show class-wise rates.
Demonstrate bivariate analysis on continuous and categorical variables using the Titanic data frame with box plots and bar plots, highlighting survival by age and sex.
Identify outliers and learn practical techniques to handle them, including z-score based trimming and median imputation, demonstrated on a sample series in Python using pandas.
Learn categorical variable transformation to convert gender data into numerical form using label encoding and frequency encoding. Replace male and female with numbers in pandas and encode categories by frequencies.
Explore time series data, analyze patterns, and develop forecasting models using pandas, numpy, and the Python datetime module, applying datetime operations and stock data examples.
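Both encodings can be sketched with map on a hypothetical gender column; the numeric codes 0 and 1 are an arbitrary choice for illustration:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"gender": ["male", "female", "female", "male", "female"]})

# Label encoding: replace each category with an assumed numeric code.
df["gender_label"] = df["gender"].map({"male": 0, "female": 1})

# Frequency encoding: replace each category with its share of the rows.
freq = df["gender"].value_counts(normalize=True)
df["gender_freq"] = df["gender"].map(freq)

print(df)
```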
Learn to download stock data from Yahoo Finance using yfinance in a Jupyter notebook. Import the library and fetch Apple stock data for January 2022 as a Pandas time series.
Convert a dataset into a time series by turning string date values into timestamps with pandas to_datetime, including handling custom formats via the format parameter.
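A minimal sketch of the conversion, assuming day-month-year strings; the column names and placeholder values are hypothetical:

```python
import pandas as pd

# Hypothetical rows; the dates are strings in day-month-year order.
df = pd.DataFrame({"date": ["03-01-2022", "04-01-2022"], "value": [1.0, 2.0]})

# format tells pandas how to read a non-default layout.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Using the timestamps as the index turns the frame into a time series.
df = df.set_index("date")
print(df.index)
```

Without the format argument, a string such as 03-01-2022 could be read as March 1 rather than January 3.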
Master time series in Python with pandas to check days of week, identify business days, and index, slice, and partially match timestamps for February data.
Visualize time series data in Python using matplotlib to plot Apple stock prices over five years, and specifically the year 2021, demonstrating how to create figures and label axes.
Data scientists are already among the most sought-after professionals. In a highly competitive job market, it is tough to keep them after they have been hired. People with a unique mix of scientific training, computer expertise, and analytical abilities are hard to find.
Like the Wall Street "quants" of the 1980s and 1990s, modern-day data scientists are expected to have a similar skill set. People with a background in physics and mathematics flocked to investment banks and hedge funds in those days because they could come up with novel algorithms and data methods.
That being said, data science is becoming one of the most well-suited occupations for success in the twenty-first century. It is computerized, programming-driven, and analytical in nature. Consequently, it comes as no surprise that the need for data scientists has been increasing in the employment market over the last several years.
The supply, on the other hand, has been quite restricted. It is challenging to get the knowledge and abilities required to be recruited as a data scientist.
Lots of resources for learning Python are available online. Faced with so many options, students frequently get overwhelmed by Python's steep learning curve.
This course is a whole new ball game! Step-by-step instruction is its hallmark. Throughout each subsequent lesson, we continue to build on what we've previously learned. Our goal is to equip you with all the tools and skills you need to master Python, NumPy & Pandas.
You'll walk away from each video with a fresh idea that you can put to use right away!
All skill levels are welcome in this course, and even if you have no prior programming or statistical experience, you will be able to succeed!