Data Manipulation in Python: Master Python, Numpy & Pandas

Rating: 4.2 (2722 reviews)

Learn Python, NumPy & Pandas for Data Science: Master essential data manipulation for data science in Python

Created by Meta Brains

Last updated 1/2024

English

What you'll learn

Learn to use Pandas for Data Analysis
Learn to work with numerical data in Python
Learn statistics and math with Python
Learn how to code in Jupyter Notebook
Learn how to install packages in Python

Course content

10 sections • 108 lectures • 3h 46m total length

Welcome to the course! (0:36)
Learn to use pandas, NumPy, and plotting libraries like Seaborn to clean, visualize, and analyze data, setting you on the path to becoming a world-class data scientist.
Introduction to Python (0:52)
Set up essential data science tools and review Python basics. These include its English-like syntax, its built-in libraries, and the interpreted execution that makes it popular for software development, data science, artificial intelligence, and machine learning.
Course Materials (0:05)
Setting up Python (2:24)
Install Python 3 from python.org on Windows or Mac and add Python 3.9 to PATH. Then use Python IDLE to print "hello world" and explore syntax highlighting and autocompletion.
What is Jupyter? (0:59)
Explore Jupyter Notebook, a web-based, cell-based environment for writing and running code, creating visualizations, and adding narrative text. It is installed via the Anaconda distribution, which bundles Python, pandas, and NumPy.
Anaconda Installation: Windows, Mac & Ubuntu (4:15)
Install Anaconda on Windows, Mac, and Ubuntu using the graphical installer. Then launch Anaconda Navigator to start Jupyter Notebook and create a Python 3 notebook.
How to implement Python in Jupyter? (0:44)
Get to know the Jupyter Notebook interface. Use its cell-based structure to write Python code, print "hello world", run cells, and view the output beneath each cell.
Managing Directories in Jupyter Notebook (2:48)
Manage directories for Jupyter notebooks on Windows and Mac. Use Anaconda Prompt or the terminal to change drives, cd into paths, and launch notebooks such as hello.ipynb.
Input/Output (1:44)
Explore Python input and output. Display messages with the print function, capture user input with the input function, assign it to a variable, and display a welcome message.
Quiz 1
Working with different datatypes (1:06)
Explore Python's primitive data types (integers, floats, strings, and booleans) and built-in structures such as lists, dictionaries, and tuples. Use the type function to identify data types and print the results.
Variables (1:50)
Explore dynamic typing and runtime type inference in Python by creating variables, printing their values (without quotes), and following the rules for valid variable names.
Quiz 2
Quiz 3
Arithmetic Operators (1:48)
Explore Python's arithmetic operators: + for addition, - for subtraction, * (asterisk) for multiplication, / for division, // for floor (integer) division, and % (modulo) for remainders.
Quiz 4
Quiz 5
Quiz 6
Comparison Operators (0:43)
Master Python's comparison operators, which evaluate expressions to True or False: greater than, less than, greater than or equal to, less than or equal to, equal to, and not equal to.
Logical Operators (3:05)
Explore how to combine conditions with Python's three logical operators: and, or, and not. Hands-on demos show the True and False results.
Quiz 7
Quiz 8
Quiz 9
Conditional statements (2:20)
Master Python's conditional statements (if, elif, and else), their top-down evaluation, and indented blocks. Jupyter Notebook examples illustrate each, including conditions joined with the and operator.
Loops (4:30)
Explore Python loops. Use for with range(start, stop, step), where stop is exclusive, to print the even numbers from 0 to 20. Use while with a condition, plus break to exit an otherwise infinite loop.
Sequences: Lists (3:18)
Explore Python sequences (lists, dictionaries, and tuples), focusing on lists: indexing, slicing, iterating with for loops, and counting elements with len.
Sequences: Dictionaries (2:48)
Learn how dictionaries store key-value pairs in Python. Access values with square brackets and iterate over keys and values with loops, using sample data such as name, age, and country.
Sequences: Tuples (1:07)
Understand how tuples store multiple values like lists but are immutable after creation. Index, slice, and iterate over tuples in Python with examples similar to those for lists.
Quiz 10
Quiz 11
Quiz 12
Functions: Built-in Functions (0:26)
Explore Python's built-in functions, including len, abs, and max, with a link to the full list, in preparation for the next lecture.
Functions: User-defined Functions (3:14)
Define and call your own functions in Python with def, pass parameters, and return values. Test them in a Jupyter notebook, making sure each function is defined before it is called.
Quiz 13
Quiz 14
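The refresher topics above can be condensed into one short Python sketch: range with a step, while with break, dictionaries, tuples, and built-in and user-defined functions. It is illustrative only. The sample dictionary values and the describe_person helper are hypothetical and are not taken from the course.

```python
# Loops: range(start, stop, step) excludes stop, so 21 is needed to reach 20
evens = []
for n in range(0, 21, 2):
    evens.append(n)

# A while loop that would never end on its own; break exits it
count = 0
while True:
    count += 1
    if count == 5:
        break

# Dictionaries store key-value pairs (hypothetical sample values)
person = {"name": "Ada", "age": 36, "country": "UK"}
for key, value in person.items():
    print(key, "->", value)

# Tuples index and slice like lists but cannot be changed after creation
point = (3, 4, 5)
first_two = point[:2]
try:
    point[0] = 10
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True


# A user-defined function (hypothetical helper) must be defined before it is called
def describe_person(p):
    """Return a one-line summary of a dict with name, age, and country keys."""
    return f"{p['name']} ({p['age']}) from {p['country']}"


# Built-in functions: len, abs, max
summary = describe_person(person)
stats = (len(evens), abs(-7), max(evens))
print(summary, stats)
```

Assigning to point[0] raises a TypeError, which is how the sketch shows that tuples are immutable.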

Installing Libraries (0:36)
Explore the Python libraries used in this course. Installing Anaconda provides all the essential data science libraries, and separate modules are discussed later.
Importing Libraries (1:47)
Learn to bring libraries into your code with the import keyword and shorten their names with an alias using as. Then import pandas, numpy, matplotlib.pyplot, and seaborn in a notebook.
Pandas Library for Data Science (0:48)
Explore the pandas library, a foundation of data science: how it stores, accesses, cleans, and processes data, and its core data structures, the Series and the DataFrame.
NumPy Library for Data Science (0:51)
Explore NumPy's multi-dimensional arrays and high-level functions. See how pandas DataFrames relate to 2D arrays: pandas is built on NumPy and borrows many of its functions.
Pandas vs NumPy (0:33)
Discover how pandas handles data organization and processing while NumPy powers mathematical computations. Because pandas is built on top of NumPy, it gives you access to much of NumPy's functionality with less code.
Matplotlib Library for Data Science (0:37)
Master data visualization with the Matplotlib library. Plot line graphs, bar plots, scatter plots, and pie charts through an interface modeled on MATLAB's plotting capabilities.
Seaborn Library for Data Science (0:20)
Seaborn is another powerful data visualization library. It is built on top of Matplotlib and extends it with additional graph types that require less code.

Introduction to NumPy arrays (0:45)
Explore NumPy arrays, which are multi-dimensional grids with built-in functionality created from Python lists. Compare them with Python lists in memory use and speed.
Creating NumPy arrays (6:13)
Create NumPy arrays from Python lists or tuples, building 1D, 2D, and 3D arrays, and use the ndim attribute to confirm each array's number of dimensions.
Quiz 15
Indexing NumPy arrays (5:45)
Explore indexing in NumPy arrays with square brackets for 2D and 3D structures. Practice extracting elements by providing an index for each dimension to access specific rows, columns, and depth.
Quiz 16
Array shape (0:35)
Learn how the shape attribute reports the size of a NumPy array along each dimension. A 2D example shows that the first value is the number of rows and the second is the number of elements in each row.
Iterating Over NumPy Arrays (4:57)
Iterate over NumPy arrays with for loops, from 1D to 3D, using nested loops to access each element and print rows, matrices, and 3D arrays in a Jupyter Notebook.
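The array properties covered in this group of lectures are creation from lists and tuples, ndim, shape, per-dimension indexing, and nested iteration. The minimal sketch below walks through each one. The array contents are arbitrary examples.

```python
import numpy as np

a1 = np.array([1, 2, 3])                             # 1D, from a list
a2 = np.array(((1, 2, 3, 4), (5, 6, 7, 8)))          # 2D, from a tuple of tuples
a3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 3D, from nested lists

dims = (a1.ndim, a2.ndim, a3.ndim)   # (1, 2, 3)
rows, cols = a2.shape                # 2 rows, 4 elements per row

# One index per dimension: [row, column] for 2D, [block, row, column] for 3D
second_row_third_col = a2[1, 2]      # 7
deep_value = a3[1, 0, 1]             # 6

# Nested for loops visit every element of the 2D array
total = 0
for row in a2:
    for value in row:
        total += value
print(dims, (rows, cols), second_row_third_col, deep_value, total)
```

Iterating over a 2D array yields its rows, so a second loop is needed to reach individual elements. A 3D array would need a third level.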

Basic NumPy arrays: zeros() (1:33)
Building on arrays created from Python lists, use numpy.zeros to generate an array of ten zeros. Print the result and convert the array to integers.
Basic NumPy arrays: ones() (1:09)
Create a NumPy array of ten ones with NumPy's ones function, print the result, and cast it to integers with astype.
Basic NumPy arrays: full() (1:16)
Create an array with numpy.full by passing two arguments, the element count and a fill value, to produce 10 elements all equal to 5. Use a float fill value to get floats.
Quiz 17
Adding a scalar (1:41)
Add a scalar to a NumPy array with the + operator so that every element is incremented. Compare this with the error raised when applying the same operation to a Python list.
Subtracting a scalar (1:04)
Subtract a scalar from a NumPy array with the - operator to update each element; the same operation on a Python list raises an error. The lesson demonstrates subtracting two from an array.
Multiplying by a scalar (1:20)
Multiply a NumPy array by a scalar with the * operator so that each element is doubled. Multiplying a Python list by a scalar instead repeats the list, concatenating it with itself.
Dividing by a scalar (1:25)
Divide a NumPy array by a scalar with / for true (float) division and // for floor (integer) division, while Python lists raise an error.
Raise to a power (0:48)
Raise each array element to a power with the ** operator. NumPy arrays support element-wise exponentiation, while Python lists raise an error.
Transpose (0:48)
Learn how to transpose a NumPy array, and note that Python lists have no built-in transpose, as shown with matrix examples.
Element wise addition (1:59)
Practice element-wise addition of two NumPy arrays with the + operator, which sums corresponding elements. Contrast this with Python lists, where + concatenates rather than adding element-wise.
Element wise subtraction (0:56)
Perform element-wise subtraction with the - operator to obtain the differences between corresponding elements. Subtracting Python lists with - raises an error.
Element wise multiplication (0:58)
Demonstrate element-wise multiplication of matrices with the * operator, which multiplies each pair of corresponding elements. Show that * between two Python lists raises an error.
Element wise division (1:04)
Practice element-wise division, using / for true division and // for floor division, and learn why dividing Python lists with these operators raises errors.
Matrix multiplication (1:34)
Multiply two arrays with the matmul function to obtain their matrix product, demonstrated in a Jupyter notebook. Learn why print is needed when a cell contains several expressions: Jupyter displays only the last one.
Quiz 18
Statistics (2:54)
Explore essential NumPy statistics by computing min, max, sum, mean, standard deviation, and median on a matrix in a Jupyter notebook, highlighting practical data manipulation in Python.
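The array operations from these lectures fit in one compact sketch: constant arrays, scalar arithmetic, element-wise operations, transpose, matrix product, and summary statistics. The 2x2 matrices are arbitrary examples. One detail the lecture list does not state is that NumPy's std defaults to ddof=0 (population standard deviation), whereas pandas' Series.std defaults to ddof=1.

```python
import numpy as np

zeros = np.zeros(10).astype(int)   # np.zeros gives floats by default
ones = np.ones(10).astype(int)
fives = np.full(10, 5)             # pass 5.0 instead to get floats

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Scalar operations apply to every element
plus_two = a + 2
doubled = a * 2
true_div = a / 2        # float results
floor_div = a // 2      # floor (integer) results
squared = a ** 2
transposed = a.T

# Python lists behave differently
repeated = [1, 2] * 2   # repetition, not multiplication
list_plus_scalar_fails = False
try:
    [1, 2] + 2
except TypeError:
    list_plus_scalar_fails = True

# Element-wise operations between two arrays of the same shape
elementwise_sum = a + b
elementwise_product = a * b

# Matrix product (equivalent to a @ b)
product = np.matmul(a, b)

# Summary statistics; a.std() uses ddof=0 (population std) by default
stats = (a.min(), a.max(), a.sum(), a.mean(), a.std(), np.median(a))
print(product)
print(stats)
```

The contrast between repeated and doubled is the key point. The same * symbol repeats a list but multiplies every element of an array.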

What is a Python Pandas DataFrame? (0:57)
Identify the two main pandas data structures: the two-dimensional DataFrame and the one-dimensional Series. Learn how DataFrame rows are identified by an index and columns by labels, which can be numbers or strings.
What is a Python Pandas Series? (0:42)
A pandas Series is a one-dimensional data structure that holds a single column of values. The example shows an index from 0 to 9, the Series name, and the data type of its values.
DataFrame vs Series (0:28)
Explore how a DataFrame is a collection of Series, with each column being a Series, and learn ways to obtain a Series from a DataFrame.
Creating a DataFrame using lists (3:17)
Create a DataFrame from a two-dimensional list with pandas, which automatically assigns index and column labels. Supply column names with the columns parameter, and create DataFrames from arrays as well.
Creating a DataFrame using a dictionary (1:06)
Create a pandas DataFrame from a dictionary of lists. Each key becomes a column label and each list supplies that column's values.
Loading CSV data into python (1:52)
Load a CSV file as a DataFrame using pandas in a Jupyter notebook. Then print it to view its five columns: name, calories, protein, vitamins, and rating.
Changing the Index Column (1:06)
Set a DataFrame's index to a string column with pandas' set_index, turning the name column into the index, and verify the result.
Inplace (1:20)
Discover that many pandas functions return a modified copy instead of changing the DataFrame in place. Use inplace=True to modify it permanently, as shown when setting the index.
Examining the DataFrame: Head & Tail (0:36)
Inspect DataFrames or Series in Python with head and tail to view the first or last rows, optionally specifying how many.
Statistical summary of the DataFrame (0:37)
Apply the describe function to generate a statistical summary of the DataFrame, presenting column-wise statistics.
Slicing rows using bracket operators (1:26)
Slice rows in a DataFrame with the bracket operator, using an inclusive start and an exclusive end based on row positions, not labels.
Quiz 19
Indexing columns using bracket operators (0:51)
Use the bracket operator to select one or more columns by label. Select the first and last columns by passing a list of their labels, leaving the original DataFrame unchanged.
Boolean list (1:15)
Pass a boolean list to a DataFrame's bracket operator to filter rows. Set the third element to True and the others to False, so only the third row is shown.
Filtering Rows (1:22)
Filter DataFrame rows with a condition on the calories column (greater than 70), using the bracket operator and the resulting boolean list to select only matching rows.
Filtering rows using & and | operators (1:51)
Filter DataFrame rows by combining conditions with the & (and) and | (or) operators in pandas, wrapping each condition in parentheses, as in (calories > 70) & (protein < 4).
Filtering data using loc() (3:35)
Filter pandas DataFrames with loc and iloc by indexing rows and slicing columns, or vice versa, using label-based or position-based selection.
Quiz 20
Filtering data using iloc() (2:23)
Use iloc to index and slice a DataFrame by integer position rather than label. Master zero-based indexing and select the first five rows and first three columns with iloc.
Quiz 21
Quiz 22
Adding and deleting rows and columns (2:41)
Add rows and columns to a DataFrame with the loc indexer, using labels and values. Delete them with the drop function, using axis=0 for rows or axis=1 for columns.
Sorting Values (1:39)
Sort a DataFrame by a column with sort_values. The sort is ascending by default (numerical for numbers, alphabetical for words), and ascending=False sorts in descending order, here by calories.
Exporting and saving pandas DataFrames (1:30)
Export a pandas DataFrame to a CSV file with the to_csv function, including overwriting existing files, setting index=False, and verifying the saved data.
Concatenating DataFrames (0:59)
Concatenate DataFrames in pandas, either stacking them vertically or joining them side by side with axis=1, working with DataFrames that share the same row labels.
groupby() (2:39)
Group a data frame by gender, apply the mean and other aggregates to non-grouped columns, and concatenate the results to create a summary using groupby in pandas.
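The DataFrame workflow from these lectures can be traced end to end in the sketch below: set_index, combined filters, loc/iloc, adding and dropping, sorting, and groupby with concatenation. It assumes pandas is installed. Because the course's cereal CSV is not available here, a small hypothetical frame with the same column names stands in for pd.read_csv, and the people/gender data is likewise made up.

```python
import pandas as pd

# Hypothetical stand-in for the cereal CSV used in the lectures
# (with the real file you would start from pd.read_csv(path) instead)
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "calories": [70, 110, 50, 120],
    "protein": [4, 2, 3, 3],
    "rating": [59.4, 33.9, 68.4, 40.4],
})

# set_index returns a new DataFrame; reassign it (or pass inplace=True)
df = df.set_index("name")

# Combine conditions with & (and) / | (or), each wrapped in parentheses
filtered = df[(df["calories"] > 70) & (df["protein"] < 4)]

# loc is label-based (end label included); iloc is position-based (end excluded)
by_label = df.loc["A":"C", ["calories", "rating"]]
by_position = df.iloc[0:2, 0:2]

# Add a row with loc, then drop a column with axis=1
df.loc["E"] = [90, 5, 55.0]
without_rating = df.drop("rating", axis=1)

# Sort in descending order of calories
by_calories = df.sort_values("calories", ascending=False)

# groupby: per-group aggregates, concatenated side by side (hypothetical data)
people = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "height": [165, 180, 170, 175],
    "weight": [60, 80, 65, 75],
})
means = people.groupby("gender").mean()
maxima = people.groupby("gender").max()
summary = pd.concat([means, maxima], axis=1, keys=["mean", "max"])
print(summary)
```

Passing keys to pd.concat labels each block of aggregated columns, so the summary reads as "mean" and "max" groups rather than duplicate column names.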

Introduction to Data Cleaning (0:37)
Real-world data from multiple sources often arrives inconsistent and with missing values. This lecture introduces basic techniques to clean such data and prepare it for practical use and manipulation.
Quality of Data (0:47)
Understand that the quality of decisions depends on the quality of data. Assess data and apply cleaning techniques to remove inaccuracies and make it fit for operations, decision making, and planning.
Examples of Anomalies (1:04)
Identify anomalies, or outliers, in data and understand how a single anomalous value can point to data issues or instrument defects. Explore techniques for detecting them to get accurate results.
Median-based Anomaly Detection (2:41)
Apply median-based anomaly detection: compute each value's absolute difference from the median and compare it with a threshold to identify outliers in a pandas Series. The example uses 4.5 as the anomaly.
Mean-based anomaly detection (2:50)
Detect anomalies with bounds built from the mean and standard deviation: the lower bound is the mean minus the std and the upper bound is the mean plus the std. Two times the std can be used for wider bounds. The lecture demonstrates this with a pandas Series in Jupyter Notebook.
Z-score-based Anomaly Detection (2:50)
Detect anomalies with the z-score. Calculate the mean and standard deviation, then flag values more than 1.5 standard deviations from the mean as outliers, using pandas and NumPy.
Interquartile Range for Anomaly Detection (4:33)
Detect anomalies with the interquartile range (IQR). Compute Q1, Q2 (the median), and Q3 with numpy.percentile, then flag values below Q1 minus 1.5 times the IQR or above Q3 plus 1.5 times the IQR.
Dealing with missing values (6:01)
Identify missing values with pandas isnull and sum to see which columns contain them. Then drop rows with missing data or fill the gaps with the mean, median, or mode using fillna.
Regular Expressions (6:57)
Master regular expressions in Python to match patterns, clean data, extract digits, and replace text using findall, search, and sub on strings and data series.
Feature Scaling (3:17)
Learn how to scale features in Pandas using min-max normalization to 0 to 1 and standardization by centering on the mean and dividing by the standard deviation.
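The cleaning techniques above can be applied side by side to one toy Series, as in the sketch below. It assumes pandas and NumPy. The readings are hypothetical, with 4.5 playing the anomaly as in the lectures. The 1.5 multipliers follow the lecture descriptions, while the median threshold of 1.0 is an assumption chosen for this data.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical readings; 4.5 plays the role of the anomaly
s = pd.Series([1.0, 1.2, 0.9, 1.1, 1.0, 4.5])

# Median-based: absolute distance from the median above a threshold
threshold = 1.0  # assumption: picked for this toy data
median_flags = s[(s - s.median()).abs() > threshold]

# Z-score-based: distance from the mean in standard deviations
# (pandas' std uses ddof=1 by default)
z = (s - s.mean()) / s.std()
z_flags = s[z.abs() > 1.5]

# IQR-based: outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
q1, q3 = np.percentile(s, [25, 75])
iqr = q3 - q1
iqr_flags = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

# Missing values: count them, then fill with the median
m = pd.Series([10, None, 30, None, 50])
missing_count = m.isnull().sum()
filled = m.fillna(m.median())

# Regular expressions: extract digit runs and collapse repeated whitespace
text = "Order  66 shipped in   3 boxes"
digits = re.findall(r"\d+", text)
clean = re.sub(r"\s+", " ", text)

# Feature scaling: min-max to [0, 1] and standardization
x = pd.Series([10.0, 20.0, 30.0])
min_max = (x - x.min()) / (x.max() - x.min())
standardized = (x - x.mean()) / x.std()
print(median_flags.tolist(), z_flags.tolist(), iqr_flags.tolist())
```

On this data all three detectors agree on 4.5. On real data they can disagree, because the mean and standard deviation are themselves pulled toward outliers, while the median and IQR are not.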

Introduction (0:29)
Learn to create visuals in Jupyter Notebook with the Matplotlib library, exploring various graph types and the fundamentals of data visualization in Python.
Setting Up Matplotlib (0:33)
Set up Matplotlib to create graphs in Python, using its MATLAB-like interface and the import statement import matplotlib.pyplot as plt.
Plotting Line Plots using Matplotlib (1:45)
Plot line plots in Matplotlib by supplying x and y lists in a Jupyter notebook. Customize the color with a third argument, using the color abbreviations documented by Matplotlib.
Title, Labels & Legend (6:46)
Set graph titles with the title function and add x and y labels. Then plot multiple series on one chart and identify them with a legend.
Plotting Histograms (1:22)
Plot histograms in Python from a list of values, displaying frequency on the y-axis, and customize the color and bar width.
Plotting Bar Charts (2:04)
Plot bar charts in Python with the bar function and compare bar charts with histograms. Adjust x values, heights, and bar width in a Jupyter Notebook.
Plotting Pie Charts (2:49)
Create pie charts with the pie function. Pass values and labels, offset wedges with explode, and customize colors to distinguish wedges.
Plotting Scatter Plots (5:43)
Plot scatter plots in Python with the scatter function and x and y data, and customize colors with the color, c, and cmap options.
Plotting Log Plots (0:41)
Switch the y-axis to a logarithmic scale to handle data with exponential growth, comparing linear and log plots to reveal points that are far apart.
Plotting Polar Plots (2:06)
Create a polar plot with Matplotlib by plotting a radius of r = 2 over theta from 0 to 2π in steps of 0.01.
Handling Dates (0:43)
Rotate the x-axis tick labels by 90 degrees to improve readability when many or long date values crowd the axis, using the rotation parameter of the xticks function.
Creating multiple subplots in one figure (3:28)
Create multiple subplots in a single figure by specifying rows, columns, and position. Place two plots side by side and apply tight layout for spacing.
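Several of these plotting features can be combined in one figure with two subplots: titles, axis labels, a legend, a log-scaled y-axis, rotated tick labels, and tight_layout. The sketch below assumes a recent Matplotlib and uses arbitrary data. The Agg backend line only matters when running as a plain script without a display. Leave it out in Jupyter, where figures render inline.

```python
import matplotlib
matplotlib.use("Agg")  # script-only: non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
cubes = [1, 8, 27, 64, 125]

fig = plt.figure(figsize=(10, 4))

# Left: two series with a title, axis labels, and a legend
ax_left = plt.subplot(1, 2, 1)  # 1 row, 2 columns, position 1
plt.plot(x, squares, "r", label="x squared")  # "r" = red
plt.plot(x, cubes, "b", label="x cubed")      # "b" = blue
plt.title("Growth")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()

# Right: the cubic on a logarithmic y-axis, with tick labels rotated 90 degrees
ax_right = plt.subplot(1, 2, 2)  # position 2
plt.plot(x, cubes, "g")
plt.yscale("log")
plt.xticks(x, ["Jan", "Feb", "Mar", "Apr", "May"], rotation=90)
plt.title("Log scale")

plt.tight_layout()  # keep titles and rotated labels from overlapping
```

In a script you would finish with plt.show() under an interactive backend or plt.savefig(...) to write an image.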

Introduction (0:19)
Begin exploring data with exploratory data analysis (EDA) techniques to understand it better.
What is Exploratory Data Analysis? (0:30)
Perform exploratory data analysis in Python to investigate datasets, identify patterns and anomalies, summarize their main characteristics, and visualize the results with graphs.
Univariate Analysis (1:41)
Univariate analysis examines data one variable at a time, giving a concise overview and revealing patterns. Here it is applied to a DataFrame with four feature columns and one output column.
Univariate Analysis: Continuous Data (6:00)
Explore univariate continuous data with scatter, strip, and distribution plots, histograms, and box plots in Python using Seaborn. Visualize petal length by variety and the five-number summary.
Univariate Analysis: Categorical Data (2:16)
Perform univariate analysis on the variety column with Seaborn's countplot to show category counts. Then visualize proportions with a pie chart built from value_counts.
Bivariate analysis: Continuous & Continuous (4:32)
Perform bivariate analysis of two continuous features with scatter plots and pandas corr. Use heatmaps to visualize correlations on the Titanic dataset, noting the low correlation between fare and age.
Bivariate analysis: Categorical & Categorical (3:07)
Follow a two-step approach to bivariate analysis of the categorical Pclass and Survived columns. Group by Pclass and aggregate the survival values, then show the class-wise results with a bar plot.
Bivariate analysis: Continuous & Categorical (1:51)
Demonstrate bivariate analysis of continuous and categorical variables on the Titanic DataFrame with box plots and bar plots, highlighting survival by age and sex.
Detecting Outliers (5:34)
Identify outliers and learn practical techniques to handle them, including z-score-based trimming and median imputation, demonstrated on a sample series in Python with pandas.
Categorical Variable Transformation (4:22)
Convert categorical variables such as gender into numerical form with label encoding and frequency encoding. Replace male and female with numbers in pandas, and encode categories by their frequencies.

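The EDA steps above map onto a handful of pandas calls, shown in the sketch below: value counts, correlation, grouped survival, label encoding, and frequency encoding. To stay offline, the sketch uses a few hypothetical rows with Titanic-style column names instead of the real dataset. The numbers carry no meaning beyond illustrating the calls.

```python
import pandas as pd

# Hypothetical rows with Titanic-style columns (not real passengers)
titanic = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 1, 0],
    "Sex":      ["female", "male", "female", "male", "female", "male"],
    "Age":      [38, 45, 27, 22, 4, 30],
    "Fare":     [71.3, 52.0, 13.0, 7.3, 16.7, 8.1],
})

# Univariate, categorical: counts per category (what a count plot draws)
class_counts = titanic["Pclass"].value_counts()

# Bivariate, continuous vs continuous: Pearson correlation
age_fare_corr = titanic["Age"].corr(titanic["Fare"])

# Bivariate, categorical vs categorical: survivors (sum) and survival rate (mean)
survivors = titanic.groupby("Pclass")["Survived"].sum()
survival_rate = titanic.groupby("Pclass")["Survived"].mean()

# Label encoding: replace each category with a number
titanic["Sex_label"] = titanic["Sex"].map({"male": 0, "female": 1})

# Frequency encoding: replace each category with how often it occurs
titanic["Pclass_freq"] = titanic["Pclass"].map(titanic["Pclass"].value_counts())
print(survivors, survival_rate, sep="\n")
```

Summing a 0/1 column gives counts, while taking its mean gives proportions. Pick the one the bar plot is meant to show. With Seaborn installed, sns.countplot(data=titanic, x="Pclass") draws the counts from the first step.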
Introduction to Time Series (2:15)
Explore time series data, analyze patterns, and develop forecasting models using pandas, NumPy, and Python's datetime module, applying datetime operations to examples such as stock data.
Getting stock data using yfinance (3:14)
Download stock data from Yahoo Finance using yfinance in a Jupyter notebook. Import the library and fetch Apple stock data for January 2022 as a pandas time series.
Converting a Dataset into Time Series (4:23)
Convert a dataset into a time series by turning string date values into timestamps with pandas to_datetime, including custom formats handled via the format parameter.
Working with Time Series (3:52)
Master time series in Python with pandas. Check days of the week, identify business days, and index, slice, and partially match timestamps to select February data.
Time Series Data Visualization with Python (3:03)
Visualize time series data in Python using matplotlib to plot Apple stock prices over five years, and specifically the year 2021, demonstrating how to create figures and label axes.
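The lectures fetch real prices with yfinance. To keep the example offline, the sketch below uses a small hypothetical price series to illustrate the pandas side: parsing custom-format dates with to_datetime, day names, a weekday-based business-day check, and partial string indexing by month. The business-day check assumes business days are Monday to Friday and ignores holidays.

```python
import pandas as pd

# Hypothetical closing prices with day/month/year date strings
raw = pd.DataFrame({
    "Date": ["31/01/2022", "01/02/2022", "02/02/2022", "05/02/2022", "01/03/2022"],
    "Close": [174.8, 174.6, 175.8, 171.7, 163.2],
})

# Parse the strings into Timestamps using an explicit format
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")
prices = raw.set_index("Date")["Close"]

# Day names, and a business-day check (dayofweek: Monday=0 ... Sunday=6)
day_names = prices.index.day_name()
is_business_day = prices.index.dayofweek < 5

# Partial string indexing: every row from February 2022
february = prices.loc["2022-02"]
print(february)
```

Partial string indexing works here because the index is a sorted DatetimeIndex. Plotting it is then a single plt.plot(prices.index, prices) call.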

Requirements

No prior data science knowledge required
No programming experience needed

Description

Data scientists are already among the most sought-after professionals. In a highly competitive job market, employers find it tough to keep them once they have been hired. People with a unique mix of scientific training, computer expertise, and analytical abilities are hard to find.

Like the Wall Street "quants" of the 1980s and 1990s, modern-day data scientists are expected to have a similar skill set. People with a background in physics and mathematics flocked to investment banks and hedge funds in those days because they could come up with novel algorithms and data methods.

Data science is becoming one of the most promising occupations of the twenty-first century. It is computerized, programming-driven, and analytical in nature. It is no surprise, then, that demand for data scientists in the job market has been increasing over the last several years.

The supply, on the other hand, has been quite restricted. It is challenging to get the knowledge and abilities required to be recruited as a data scientist.

Plenty of resources for learning Python are available online. With so many options, students are frequently overwhelmed and struggle with Python's steep learning curve.

It's a whole new ball game here! Step-by-step instruction is the hallmark of this course: each lesson builds on what we've learned previously. Our goal is to equip you with all the tools and skills you need to master Python, NumPy & pandas.

You'll walk away from each video with a fresh idea that you can put to use right away!

All skill levels are welcome in this course, and even if you have no prior programming or statistical experience, you will be able to succeed!

Who this course is for:

No previous skills or expertise required. Only a drive to succeed!

Course sections

Python Quick Refresher (Optional): 21 lectures • 41min
Essential Python Libraries for Data Science: 7 lectures • 6min
Fundamental NumPy Properties: 5 lectures • 18min
Mathematics for Data Science: 15 lectures • 20min
Python Pandas DataFrames & Series: 22 lectures • 34min
Data Cleaning: 10 lectures • 32min
Data Visualization using Python: 12 lectures • 28min
Exploratory Data Analysis: 10 lectures • 30min
Time Series in Python: 5 lectures • 17min
BONUS Section - Don't Miss Out: 1 lecture • 1min
