
This section provides an overview of the different topics covered in this chapter.
Here we cover the key components and features that one should expect from an IDE. This will help you evaluate the different IDE options available and choose the one that suits you best.
This section gives an overview of the Spyder IDE, which is used for all code demonstrations throughout this course.
Here, we look at its different features that come in handy while working on live projects:
Highlighting of different code elements - Standard Functions or Keywords, versus User defined variables
Syntax analysis - highlighting of incorrect syntax to prompt a user to take corrective action
Option to execute the complete code or only a selected portion of it
Variable explorer pane - for easier access/view of the created objects (DataFrames, variables, etc.)
Easier navigation across code lines - something that is much needed while working on large code files
Easier text search and replace option - using the text editor pane
Here we cover some of the Python basics to help you get started with writing code.
This section provides an overview of the different topics covered in this chapter
Learn:
How to read input data in Python from files in CSV, Text and Excel formats
About the different attributes that a DataFrame has
More about Python functions (analogous to the read_csv() function)
Learn how to read :
An Excel file with extra blank rows and columns (besides the data that needs to be read)
A text file without header (variable names) in it, and how to provide custom names to different variables
This section also introduces Lists and their use in the read_csv() function
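As a rough illustration of reading a text file without a header and supplying custom variable names through a List, a minimal sketch follows. The data and the names `raw_text` and `col_names` are made up for this example and are not taken from the course material.

```python
import io
import pandas as pd

# Made-up text data with no header row (stands in for a real file on disk)
raw_text = io.StringIO("101,Asha,2500\n102,Ben,1800\n103,Chen,3100\n")

# A List holding the custom variable names (hypothetical names)
col_names = ["cust_id", "name", "balance"]

# header=None tells pandas that the first line is data, not variable names;
# names= supplies the List of custom names
df = pd.read_csv(raw_text, header=None, names=col_names)

print(df.shape)    # DataFrame attribute: (number of rows, number of columns)
print(df.columns)  # DataFrame attribute: the variable names
```

For an Excel file with extra blank rows and columns, `pd.read_excel()` accepts `skiprows` and `usecols` arguments that can be used in a similar way.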
Learn how to :
Read only selected columns from a large CSV file without having to import it fully (due to memory space constraints)
Correctly read Date variables from an input file so that date-related computations can be done on them
Handle missing values and special characters in an input file and convert them all to NaN values
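The three reading options listed above could look roughly like the sketch below. The sample data, the column names and the choice of "?" and "-" as special characters are assumptions made for illustration.

```python
import io
import pandas as pd

# Made-up CSV content; "?" and "-" play the role of special characters
raw_csv = io.StringIO(
    "cust_id,join_date,city,balance\n"
    "101,2021-01-15,Pune,2500\n"
    "102,2021-03-02,?,NA\n"
    "103,2021-07-20,Delhi,-\n"
)

df = pd.read_csv(
    raw_csv,
    usecols=["cust_id", "join_date", "balance"],  # read only selected columns
    parse_dates=["join_date"],                    # read this variable as a Date
    na_values=["?", "-"],                         # extra markers to convert to NaN
)

# Date computations now work on the parsed variable
df["join_year"] = df["join_date"].dt.year
print(df)
```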
Learn how to :
Control the format or 'dtype' of variables while reading an input file:
Object to Integer
Integer to Object
Object to Date
This section also introduces the Dictionary and its use in controlling variable formats
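A minimal sketch of controlling variable formats with a Dictionary is shown below. The data and the names `col_types`, `zip_code` and `zip_num` are hypothetical and only serve to illustrate the idea.

```python
import io
import pandas as pd

# Made-up CSV content; zip_code looks numeric but should keep its leading zero
raw_csv = io.StringIO(
    "zip_code,units,order_date\n"
    "02134,5,2022-04-01\n"
    "10001,12,2022-04-03\n"
)

# Dictionary: variable name (key) mapped to the desired type (value)
col_types = {"zip_code": str, "units": "int64"}

# Integer-looking values kept as text via dtype=, text read as Date via parse_dates=
df = pd.read_csv(raw_csv, dtype=col_types, parse_dates=["order_date"])

# Text converted to Integer after reading
df["zip_num"] = df["zip_code"].astype(int)

print(df.dtypes)
```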
Learn more about Lists and Dictionaries, i.e.:
How to create them from scratch
How to reference their elements using Index (in case of Lists) and Keys (in case of Dictionaries)
Introduction gives an overview of the different topics covered in this chapter
Learn how to quickly view information of a DataFrame, i.e. :
Variables and their types (format)
Number of rows
Number of non-missing values for each variable, etc.
Learn how to fetch variable names from a DataFrame and store them in a List for further use
Learn how to fetch variable types (format) from a DataFrame and store them in a Dictionary for further use
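The quick-view steps above can be sketched as follows, using a small made-up DataFrame. The names `var_names` and `var_types` are chosen for this example only.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "cust_id": [101, 102, 103],
    "city": ["Pune", None, "Delhi"],
    "balance": [2500.0, 1800.5, None],
})

# Variables, their types, number of rows and non-missing counts in one view
df.info()

# Variable names stored in a List
var_names = df.columns.tolist()

# Variable types stored in a Dictionary (name -> type)
var_types = df.dtypes.to_dict()

print(var_names)
print(var_types)
```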
Learn about the "relevance of Index" in a DataFrame
Learn about the role of Square Brackets as index operators and how they can be used to select a single variable or multiple variables
Learn how to explore a DataFrame in more detail:
View sample records
View selected variables and their sample records
Number of non-missing values for each variable, etc.
Learn how to :
Check if the DataFrame has any missing values (by returning a True or False flag)
Check which variables in the DataFrame have missing values and how many (i.e. count of missing values)
Learn how to :
View the DataFrame records where a particular variable has missing values
View all DataFrame records with at least one missing value
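The missing-value checks and views described above could be written roughly as below. The DataFrame is made up, and this is only one of several ways to express these checks in pandas.

```python
import numpy as np
import pandas as pd

# Made-up data with a few missing values
df = pd.DataFrame({
    "cust_id": [101, 102, 103, 104],
    "city": ["Pune", np.nan, "Delhi", "Agra"],
    "balance": [2500.0, np.nan, np.nan, 900.0],
})

# True/False flag: does the DataFrame have any missing value at all?
has_missing = df.isnull().values.any()

# Count of missing values for each variable
missing_counts = df.isnull().sum()

# Records where one particular variable is missing
rows_city_missing = df[df["city"].isnull()]

# Records with at least one missing value in any variable
rows_any_missing = df[df.isnull().any(axis=1)]

print(has_missing)
print(missing_counts)
```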
Learn how to :
View duplicate records in a DataFrame, and know some additional controls around:
Viewing all duplicate records except the First/Last occurrence
Viewing all duplicate records
Drop duplicate records in a DataFrame
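A minimal sketch of the duplicate-record controls listed above, on made-up data:

```python
import pandas as pd

# Made-up data: the record (101, "Pune") occurs three times
df = pd.DataFrame({
    "cust_id": [101, 102, 101, 103, 101],
    "city": ["Pune", "Delhi", "Pune", "Agra", "Pune"],
})

# Duplicate records except the first occurrence
dups_except_first = df[df.duplicated(keep="first")]

# Duplicate records except the last occurrence
dups_except_last = df[df.duplicated(keep="last")]

# All duplicate records, including first and last occurrences
all_dups = df[df.duplicated(keep=False)]

# Drop duplicates, keeping the first occurrence of each record
deduped = df.drop_duplicates()

print(all_dups)
print(deduped)
```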
Learn how to :
Check for unexpected values in object variables of a DataFrame
Write a For loop and use it to perform repetitive operations on DataFrame variables
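One possible way to combine the two points above is to loop over a List of object variables and print the distinct values of each, so that unexpected labels stand out. The data and the names `object_vars` and `unique_values` are assumptions made for this sketch.

```python
import pandas as pd

# Made-up data with inconsistent labels ("m" and "pune")
df = pd.DataFrame({
    "gender": ["M", "F", "m", "F"],
    "city": ["Pune", "Delhi", "Pune", "pune"],
})

# List of object variables to check (hypothetical selection)
object_vars = ["gender", "city"]

unique_values = {}
for var in object_vars:
    # Same operation repeated for every variable in the List
    unique_values[var] = sorted(df[var].dropna().unique().tolist())
    print(var, unique_values[var])
```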
Learn how to :
Compute quick totals/averages on Numeric variables in a DataFrame
View quick summary statistics on all the Numeric variables and how to get custom summaries
View Frequency Distribution (to check for variable skewness) on:
Object variables
Numeric variables - using Python's Matplotlib Library
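The quick summaries above can be sketched as below on made-up data. Note that the numeric frequency distribution here uses `value_counts(bins=...)` as a text-only stand-in, so that the sketch runs without Matplotlib; it is not the histogram approach named in the outline.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "segment": ["A", "B", "A", "C", "A"],
    "sales": [100, 250, 175, 300, 125],
})

# Quick totals and averages on a Numeric variable
total_sales = df["sales"].sum()
avg_sales = df["sales"].mean()

# Standard summary statistics on all Numeric variables
summary = df.describe()

# Custom summary: only the statistics of interest
custom = df["sales"].agg(["min", "max", "median"])

# Frequency Distribution on an Object variable
freq = df["segment"].value_counts()

# Binned counts on a Numeric variable (stand-in for a histogram)
binned = df["sales"].value_counts(bins=3, sort=False)

print(total_sales, avg_sales)
print(custom)
print(freq)
print(binned)
```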
Introduction gives an overview of the different topics covered in this chapter
In this section, We:
Look at some practical examples of why one needs to create subsets from Input data
Learn about the limitations of Square Brackets as Index Operators, and which Row and Column selections they can NOT do by themselves
Learn which other Indexing Operators can be used and what additional controls they provide for selecting Rows and Columns
Learn how to create a DataFrame subset using the iloc Indexer, by selecting:
Rows - based on integer Indexes
Columns - based on integer Indexes
Both Rows and Columns - based on their integer Indexes
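A minimal sketch of the three iloc selections listed above, on a made-up DataFrame:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "cust_id": [101, 102, 103, 104],
    "city": ["Pune", "Delhi", "Agra", "Goa"],
    "q1": [10, 20, 30, 40],
    "q2": [15, 25, 35, 45],
})

# Rows by integer Index: first two rows (the end position 2 is excluded)
first_two_rows = df.iloc[0:2]

# Columns by integer Index: last two columns, all rows
last_two_cols = df.iloc[:, -2:]

# Rows and Columns together: rows 0 and 3, columns 0 and 2
block = df.iloc[[0, 3], [0, 2]]

print(block)
```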
Learn through examples:
How to set a DataFrame variable as DataFrame Index
How to create a DataFrame subset using the loc Indexer, by selecting:
Rows - based on Index labels
Columns - based on variable names
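The loc-based selections above could look roughly like this. The data is made up, and `df_idx` is a name chosen for this sketch.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "cust_id": [101, 102, 103, 104],
    "city": ["Pune", "Delhi", "Agra", "Goa"],
    "q1": [10, 20, 30, 40],
    "q2": [15, 25, 35, 45],
})

# Set a DataFrame variable as the DataFrame Index
df_idx = df.set_index("cust_id")

# Rows by Index label: a single label returns that record as a Series
one_row = df_idx.loc[102]

# Rows by Index labels and Columns by variable names
subset = df_idx.loc[[101, 103], ["city", "q2"]]

print(one_row)
print(subset)
```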
Learn how to perform the below operations by using the iloc and loc Indexers
Select limited columns and store in a new DataFrame
Compute Sum (Totals) or Mean on multiple columns (across columns)
Compute difference between these Totals (Example: Previous Qtr versus Prior Qtr)
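As an illustration of the operations above, the sketch below totals two made-up quarters across columns and takes their difference. The month columns and the names `q1_total`, `q2_total` and `change` are assumptions for this example.

```python
import pandas as pd

# Made-up monthly data for two customers
df = pd.DataFrame({
    "cust_id": [1, 2],
    "jan": [10, 20], "feb": [20, 30], "mar": [30, 40],
    "apr": [15, 25], "may": [25, 35], "jun": [35, 45],
})

# Select limited columns and store them in a new DataFrame
first_quarter = df.loc[:, ["cust_id", "jan", "feb", "mar"]]

# Totals across columns (axis=1), using loc with variable names
q1_total = df.loc[:, "jan":"mar"].sum(axis=1)

# Same idea with iloc and integer positions (columns 4, 5 and 6)
q2_total = df.iloc[:, 4:7].sum(axis=1)

# Difference between the two totals
change = q2_total - q1_total

print(change)
```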
Learn more about Series as a Data Structure, i.e.:
Different components of a Series
How to create a Series from scratch
Using a Series to create a Bar Plot with Matplotlib Module
Learn :
Why some names are followed by parentheses (print(), read_csv(), etc.) and some are not (d.shape, d.columns)
More about Functions, Methods and Attributes
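A small sketch of the distinction: functions and methods are called with parentheses, while attributes are stored values that are read without them. The DataFrame here is made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Function: standalone, called with parentheses
n_rows = len(df)

# Method: a function attached to an object, also called with parentheses
top_rows = df.head(2)

# Attributes: values stored on the object, read without parentheses
shape = df.shape
columns = df.columns

print(n_rows, shape)
print(callable(df.head), callable(df.shape))
```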
Introduction gives an overview of the different topics covered in this chapter
Learn how to select Rows in a DataFrame by applying a single criterion based on:
Numeric thresholds
Single or Multiple String value(s)
Also, learn how to verify that the criterion is applied correctly and that only the desired values are selected in the created subset
Learn how to select Rows in a DataFrame by applying Criteria on Multiple variables, with an:
AND condition
OR condition
Also, learn how to use Series to first apply these criteria separately, and then use them in a DataFrame
Learn how to select Rows in a DataFrame by applying Criteria on Multiple variables, with:
Both AND and OR conditions
Also, learn how to simplify the code by using Series to apply these criteria separately, and then use them in a DataFrame
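The row-selection criteria described above can be sketched as follows. Each criterion is first stored as a separate True/False Series and then combined. The data and the names `high`, `in_cities` and `is_active` are made up for illustration.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "cust_id": [101, 102, 103, 104, 105],
    "city": ["Pune", "Delhi", "Agra", "Pune", "Delhi"],
    "balance": [2500, 1800, 3100, 900, 4000],
    "active": [True, False, True, True, True],
})

# Each criterion applied separately and stored as a Series
high = df["balance"] > 2000                     # numeric threshold
in_cities = df["city"].isin(["Pune", "Delhi"])  # multiple string values
is_active = df["active"]

# AND condition
and_subset = df[high & in_cities]

# OR condition
or_subset = df[high | in_cities]

# Both AND and OR conditions; parentheses make the grouping explicit
mixed_subset = df[(high & in_cities) | ~is_active]

# Verify that only the desired values are present in the subset
print(and_subset["city"].unique(), and_subset["balance"].min())
```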
Learn how to use the drop() function to:
Drop Rows/Columns in a DataFrame based on their Integer/Label Index values
Drop Rows in a DataFrame that meet one or more criteria
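A minimal sketch of these uses of drop(), on made-up data. Since drop() works with labels, integer positions are first translated to labels through `df.index` or `df.columns`.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "cust_id": [101, 102, 103, 104],
    "city": ["Pune", "Delhi", "Agra", "Pune"],
    "balance": [2500, 1800, 3100, 900],
})

# Drop a Row by its Index label
no_first_row = df.drop(index=[0])

# Drop a Column by its name (label)
no_city = df.drop(columns=["city"])

# Drop a Column by its integer position, translated to a label first
no_second_col = df.drop(columns=df.columns[1])

# Drop Rows that meet one or more criteria
to_drop = (df["balance"] < 1000) | (df["city"] == "Agra")
kept = df.drop(index=df[to_drop].index)

print(kept)
```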
Learn some additional functions to select/drop DataFrame records with missing values
Learn through practical examples (2 of 6 examples) how to drop:
DataFrame Rows with all missing values
DataFrame Rows with at least one missing value
Learn through practical examples (4 of 6 examples) how to drop:
DataFrame Rows with missing values in a particular variable
DataFrame Rows with missing values in multiple variables
DataFrame Columns with missing values > a certain threshold
DataFrame Columns with a common Prefix/Suffix
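The six cases above could be handled roughly as in the sketch below. The data, the 50% threshold and the "tmp_" prefix are assumptions made for illustration, and other approaches are possible.

```python
import numpy as np
import pandas as pd

# Made-up data; the second row is missing in every variable
df = pd.DataFrame({
    "cust_id": [101, np.nan, 103, 104],
    "city": ["Pune", np.nan, np.nan, "Goa"],
    "tmp_score": [np.nan, np.nan, np.nan, 7.0],
    "balance": [np.nan, np.nan, 3100.0, 900.0],
})

# Rows with all values missing
no_empty_rows = df.dropna(how="all")

# Rows with at least one missing value
complete_rows = df.dropna(how="any")

# Rows with a missing value in a particular variable
city_known = df.dropna(subset=["city"])

# Rows with a missing value in any of several variables
city_balance_known = df.dropna(subset=["city", "balance"])

# Columns whose share of missing values exceeds a threshold (here 50%)
missing_share = df.isnull().mean()
dense_cols = df.loc[:, missing_share <= 0.5]

# Columns with a common prefix
no_tmp = df.loc[:, ~df.columns.str.startswith("tmp_")]

print(dense_cols.columns.tolist())
```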
Introduction gives an overview of the different topics covered in this chapter
Here, we look at a few examples to understand the need for creating new variables over and above the variables that we get in the Input Data
Learn how to create a new DataFrame variable with:
A user provided single Integer/Float value
A user provided Custom Date or Today's date
A single Random Float or Integer value using suitable functions
Learn how to:
Create a new DataFrame variable with multiple Random Integer values (varying across the Rows)
Convert a regular operation into a function using a Lambda Expression
Apply a function on all DataFrame rows
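A rough sketch of creating new variables in the ways described above. The data, the seed and the 10% tax rate are made up for this example.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "cust_id": [101, 102, 103],
    "city": ["Pune", "Delhi", "Agra"],
    "balance": [1000.0, 2000.0, 3000.0],
})

# Single user-provided values, repeated on every row
df["flag"] = 1
df["as_of"] = pd.Timestamp("2023-01-31")           # custom date
df["loaded_on"] = pd.Timestamp.today().normalize()  # today's date

# Random Integer values that vary across the rows (seed fixed for repeatability)
rng = np.random.default_rng(seed=42)
df["rand_int"] = rng.integers(low=1, high=100, size=len(df))

# A regular operation turned into a function with a Lambda Expression
add_tax = lambda amount: amount * 1.1
df["with_tax"] = df["balance"].apply(add_tax)

# A function applied on all DataFrame rows (axis=1 passes one row at a time)
df["label"] = df.apply(lambda row: f"{row['cust_id']}-{row['city']}", axis=1)

print(df)
```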
Learn through examples how to perform the below operations on DataFrame variables:
Common mathematical operations on Numeric variables
Date computations - Year, Month and Day extraction along with computing Date Difference
String computations - Extracting string length and splitting a string based on a specified sub-string
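The Date and String computations above can be sketched with the `.dt` and `.str` accessors, as below. The data and variable names are made up for illustration.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "name": ["Asha Rao", "Ben Li"],
    "start": pd.to_datetime(["2021-01-15", "2021-03-02"]),
    "end": pd.to_datetime(["2021-02-14", "2021-03-12"]),
})

# Year, Month and Day extraction
df["start_year"] = df["start"].dt.year
df["start_month"] = df["start"].dt.month
df["start_day"] = df["start"].dt.day

# Date Difference in days
df["days_between"] = (df["end"] - df["start"]).dt.days

# String length, and splitting a string on a specified sub-string
df["name_len"] = df["name"].str.len()
df["first_name"] = df["name"].str.split(" ").str[0]

print(df)
```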
Learn :
How to create a YearMonth (YYYYMM) variable from a given Date variable
Where to use the apply() function while working on a DataFrame, and where it is not needed
More about Lambda Expressions
Learn how to create a DataFrame variable with Bins/Categories labels using:
pandas.cut function
loc indexer
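Both binning approaches named above are sketched below on made-up ages. The cut-offs (17 and 64) and the labels are assumptions for this example.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"age": [5, 17, 30, 64, 80]})

# pandas.cut: bins are (0, 17], (17, 64], (64, 120] because right=True by default
df["band_cut"] = pd.cut(
    df["age"],
    bins=[0, 17, 64, 120],
    labels=["child", "adult", "senior"],
)

# loc indexer: start with a default label, then overwrite rows meeting a criterion
df["band_loc"] = "adult"
df.loc[df["age"] <= 17, "band_loc"] = "child"
df.loc[df["age"] > 64, "band_loc"] = "senior"

print(df)
```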
Learn:
How to create a DataFrame variable with Bins/Categories labels using a custom function
How to make a custom function more dynamic to increase its re-usability
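One way a binning function could be made more dynamic is to pass the cut-offs and labels as arguments instead of hard-coding them. The function `assign_band` below is a hypothetical example, not the function used in the course.

```python
import pandas as pd

def assign_band(value, cutoffs, labels):
    """Return the label of the first cut-off that value does not exceed.

    labels must have one more element than cutoffs; the last label is
    used for values above every cut-off.
    """
    for cutoff, label in zip(cutoffs, labels):
        if value <= cutoff:
            return label
    return labels[-1]

# Made-up data for illustration
df = pd.DataFrame({"age": [5, 17, 30, 64, 80], "balance": [50, 500, 5000, 200, 900]})

# The same function re-used on two variables with different bins
df["age_band"] = df["age"].apply(
    assign_band, cutoffs=[17, 64], labels=["child", "adult", "senior"]
)
df["balance_band"] = df["balance"].apply(
    assign_band, cutoffs=[100, 1000], labels=["low", "medium", "high"]
)

print(df)
```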
Introduction gives an overview of the different topics covered in this chapter
Here, we look at:
The purposes for which Frequency Distributions can be used
Also, using the Practice Dataset for this chapter, we formulate an approach for answering some business questions with Frequency Distributions and arriving at insights.
Here, we first do a recap of the different variations of the value_counts() function. Additionally, we'll learn how to:
View missing value count for a variable in its frequency distribution
Convert the value_counts() output (a Series) to a DataFrame for any further use
Export a Frequency Distribution output to a CSV, by first renaming variable names to names of choice
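The three steps above could look roughly like this. The data and the names `city_name` and `n_customers` are made up; `to_csv()` is called without a file path here so that the sketch returns text instead of writing a file.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value
df = pd.DataFrame({"city": ["Pune", "Delhi", np.nan, "Pune", "Pune"]})

# dropna=False keeps the missing values as their own category
freq = df["city"].value_counts(dropna=False)

# Convert the Series to a DataFrame and rename the variables
freq_df = freq.reset_index()
freq_df.columns = ["city_name", "n_customers"]

# Export; pass a file path such as "freq.csv" to write to disk instead
csv_text = freq_df.to_csv(index=False)

print(freq_df)
print(csv_text)
```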
Learn how to:
Observe Frequency Distributions on Two variables at a time
Export the Multiple Variable Frequency Distribution output to a CSV
Learn how to:
Observe Frequency Distributions on more than Two variables
Observe Frequency count on Multiple variables in a List view (similar to single variable frequency distributions)
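A minimal sketch of Frequency Distributions on two or more variables, assuming `pd.crosstab()` for the table view and `groupby().size()` for the List view. The course may use different functions, and the data is made up.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "city": ["Pune", "Pune", "Delhi", "Delhi", "Pune"],
    "plan": ["A", "A", "B", "A", "A"],
})

# Two variables at a time
two_way = pd.crosstab(df["gender"], df["city"])

# More than two variables: a List of row variables against a column variable
three_way = pd.crosstab([df["gender"], df["city"]], df["plan"])

# List view: one row per combination with its count
list_view = (
    df.groupby(["gender", "city", "plan"]).size().reset_index(name="count")
)

print(two_way)
print(list_view)
```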
Learn how to:
View Multiple Variable Frequency Distribution output on a DataFrame subset
Fix multilevel header of a Multiple Variable Frequency Distribution output
Learn through practical examples (3 of 5 examples) how to:
Check for duplicate occurrences of labels in a variable
View the time period covered by a DataFrame using one of its Date variables
Compute the date difference between Two dates and view distribution on this difference
Learn through practical examples (2 of 5 examples) how to:
Compare the labels assigned by Two different binning methods and check for any difference
Observe a 2-Way Frequency Distribution output to assess the %migration of customers
Introduction gives an overview of the different topics covered in this chapter
Here we:
First look at a raw approach to understand the underlying idea behind creating data summaries at sub-group levels
Introduce the more suited groupby() function to create multiple summaries (Average, Count, Min and Max)
Also look at how to perform multiple operations on the same numeric variable using this function
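A short sketch of creating several summaries on the same numeric variable with groupby(), on made-up data:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "segment": ["A", "B", "A", "B"],
    "sales": [100, 200, 300, 400],
})

# Multiple operations on the same numeric variable, per sub-group
summary = df.groupby("segment")["sales"].agg(["mean", "count", "min", "max"])

print(summary)
```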
Here we analyze the popular Titanic data by following a three-pronged approach, which includes:
Formulating relevant questions
Generating meaningful summaries
Deriving insights by analyzing these summaries with the help of Matplotlib charts
Here we look at how to rename the variables in a summarized output that has a multi-layer Index object:
With similar operations done on multiple variables
With numbers in one of the header layers
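One common way to rename such variables is to join the header layers into single names, as sketched below. The data and the underscore naming convention are assumptions; `map(str, ...)` is used so that the join also works when one of the layers holds numbers.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "segment": ["A", "B", "A", "B"],
    "sales": [100, 200, 300, 400],
    "units": [1, 2, 3, 4],
})

# Similar operations on multiple variables give a two-layer column header
summary = df.groupby("segment").agg({"sales": ["mean", "max"], "units": ["mean", "max"]})
print(summary.columns.tolist())

# Join the layers of each column into one name, e.g. ("sales", "mean") -> "sales_mean"
flat = summary.copy()
flat.columns = ["_".join(map(str, col)) for col in summary.columns]
flat = flat.reset_index()

print(flat)
```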
Introduction gives an overview of the different topics covered in this chapter
Learn how to read multiple input files in one shot and explore the best ways to store the resulting DataFrames
Learn the overall working of List Comprehension and Generator Expressions and how to use them to read multiple input files and store the resulting DataFrames
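The sketch below illustrates reading several inputs with a List Comprehension, a Dictionary Comprehension and a Generator Expression. In-memory text stands in for real files so that the example is self-contained; with real files, the file paths would be passed to `read_csv()` instead. All names and data are made up.

```python
import io
import pandas as pd

# Made-up "files": name -> CSV content (stand-ins for files on disk)
raw_files = {
    "jan": "id,sales\n1,10\n2,20\n",
    "feb": "id,sales\n1,15\n3,30\n",
}

# List Comprehension: all DataFrames read and stored in a List
frames_list = [pd.read_csv(io.StringIO(text)) for text in raw_files.values()]

# Dictionary Comprehension: each DataFrame stored under its own name
frames_dict = {name: pd.read_csv(io.StringIO(text)) for name, text in raw_files.items()}

# Generator Expression: DataFrames are produced one at a time, only when consumed
frames_gen = (pd.read_csv(io.StringIO(text)) for text in raw_files.values())
total_rows = sum(len(frame) for frame in frames_gen)

print(len(frames_list), list(frames_dict), total_rows)
```

Storing the DataFrames in a Dictionary keeps each one retrievable by name, which a plain List does not offer.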
Learn how to Stack :
Two standalone DataFrames
More than Two DataFrames by reading multiple Input files
More than Two DataFrames stored in a Dictionary
Learn how to perform common operations on the DataFrames stored in a Dictionary
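Stacking and common operations on DataFrames stored in a Dictionary could be sketched as below, assuming `pd.concat()` is used for stacking. The data and the names `frames`, `month` and `totals` are made up.

```python
import pandas as pd

# Made-up DataFrames stored in a Dictionary
frames = {
    "jan": pd.DataFrame({"id": [1, 2], "sales": [10, 20]}),
    "feb": pd.DataFrame({"id": [1, 3], "sales": [15, 30]}),
    "mar": pd.DataFrame({"id": [2], "sales": [25]}),
}

# Stack two standalone DataFrames
two_stacked = pd.concat([frames["jan"], frames["feb"]], ignore_index=True)

# Stack all DataFrames in the Dictionary; the keys become a "month" variable
all_stacked = (
    pd.concat(frames, names=["month", "row"])
    .reset_index(level="month")
    .reset_index(drop=True)
)

# A common operation performed on every DataFrame in the Dictionary
totals = {name: frame["sales"].sum() for name, frame in frames.items()}

print(all_stacked)
print(totals)
```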
Learn how to Merge :
Two standalone DataFrames
More than Two DataFrames stored in a Dictionary
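A minimal sketch of both merge cases. Using `functools.reduce()` to merge the Dictionary values one after another is one possible approach and may differ from the one shown in the course; the data and the key `cust_id` are made up.

```python
from functools import reduce

import pandas as pd

# Made-up data for illustration
customers = pd.DataFrame({"cust_id": [1, 2, 3], "city": ["Pune", "Delhi", "Agra"]})
orders = pd.DataFrame({"cust_id": [1, 2, 4], "amount": [100, 200, 400]})
scores = pd.DataFrame({"cust_id": [2, 3, 4], "score": [0.2, 0.3, 0.4]})

# Merge two standalone DataFrames (all customers kept, matching orders attached)
two_merged = customers.merge(orders, on="cust_id", how="left")

# Merge more than two DataFrames stored in a Dictionary, one pair at a time
frames = {"customers": customers, "orders": orders, "scores": scores}
all_merged = reduce(
    lambda left, right: left.merge(right, on="cust_id", how="outer"),
    frames.values(),
)

print(two_merged)
print(all_merged)
```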
Introduction gives an overview of the different topics covered in this chapter
Learn how to create a re-usable Python Function from regular code/process (Low complexity), so that it could be used on other DataFrames.
Learn:
How to create a re-usable Python Function from regular code/process (Medium complexity)
What additional controls are required to create a more complex re-usable Python Function
Learn:
What additional controls Python provides to create a more complex re-usable Python Function
How to create a Python function that accepts a dynamic number of arguments
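As an illustration of a re-usable function that accepts a dynamic number of arguments, the sketch below uses `*variables` to collect any number of variable names. The function `missing_report` and the data are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_report(df, *variables):
    """Count missing values for the given variables, or for all if none are given."""
    cols = list(variables) if variables else df.columns.tolist()
    return df[cols].isnull().sum().to_dict()

# Made-up data for illustration
df = pd.DataFrame({
    "cust_id": [101, 102, 103],
    "city": ["Pune", np.nan, "Agra"],
    "balance": [np.nan, np.nan, 3100.0],
})

# The same function called with zero, one or several variable names
print(missing_report(df))
print(missing_report(df, "city"))
print(missing_report(df, "city", "balance"))
```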
Learn how to create a Highly Dynamic and re-usable Python Function from regular code/process.
This course takes a step-by-step approach through the building blocks of data analysis, enabling you to use Python as a tool:
To perform data ingestion and handle Input data nuances
To perform data quality checks and conduct exploratory data analysis (EDA)
For data processing and feature creation
For KPI generation, data summarization, conducting data analysis and deriving insights
For handling multiple files and improving processes by making custom functions more dynamic