Preprocessing Data with NumPy

Name: Preprocessing Data with NumPy
Rating: 4.5 (475 reviews)

NumPy, ndarrays, Slicing, Random Generators, Importing and Saving Data, Statistics, Data Manipulation, Preprocessing

Created by 365 Careers

Last updated 12/2020

English

What you'll learn

Arrays.
The definition of a package/library.
Installing and Upgrading a package.
Navigating the documentation.
A history of NumPy.
The relationship between arrays and vectors.
Arrays vs Lists.
Indexing.
Assigning values to arrays.
Elementwise properties and operations.
Datatypes supported by ndarrays.
Broadcasting and type casting.
Running a function or method over a given axis.
Slicing, Stepwise Slicing, Conditional Slicing
Dimensionality reduction in arrays.
Generating arrays full of identical values.
Generating non-random sequences of data.
Generating random data with Random Generators.
Generating random samples from a random probability distribution.
Importing and exporting data with NumPy.
NPY and NPZ files.
Maximums and Minimums.
Percentiles and Quantiles.
Mean and Variance.
Covariance and Correlation.
Calculating histograms.
Higher dimension histograms.
Finding and filling up missing values.
Substituting "filler" values.
Reshaping arrays.
Removing parts of arrays.
Removing parts of individual elements within arrays. (Stripping)
Sorting and Shuffling.
Argument Functions.
Stacking and Concatenating.
Finding the unique values within an array.
A comprehensive practical example of data cleaning and preprocessing.

Course content

9 sections • 77 lectures • 6h 44m total length

What Does the Course Cover? • 5:13
Download All Resources • 0:15
FAQ • 9:46
The NumPy Package and Its Applications • 4:03
Explore NumPy, the Python package for data analysis, focusing on stable and efficient array operations, data import and pre-processing, vector and matrix math, random data generation, and universal functions.
Installing and Upgrading NumPy • 1:51
What is an array? • 3:06
Explore how numpy arrays represent data as n-dimensional arrays, from zero to two dimensions, and how vectors and matrices enable elementwise operations in Python.
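As a rough illustration of the idea above, the sketch below builds a 0-D, a 1-D, and a 2-D array and contrasts elementwise addition with list concatenation. The values are arbitrary and chosen only for this example.

```python
import numpy as np

# Arrays of zero, one, and two dimensions (illustrative values).
scalar = np.array(5)
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

dims = [scalar.ndim, vector.ndim, matrix.ndim]

# Adding two arrays works element by element...
array_sum = vector + vector

# ...whereas adding two Python lists concatenates them.
list_sum = [1, 2, 3] + [1, 2, 3]
```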
Using the NumPy Documentation • 4:47
Explore how to navigate the NumPy documentation to learn function syntax, parameters and default values, returns, and examples, and use the quickstart guides and tab-completion in notebooks for references.
Introduction to NumPy - Exercise • 0:14

Basic Slicing • 10:04
Stepwise Slicing • 4:58
Explore stepwise slicing in NumPy, using step parameters to pick every other row and column, including negative steps to traverse backwards, with a three by five matrix example.
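A minimal sketch of stepwise slicing on a three by five matrix follows. The matrix contents are an assumption made for illustration; the lecture's own example may use different values.

```python
import numpy as np

# A 3x5 matrix holding the integers 0..14 (illustrative values only).
matrix = np.arange(15).reshape(3, 5)

every_other_row = matrix[::2]       # rows 0 and 2
every_other_col = matrix[:, ::2]    # columns 0, 2, and 4
both = matrix[::2, ::2]             # every other row and every other column
reversed_rows = matrix[::-1]        # a negative step walks the rows backwards
```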
Conditional Slicing • 4:51
Dimensions and the Squeeze Function • 6:52
Explore how dimensionality affects a matrix and learn to use the squeeze function (or numpy.squeeze) to produce consistent shapes—from scalars to vectors to single-value matrices—for reliable downstream computations.
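The sketch below shows one way the same value can come back with three different shapes depending on how it is indexed, and how np.squeeze brings them to a common shape. The matrix is an assumed example, not the course dataset.

```python
import numpy as np

matrix = np.arange(15).reshape(3, 5)

scalar = matrix[0, 0]      # a single value, shape ()
vector = matrix[0, :1]     # a one-element vector, shape (1,)
single = matrix[:1, :1]    # a single-value matrix, shape (1, 1)

shapes_before = [np.shape(scalar), vector.shape, single.shape]

# Squeezing drops every axis of length one, so all three end up with shape ().
shapes_after = [np.squeeze(x).shape for x in (scalar, vector, single)]
```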
Working with Arrays - Exercise • 0:14

Empty Arrays, Arrays of Identical Values • 5:32
_like Functions • 3:13
Explore NumPy's _like functions, which mirror an input array’s shape and type to create output arrays. Demonstrate empty_like and zeros_like using a 3x5 matrix and outline practical uses.
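A short sketch of the two functions named above, using an assumed 3x5 template array:

```python
import numpy as np

# Template whose shape and dtype the new arrays will copy (illustrative).
template = np.arange(15, dtype=np.int64).reshape(3, 5)

zeros = np.zeros_like(template)          # same shape and dtype, filled with 0
uninitialised = np.empty_like(template)  # same shape and dtype, arbitrary contents
```

Because empty_like does not initialise memory, its values should be overwritten before they are read.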
A Sequence of Numbers - np.arange() • 5:02
Random Generators and Seeds • 5:21
Random Integers, Probabilities and Choices • 3:56
Random Probability Distributions • 5:19
Generate arrays of random data using numpy to simulate Poisson, binomial, and logistic distributions, with seed control and parameters like lambda, trials, and probability.
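One possible way to draw from the three distributions mentioned above is sketched here. The seed and all distribution parameters are assumptions picked for illustration.

```python
import numpy as np
from numpy.random import Generator, PCG64

# Seeded generator so the draws can be reproduced (seed value is arbitrary).
gen = Generator(PCG64(seed=365))

poisson_sample = gen.poisson(lam=10, size=(5, 5))            # lam is the lambda parameter
binomial_sample = gen.binomial(n=100, p=0.4, size=(5, 5))    # n trials, probability p
logistic_sample = gen.logistic(loc=9, scale=1.2, size=(5, 5))
```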
Applications of Random Generators • 4:02
Generating Data with NumPy - Exercise • 0:15

Importing Data with NumPy - np.loadtxt() vs np.genfromtxt() • 10:32
Compare np.loadtxt and np.genfromtxt for importing text data with numpy, demonstrating loading vs generating from text, handling missing values, delimiters, and practical notebook use.
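The difference in how the two functions treat a missing value can be sketched as follows. The inline text stands in for a file and is purely hypothetical.

```python
import io
import numpy as np

# Comma-delimited text with one missing field in the second row (made-up data).
raw = "1,2,3\n4,,6\n7,8,9\n"

# genfromtxt fills the missing field with nan.
table = np.genfromtxt(io.StringIO(raw), delimiter=",")

# loadtxt expects every field to be convertible and raises on the empty one.
try:
    np.loadtxt(io.StringIO(raw), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True
```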
Importing Data with NumPy - Simple Cleaning when Importing • 7:18
Import data with numpy genfromtxt using simple cleaning to load a subset and save memory; adjust delimiters, skip header and footer, and select columns with unpack options.
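A minimal sketch of cleaning while importing, assuming a small semicolon-delimited table with a header row and a summary footer. The column names and values are invented for the example.

```python
import io
import numpy as np

# Hypothetical file contents: header, three data rows, and a footer row.
raw = (
    "id;amount;rate\n"
    "1;1000;5.5\n"
    "2;2500;7.0\n"
    "3;4000;6.25\n"
    "total;7500;18.75\n"
)

# Skip the header and footer, keep only columns 0 and 2,
# and unpack the result so each column lands in its own variable.
ids, rates = np.genfromtxt(
    io.StringIO(raw),
    delimiter=";",
    skip_header=1,
    skip_footer=1,
    usecols=(0, 2),
    unpack=True,
)
```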
Importing Data with NumPy - String vs Object vs Numbers • 6:54
Importing Data with NumPy - Exercise • 0:15
Saving Data with NumPy - NPY • 5:23
Saving Data with NumPy - NPZ • 5:12
Saving Data with NumPy - CSV • 4:02
Importing and Saving Data - Exercise • 0:15

Using NumPy Statistical Functions • 7:44
Minimal and Maximal Values • 6:02
Learn how numpy finds minimal and maximal values with min, amin, and amax, including per-element minima, axis-based operations, and using reduce for column-wise results.
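The functions listed above can be compared on a small matrix, as sketched below. The matrix values and the comparison array are assumptions for illustration.

```python
import numpy as np

matrix = np.array([[1, 0, 0, 3, 1],
                   [3, 6, 6, 2, 9],
                   [4, 5, 3, 8, 0]])
other = np.full_like(matrix, 2)   # hypothetical second array to compare against

overall_min = np.min(matrix)                  # smallest value in the whole array
column_min = np.amin(matrix, axis=0)          # minimum of each column
row_max = np.amax(matrix, axis=1)             # maximum of each row
elementwise_min = np.minimum(matrix, other)   # per-element minimum of two arrays
reduced_min = np.minimum.reduce(matrix)       # reduces along axis 0, i.e. column-wise
```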
Percentiles and Quantiles • 6:25
Averages and Variance • 4:17
Covariance and Correlation • 2:59
Histogram - Part 1: 1-D Histograms • 7:36
Histogram - Part 2: Higher Dimension Histograms • 4:15
N-A-N Equivalent Functions • 3:08
Statistics with NumPy - Exercise • 0:14

Checking for Missing Values • 9:23
Substituting Filler Values • 8:29
Reshaping Arrays • 6:31
Removing Values • 4:20
Use numpy's delete function to remove values from an array. Explore dropping elements, rows, and columns by adjusting axis and index arguments, and verify changes with shape and size.
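A brief sketch of np.delete with and without the axis argument, using an assumed 3x5 matrix:

```python
import numpy as np

matrix = np.arange(15).reshape(3, 5)

# Without an axis the array is flattened first, then the element is dropped.
element_removed = np.delete(matrix, 0)

# With an axis, whole rows or columns are dropped.
row_removed = np.delete(matrix, 0, axis=0)          # drops the first row
cols_removed = np.delete(matrix, [0, 2], axis=1)    # drops columns 0 and 2
```

np.delete returns a new array each time, so matrix itself keeps its original shape.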
Sorting Arrays • 9:45
Argument Functions - Part 1: Argument Sort • 5:48
Argument Functions - Part 2: Argument Where • 11:12
Shuffling Data • 6:51
Learn to shuffle a data set in place with NumPy's shuffle, preserving row integrity, using a lending company example, and observe how repeated shuffles produce different results.
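The sketch below shuffles a small table in place and checks that each row stays intact. It uses a seeded Generator, which is an assumption; the lecture may call a different shuffle function, and the data here is invented.

```python
import numpy as np

rng = np.random.default_rng(seed=365)   # arbitrary seed for reproducibility

# Hypothetical dataset: four records with three fields each.
data = np.arange(12).reshape(4, 3)
original_rows = {tuple(row.tolist()) for row in data}

# Shuffles along the first axis in place: rows move, their contents do not.
rng.shuffle(data)
shuffled_rows = {tuple(row.tolist()) for row in data}
```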
Casting Arrays • 6:14
Explore casting arrays with NumPy, using the astype method to convert data between floats, integers, and strings, and learn how to load and display the dataset while handling conversion errors.
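A small sketch of astype conversions, including one that fails, is shown here. The arrays are made-up examples rather than the course dataset.

```python
import numpy as np

prices = np.array([1.9, 2.5, 3.0])

as_int = prices.astype(np.int32)          # fractional parts are truncated
as_str = prices.astype(str)               # numbers become text
back_to_float = as_str.astype(np.float64)

# Text that does not represent a number cannot be cast to an integer type.
bad = np.array(["10", "n/a"])
try:
    bad.astype(np.int32)
    cast_failed = False
except ValueError:
    cast_failed = True
```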
Stripping Symbols from Arrays • 4:43
Strip text columns in a numpy array to remove prefixes like idy_, product, and location. Cast the results to integers for quantitative analysis.
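One way to strip a text prefix and cast the remainder is sketched below with np.char.strip. The "id_" prefix and the values are hypothetical stand-ins for the columns described above.

```python
import numpy as np

# Hypothetical text column with a common prefix.
ids = np.array(["id_1", "id_2", "id_30"])

# strip removes any of the listed characters from both ends of each element,
# so it treats "id_" as a set of characters rather than an exact prefix.
stripped = np.char.strip(ids, "id_")

as_int = stripped.astype(np.int32)
```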
Stacking Arrays • 10:31
Concatenating Arrays • 6:27
Finding Unique Values in Arrays • 5:04

Setting Up: Introduction to the Practical Example • 4:50
Join a data analyst in a data science team to clean and preprocess loan data for estimating the probability of default, converting dollars to euros and creating dummy variables.
Setting Up: Importing the Data Set • 4:10
Inspect the loan data csv with its semicolon delimiter, then import using numpy and configure print options, while handling missing values with genfromtxt and skipping the header.
Setting Up: Checking for Incomplete Data • 4:35
Setting Up: Splitting the Dataset • 5:27
Identify string and numeric columns using where, split the dataset into string data and numeric data sets, and prepare headers for separate preprocessing in NumPy.
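The split described above could be approached as in the sketch below, which treats a column as text when every value fails to parse as a number. This is an assumed approach with invented data, not necessarily the method used in the lecture.

```python
import io
import numpy as np

# Hypothetical semicolon-delimited data with one text column.
raw = "id;grade;amount\n1;A;1000\n2;B;2500\n"

# Read everything as floats: text fields become nan.
data = np.genfromtxt(io.StringIO(raw), delimiter=";", skip_header=1)

# A column that is nan in every row is assumed to hold strings.
all_nan = np.isnan(data).all(axis=0)
string_cols = np.where(all_nan)[0]
numeric_cols = np.where(~all_nan)[0]

# Re-import each group with a suitable dtype.
strings = np.genfromtxt(io.StringIO(raw), delimiter=";", skip_header=1,
                        usecols=string_cols.tolist(), dtype=str)
numbers = np.genfromtxt(io.StringIO(raw), delimiter=";", skip_header=1,
                        usecols=numeric_cols.tolist())
```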
Setting Up: Creating Checkpoints • 2:50
Manipulating Text Data: Issue Date • 5:26
Convert the issue date text into numeric month codes (1–12), replace missing values with zero, and prepare the dataset for numeric casting in NumPy.
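A rough sketch of the month conversion follows. It assumes the issue dates have already been reduced to three-letter month abbreviations, with an empty string for missing values; the real column format may differ.

```python
import numpy as np

# Hypothetical month abbreviations; "" marks a missing issue date.
issue_months = np.array(["May", "", "Dec", "Jan"])

# Position in this list is the numeric code, so "" maps to 0.
months = ["", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

codes = issue_months.copy()
for code, name in enumerate(months):
    # Replace each matching name with its code, still stored as text.
    codes = np.where(codes == name, str(code), codes)

# Once every entry is a digit string, the column can be cast to integers.
month_codes = codes.astype(np.int8)
```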
Manipulating Text Data: Loan Status and Term • 7:08
Manipulating Text Data: Grade and Sub Grade • 8:54
Manipulating Text Data: Verification Status & URL • 5:20
Manipulating Text Data: State Address • 6:01
Manipulating Text Data: Converting Strings and Creating a Checkpoint • 3:28
Cast strings of numbers into integers, selecting 8, 16, or 32 bit types. Create a checkpoint for the transformed data and verify with the array equal function.
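The cast and checkpoint check can be sketched as below. An in-memory buffer stands in for a file on disk, and the array contents and the key name "data" are assumptions made for the example.

```python
import io
import numpy as np

# Hypothetical text array whose entries are all small whole numbers.
text = np.array([["5", "12"], ["0", "7"]])

# int8 is enough here because every value fits between -128 and 127.
numbers = text.astype(np.int8)

# Save a checkpoint as an NPZ archive, then load it back.
buffer = io.BytesIO()
np.savez(buffer, data=numbers)
buffer.seek(0)
restored = np.load(buffer)["data"]

# array_equal confirms the checkpoint matches the array it was built from.
checkpoint_ok = np.array_equal(restored, numbers)
```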
Manipulating Numeric Data: Substitute Filler Values • 7:51
Manipulating Numeric Data: Currency Change – The Exchange Rate • 6:32
Manipulating Numeric Data: Currency Change - From USD to EUR • 8:22
Completing the Dataset • 7:46
Combine numeric and string data after cleaning, align shapes, and horizontally stack them to form a 17-column dataset with a header, then save the preprocessed data for analysis.
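A scaled-down sketch of the final stacking step is given here, using four columns instead of seventeen. All names and values are invented for illustration.

```python
import numpy as np

# Hypothetical cleaned pieces: numeric columns and text columns already
# converted to numeric codes, with one row per loan.
numeric_part = np.array([[1000.0, 5.5], [2500.0, 7.0]])
string_part = np.array([[5, 1], [12, 0]])

# Horizontal stacking requires the same number of rows in both pieces.
rows_match = numeric_part.shape[0] == string_part.shape[0]

full = np.hstack((numeric_part, string_part))

# Headers are combined in the same order as the data columns.
header = np.hstack((np.array(["amount", "rate"]),
                    np.array(["issue_month", "status"])))
```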

Requirements

You'll need to install Python.
No prior experience with NumPy is required.
Some general understanding of coding languages is preferred, but not required.

Description

The problem

Most data analyst, data science, and coding courses miss a crucial practical step. They don’t teach you how to work with raw data, how to clean and preprocess it. This creates a sizeable gap between the skills you need on the job and the abilities you have acquired in training. Truth be told, real-world data is messy, so you need to know how to overcome this obstacle to become an independent data professional.

The bootcamps we have seen online, and even live classes, neglect this aspect and show you how to work with ‘clean’ data. But this isn’t doing you a favor. In reality, it will set you back both when you are applying for jobs and when you’re on the job.

The solution

Our goal is to provide you with complete preparation using the NumPy package. This course will turn you into a capable data analyst with a fantastic understanding of one of the most prominent computing packages in the world. To take you there, we will cover the following topics extensively.

· The ndarray class and why we use it

· The type of data arrays usually contain

· Slicing and squeezing datasets

· Dimensions of arrays, and how to reduce them

· Generating pseudo-random data

· Importing data from external text files

· Saving/Exporting data to external files

· Computing the statistics of the dataset (max, min, mean, variance, etc.)

· Data cleaning

· Data preprocessing

· Final practical example

Each of these subjects builds on the previous ones. And this is precisely what makes our curriculum so valuable. Everything is shown in the right order and we guarantee that you are not going to get lost along the way, as we have provided all necessary steps in video (not a single one skipped). In other words, we are not going to teach you how to concatenate datasets before you know how to index or slice them.

So, to prepare you for the long journey towards a data science position, we created a course that will show you all the tools for the job: the Preprocessing Data with NumPy course.

We believe that this resource will significantly boost your chances of landing a job, as it will prepare you for practical tasks and concepts that are frequently included in interviews.

NumPy is Python’s fundamental package for scientific computing. It has established itself as the go-to tool when you need to perform mathematical and statistical operations.

Why learn it?

A large portion of a data analyst’s work is dedicated to preprocessing datasets. Unquestionably, this involves tons of mathematical and statistical techniques that NumPy is renowned for. What’s more, the package introduces multi-dimensional array structures and provides a plethora of built-in functions and methods to use while working with them. In other words, NumPy can be described as a computationally stable state-of-the-art Python instrument that provides great flexibility and can take your analysis to the next level.

Some of the topics we will cover:

1. Fundamentals of NumPy

2. Random Generators

3. Working with text files

4. Statistics with NumPy

5. Data preprocessing

6. Final practical example

1. Fundamentals of NumPy

To fully grasp the capabilities of NumPy, we need to start from the fundamentals. In this part of the course, we’ll examine the ndarray class, discuss why it’s so popular and get familiar with terms like “indexing”, “slicing”, “dimensions” and “reducing”.

Why learn it?

As stated above, NumPy is the quintessential package for scientific computing, and to understand its true value, we need to start from its very core – the ndarray class. The better we comprehend the basics, the easier it’s going to be to grasp the more difficult concepts. That’s why it’s fundamental to lay a good foundation on which to build our NumPy skills.

2. Random Generators

After we’ve learned the basics, we’ll move on to pseudo-random data and random generators. These generators will help construct a set of arbitrary variables from a given probability distribution, or a fixed set of viable options.

Why learn it?

Working in a data-driven field, we sometimes need to construct partially arbitrary tests to see if our code works as intended. And here lies the value of random generators, as they allow us to construct datasets of pseudo-random data. The added benefit of random generators is that we can set a seed if we wish to replicate a particular randomization, but we’ll go into all the details in the course itself.
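As a small illustration of the seed idea, the sketch below creates two generators with the same seed and checks that they produce the same draws. The seed, the options, and the probabilities are arbitrary choices for this example.

```python
import numpy as np

# Two generators built from the same seed produce the same sequence.
first = np.random.default_rng(seed=365).integers(0, 100, size=5)
second = np.random.default_rng(seed=365).integers(0, 100, size=5)
same_draws = np.array_equal(first, second)

# Drawing from a fixed set of options with chosen probabilities.
rng = np.random.default_rng(seed=365)
options = np.array(["A", "B", "C"])
picks = rng.choice(options, size=10, p=[0.5, 0.3, 0.2])
```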

3. Working with text files

Text files are one of the most common ways to exchange data today. In this part of the course, we will use the Python, pandas, and NumPy tools covered earlier to give you the essentials you need when importing or saving data.

Why learn it?

In many courses, you are just given a dataset to practice your analytical and programming skills. However, we don’t want to close our eyes to reality, where converting a raw dataset from an external file into a workable Python format can be a massive challenge.

4. Statistics with NumPy

Once we’ve learned how to import large sets of information from external text files, we’ll finally be ready to explore one of NumPy’s strengths – statistics. Since the package is extremely computationally durable, we often rely on its functions and methods to calculate the statistics of a sample dataset. These include the likes of the mean, the standard deviation, and much more.

Why learn it?

To become a data scientist, you not only need to be able to preprocess a dataset, but also to extract valuable insights. One way to learn more about a dataset is by examining its statistics. So, we’ll use the package to understand more about the data and how to convert this knowledge into crucial information we can use for forecasting.

5. Data preprocessing

Even when your dataset is in clean and comprehensible shape, it isn’t quite ready to be processed for visualizations and analysis just yet. There is a crucial step in between, and that’s data preprocessing.

Why learn it?

Data preprocessing is where a data analyst can demonstrate how good or great they are at their job. This stage of the work requires the ability to choose the right statistical tool that will improve the quality of your dataset and the knowledge to implement it with advanced pandas and NumPy techniques. Only when you’ve completed this step can you say that your dataset is preprocessed and ready for the next part, which is data visualization.

6. Practical example

The course contains plenty of exercises and practical cases. What’s more, in the end, we have included a comprehensive practical example that will show you how everything you have learned along the way comes together. This is where you will be able to appreciate how far you have come in your journey toward mastering NumPy in your pursuit of a data career.

What you get

· Active Q&A support

· All the NumPy knowledge to become a data analyst

· A community of aspiring data analysts

· A certificate of completion

· Access to frequent future updates

· Real-world training

Get ready to become a NumPy data analyst from scratch

Why wait? Every day is a missed opportunity.

Click the “Buy Now” button and become a part of our data analyst program today.

Who this course is for:

Aspiring data analysts.
Programming beginners.
People interested in analyzing data through Python.
Analysts who wish to specialize in Python.
Finance graduates and professionals who need to better apply their knowledge in Python.

Course content

Introduction to NumPy • 8 lectures • 29min

Why Do We Use NumPy? • 4 lectures • 20min

NumPy Fundamentals • 7 lectures • 29min

Working with Arrays • 5 lectures • 27min

Generating Data with NumPy • 8 lectures • 33min

Importing and Saving Data • 8 lectures • 40min

Statistics with NumPy • 9 lectures • 43min

Manipulating Data with NumPy • 13 lectures • 1hr 35min

A NumPy Practical Example • 15 lectures • 1hr 29min