Data Processing with Python
4.2 (1,414 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
10,415 students enrolled

Learn how to use Python and Pandas for cleaning and reorganizing huge amounts of data.
Created by Ardit Sulce
Last updated 10/2019
Current price: $13.99 Original price: $19.99 Discount: 30% off
30-Day Money-Back Guarantee
This course includes
  • 3.5 hours on-demand video
  • 17 articles
  • 17 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Build 10 advanced Python scripts which together make up a data analysis and visualization program.
  • Solve six exercises related to processing, analyzing and visualizing US income data with Python.
  • Learn the fundamental blocks of the Python programming language such as variables, datatypes, loops, conditionals, functions and more.
  • Use Python to batch download files from FTP sites, extract, rename and store remote files locally.
  • Import data into Python for analysis and visualization from various sources such as CSV and delimited TXT files.
  • Keep the data organized inside Python in easily manageable pandas dataframes.
  • Merge large datasets taken from various data file formats.
  • Create pivot tables in Python out of large datasets.
  • Perform various operations among data columns and rows.
  • Query data from Python pandas dataframes.
  • Export data from Python into various formats such as TXT, CSV, Excel, HTML and more.
  • Use Python to perform various visualizations such as time series, plots, heatmaps, and more.
  • Create KML Google Earth files out of CSV files.
Course content
50 lectures 03:46:33
+ Getting Started
2 lectures 11:27

You will learn how to install Python through Anaconda, a complete distribution that installs not only Python on your computer but also the libraries needed for data analysis and visualization, such as pandas, matplotlib, numpy, and scipy.

Preview 08:06

You will learn how to use the Spyder environment to write Python scripts, and how to use IPython, an enhanced interactive shell where you type in and execute Python code. IPython is tailored for data analysis applications.

Python editors - Spyder and iPython
+ Downloading Many Files with Python
7 lectures 38:16

Short lecture introducing you to this section of the course.

Section introduction

You will learn how to write Python code that establishes a connection to an FTP server and accesses the files of the FTP site.

Navigating through FTP directory trees with Python

You will learn how to use the Spyder editor for executing complete scripts of Python code.

Storing Python code

You will learn how to create a custom FTP function that logs in to an FTP site and generates a list of file names contained in the site.

Creating an FTP function
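
As a rough illustration of what such a function might look like, here is a minimal sketch built on the standard-library ftplib module. The function name, its parameters, and the placeholder host are assumptions made for this example and are not taken from the course material. The ftp_factory parameter is also hypothetical; it only exists so the function can be tried without a network connection.

```python
from ftplib import FTP


def list_ftp_files(host, directory="/", user="anonymous", passwd="", ftp_factory=FTP):
    """Log in to an FTP site and return the file names found in `directory`."""
    # FTP objects can be used as context managers, so the connection
    # is closed automatically when the block ends.
    with ftp_factory(host) as ftp:
        ftp.login(user=user, passwd=passwd)
        ftp.cwd(directory)   # move into the directory of interest
        return ftp.nlst()    # list of names in the current directory


# Example call (needs network access; the host name is a placeholder):
# names = list_ftp_files("ftp.example.com", "/pub/data")
```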

You will learn the Python code that downloads a single file from an FTP site.

Downloading an FTP file
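
A single-file download could be sketched roughly as follows. The sketch assumes that ftp is an already logged-in ftplib.FTP object positioned in the remote directory that holds the file; the function name and the dest_dir parameter are made up for this example.

```python
import os


def download_file(ftp, filename, dest_dir="."):
    """Download one file from the current FTP directory into `dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, filename)
    with open(local_path, "wb") as local_file:
        # RETR streams the remote file in binary mode; each chunk
        # received is handed to local_file.write.
        ftp.retrbinary("RETR " + filename, local_file.write)
    return local_path
```

Opening the local file in binary mode matters for archives, which would be corrupted by a text-mode transfer.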

Something to keep in mind for the next lecture.

About the next lecture

Here we start building our data analysis program.

In this lecture, we will build an FTP function that logs in to the FTP site and downloads a given range of files from the site.

Practice No.1: Creating an FTP File Downloader
+ Extracting Data from Archive Files
3 lectures 11:30

You will learn how to extract various types of archive files using the patool library and the for loop.

Extracting ZIP, TAR, GZ and other archive file formats
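
The lecture uses the patool library. The sketch below deliberately swaps in the standard-library shutil.unpack_archive so that it runs without extra installs; it shows the same loop-over-archives idea but is not the course's code. shutil handles ZIP and TAR variants (including .tar.gz) but not RAR or plain .gz files. The function name and directory parameters are assumptions.

```python
import glob
import os
import shutil


def extract_all(archive_dir, output_dir):
    """Extract every archive in `archive_dir` that shutil can unpack."""
    os.makedirs(output_dir, exist_ok=True)
    extracted = []
    for path in sorted(glob.glob(os.path.join(archive_dir, "*"))):
        try:
            # The archive format is inferred from the file extension.
            shutil.unpack_archive(path, output_dir)
        except (shutil.ReadError, ValueError):
            # Not a format shutil recognises; skip it.
            continue
        extracted.append(path)
    return extracted
```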

You will learn how to extract RAR archive files.

Extracting RAR files

Here you will write a function that fetches the archive files downloaded by the FTP function and extracts them all into a local directory.

Practice No.2: Creating a Batch Archive Extractor
+ Working with TXT and CSV Files
8 lectures 20:10

Short lecture introducing you to this section of the course.

Section introduction

You will learn how to easily read CSV and delimited TXT files using the pandas library and use their data inside Python.

Reading delimited TXT and CSV files
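
For a sense of what this looks like in practice, here is a small sketch using pandas.read_csv. The sample data, column names, and the semicolon delimiter are invented for the example; in-memory text stands in for real files, and a file path can be passed in the same position.

```python
import io

import pandas as pd

# Made-up sample contents standing in for a CSV file and a delimited TXT file.
csv_text = "station,year,temp\nA,2000,11.5\nB,2000,9.0\n"
txt_text = "station;year;temp\nA;2001;12.1\nB;2001;8.7\n"

# Comma is the default separator.
df_csv = pd.read_csv(io.StringIO(csv_text))

# For other delimiters, pass `sep` explicitly.
df_txt = pd.read_csv(io.StringIO(txt_text), sep=";")
```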
Reading Excel files

You will learn how to export data from Python to CSV and TXT files.

Exporting data from Python to files
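
A minimal sketch of exporting with DataFrame.to_csv follows. The dataframe contents are invented, and in-memory buffers are used so the example leaves no files behind; passing a file path instead of a buffer writes to disk.

```python
import io

import pandas as pd

df = pd.DataFrame({"station": ["A", "B"], "temp": [11.5, 9.0]})

# Comma-separated output; index=False leaves out the row index column.
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)

# Tab-delimited output, as one might write to a TXT file.
txt_buffer = io.StringIO()
df.to_csv(txt_buffer, sep="\t", index=False)
```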

You will learn how to open data from TXT files whose columns are defined by fixed widths rather than by a delimiter character.

Reading fixed width TXT files
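
Fixed-width files can be read with pandas.read_fwf. In the sketch below the record layout (4 characters for a station code, 4 for the year, 5 for the temperature) and the column names are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Each record is 13 characters: station (4), year (4), temperature (5).
fixed_text = (
    "A0012000 11.5\n"
    "B0022000  9.0\n"
)

df_fixed = pd.read_fwf(
    io.StringIO(fixed_text),
    widths=[4, 4, 5],                    # width of each column in characters
    names=["station", "year", "temp"],   # the sample has no header row
    header=None,
)
```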

You will learn how to quickly export a pandas dataframe into an HTML file.

Exporting data back to HTML and other file formats
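
As a quick illustration, DataFrame.to_html returns the table markup as a string when no file is given. The sample dataframe and the report.html file name in the comment are invented for this sketch.

```python
import pandas as pd

df_report = pd.DataFrame({"station": ["A", "B"], "temp": [11.5, 9.0]})

# Without a target, to_html returns the <table> markup as a string.
html_table = df_report.to_html(index=False)

# To write straight to a file instead, pass a path:
# df_report.to_html("report.html", index=False)
```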
Data Analysis Exercise 1
Data Analysis Exercise 1: Solution
+ Getting Started with Pandas
4 lectures 12:13

We already used the pandas library in the previous section. Here you will be given an official tour of the pandas data analysis library.

Get started with Pandas

You will create a function that grabs all the TXT files of a folder, opens each of them in Python as dataframes, adds a column in each dataframe and exports the updated dataframes back to CSV files.

Practice No.3: Calculating and Adding Columns to CSV Files
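
One way such a function could be structured is sketched below. The column names temp_c and temp_f and the Celsius-to-Fahrenheit calculation are hypothetical stand-ins, since the listing does not say which column the course adds; the function name and parameters are likewise assumptions.

```python
import glob
import os

import pandas as pd


def add_fahrenheit_column(input_dir, output_dir, sep=","):
    """Read each TXT file, add a computed column, and save it as CSV."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for txt_path in sorted(glob.glob(os.path.join(input_dir, "*.txt"))):
        df = pd.read_csv(txt_path, sep=sep)
        # Hypothetical calculation: derive Fahrenheit from a Celsius column.
        df["temp_f"] = df["temp_c"] * 9 / 5 + 32
        base = os.path.splitext(os.path.basename(txt_path))[0]
        csv_path = os.path.join(output_dir, base + ".csv")
        df.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```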
Data Analysis Exercise 2
Data Analysis Exercise 2: Solution
+ Merging Data
8 lectures 19:44

You will write a function that gets all the CSV files and concatenates them vertically with the pandas concatenate function, producing a single CSV file that contains everything.

Practice No. 4: Concatenating multiple CSV files
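
Vertical concatenation can be sketched with pandas.concat as follows. The function name and parameters are assumptions, and the sketch expects all input files to share the same columns. Writing the output outside the input folder avoids picking it up on a later run.

```python
import glob
import os

import pandas as pd


def concatenate_csv_files(input_dir, output_path):
    """Stack all CSV files in `input_dir` vertically into one CSV file."""
    paths = sorted(glob.glob(os.path.join(input_dir, "*.csv")))
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, index=False)
    return combined
```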
Data Analysis Exercise 3
Data Analysis Exercise 3: Solution

You will write a function that will join columns of a pandas dataframe to another dataframe.

Practice No. 5: Joining Data Based on a Matching Column
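
Joining on a matching column can be illustrated with DataFrame.merge. The two dataframes and the station_id key below are invented for the example. A left join is shown, which keeps every row of the first dataframe even when no match exists.

```python
import pandas as pd

readings = pd.DataFrame({"station_id": [1, 2, 3], "temp": [11.5, 9.0, 14.2]})
stations = pd.DataFrame({"station_id": [1, 2], "name": ["Alpha", "Beta"]})

# Bring the `name` column into `readings` wherever station_id matches.
# Station 3 has no match, so its name is left as a missing value.
joined = readings.merge(stations, on="station_id", how="left")
```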
Data Analysis Exercise 4: Solution
Data Analysis Exercise 5
Solution: 5 of 6
+ Data Aggregation
1 lecture 07:41

You will learn how to use the pandas pivot function to build a pivoted dataframe from a large CSV file, aggregating the data values along the way.

Practice No. 6: Pivoting Large Amounts of Data
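
A small sketch of pivoting with aggregation is shown below, using DataFrame.pivot_table. The data and column names are made up. Whether the course uses pivot_table or another pandas call is an assumption; pivot_table is chosen here because it accepts an aggregation function for duplicate index/column pairs.

```python
import pandas as pd

df_long = pd.DataFrame({
    "station": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2000, 2001, 2000, 2001, 2001],
    "temp": [10.0, 12.0, 11.0, 8.0, 9.0, 7.0],
})

# One row per year, one column per station; readings that share the
# same year and station are averaged.
pivoted = df_long.pivot_table(
    index="year", columns="station", values="temp", aggfunc="mean"
)
```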
+ Visualizing Data
5 lectures 28:18

You will learn how to use the visualization features available in Python and generate graphs using the matplotlib and the seaborn libraries.

Data visualization with Python

You will expand your knowledge by producing different kinds of visualizations from pandas dataframes and adding labels and legends to the generated graphs.

Preview 12:23

You will create a function that accesses the pivoted dataframe, generates a graph representing the data, and saves the graph to a PNG image file.

Practice No. 7: Producing Image Files
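
Saving a graph of a pivoted dataframe to a PNG file might look roughly like this. The function name, axis labels, and dpi value are assumptions; the non-interactive Agg backend is selected so the sketch also runs on machines without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render to files only, no GUI window needed

import matplotlib.pyplot as plt
import pandas as pd


def save_line_chart(pivoted, png_path):
    """Plot each column of `pivoted` as a line and save the figure as PNG."""
    fig, ax = plt.subplots()
    pivoted.plot(ax=ax, marker="o")   # one line per dataframe column
    ax.set_xlabel("Year")             # hypothetical axis labels
    ax.set_ylabel("Mean temperature")
    ax.legend(title="Station")
    fig.savefig(png_path, dpi=150)
    plt.close(fig)                    # free the figure's memory
    return png_path
```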
Data Analysis Exercise 6
Data Analysis Exercise 6: Solution
+ Mapping Spatial Data
2 lectures 12:23

You will learn how to create a point KML file using the simplekml library and display the file in Google Earth.

Programmatically creating KML Google Earth files with Python

You will create a function that grabs the data from a pandas dataframe and creates a KML file using the latitude and the longitude information contained in the dataframe.

Practice No. 8: Creating KML Google Earth files from CSV data
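
The course builds KML files with the simplekml library. To stay runnable without extra installs, the sketch below swaps in the standard-library xml.etree.ElementTree and writes the point placemarks by hand, so it illustrates the dataframe-to-KML idea rather than the course's code. The function name and the name, lat, and lon column names are assumptions.

```python
import xml.etree.ElementTree as ET

import pandas as pd

KML_NS = "http://www.opengis.net/kml/2.2"


def dataframe_to_kml(df, kml_path, name_col="name", lat_col="lat", lon_col="lon"):
    """Write one KML point placemark per dataframe row."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in zip(df[name_col], df[lat_col], df[lon_col]):
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = str(name)
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        coords = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coords.text = f"{float(lon)},{float(lat)}"
    ET.ElementTree(kml).write(kml_path, encoding="utf-8", xml_declaration=True)
    return kml_path
```

The resulting file can be opened in Google Earth; the longitude-before-latitude order is a common source of misplaced points.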
+ Putting everything together
6 lectures 22:37

You will learn how to make your script interact with a user who runs it.

User interaction
Exercise: User interaction
Exercise: User interaction: Solution

You will learn how to execute all the functions of the program with a single click.

Practice No. 9: Polishing the Program, I

You will learn how to make your program more user friendly by integrating the user input functionality.

Practice No. 10: Polishing the Program, II

You will learn how to convert your program into a Python module so you can import it in other scripts.

Practice No. 11: Creating Python Modules
Requirements
  • A working computer (Windows, Mac, or Linux)
  • No prior knowledge of Python is required

Data scientists spend only 20 percent of their time building machine learning algorithms and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data. That mostly happens because many use graphical tools such as Excel to process their data. However, if you use a programming language such as Python, you can drastically reduce the time it takes to process your data and make it ready for use in your project. This course will show how Python can be used to manage, clean, and organize huge amounts of data.

This course assumes you have basic knowledge of variables, functions, for loops, and conditionals. In the course you will be given access to a million records of raw historical weather data, and you will use Python at every step to work with that dataset. That includes learning how to use Python to batch download and extract the data files, load thousands of files into Python via pandas, clean the data, concatenate and join data from different sources, convert between fields, and perform aggregating, conditioning, and many more data processing operations. On top of that, you will also learn how to calculate statistics and visualize the final data. The course also includes a series of exercises in which you are given sample data and practice what you learned by cleaning and reorganizing that data using Python.

Who this course is for:
  • Those who come from any technology field that deals with any kind of data.
  • Those who want to leverage the power of the Python programming language for handling data.
  • Those who need to learn Python basics and want to quickly advance their skills by learning how to perform data cleaning, analysis and visualization with Python - all in one single course.
  • Those who want to switch from programming languages such as Java, C, R, Matlab, etc. to Python.