Data Science & Real World Computing with Jupyter Notebook

Gain hands-on experience in data analysis and visualization with Jupyter Notebook

3.8 (11 ratings)
124 students enrolled
Created by Packt Publishing
Last updated 12/2018
English [Auto-generated]
This course includes
  • 8.5 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Understand why Jupyter Notebooks are a perfect fit for your data science, data manipulation and visualization tasks
  • Perform scientific computing and data analysis tasks with Jupyter
  • Combine the power of R and Python 3 with Jupyter to create dynamic notebooks
  • Create interactive dashboards and dynamic presentations
  • Visualize data and create interactive plots in Jupyter Notebook
  • Work with the most widely used libraries for data analysis: matplotlib, Seaborn, Bokeh, Altair
Course content
88 lectures 08:28:38
+ Jupyter for Data Science
39 lectures 02:59:57

This video will give you an overview about the course.

Preview 04:34

Jupyter is available as a web application for a wide variety of platforms. This video covers the details of the Jupyter user interface: the objects it works with and the actions it can take.

  • Look at the Jupyter user interface

  • Perform actions with Jupyter

Jupyter User Interface

In this video, we will see different menu choices on the menu bar.

  • Study the menu choices

Jupyter’s Menu Choice

In this video, we will take several examples from current industry practice and apply them in Jupyter to demonstrate its utility. This video will explain European call option valuation and Monte Carlo pricing. We will also look at betting analysis for gambling.

  • Implement Monte Carlo pricing

  • Determine the probability of a series of coin flips

Real Life Examples – Finance and Gambling

This video is all about real-world examples from insurance and consumer products. We use R to examine pricing for non-life insurance products and marketing effectiveness.

  • Implement non-life insurance pricing

  • Look at the effectiveness of different ad campaigns for grapefruit juice

Real Life Examples – Insurance and Consumer Products

The predominant Jupyter hosting product currently is JupyterHub, which provides multi-user access to your notebooks. In this video, we will install JupyterHub and look at Jupyter hosting.

  • Install JupyterHub

  • Access JupyterHub installation

Installing JupyterHub

Optimizations cover a gamut of options, ranging from language-specific issues to deploying your notebook in a highly available environment. Optimizations are script-language dependent.

  • Use timeit() to determine execution time

  • Use a profiler to get a complete rundown of the execution

Optimizing Python Script
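As a small illustration of the timeit() approach above, the following sketch (the snippets being timed are made up for illustration) compares two ways of building the same list:

```python
import timeit

# Time a list comprehension (passed as a source string)...
t_comprehension = timeit.timeit("[i * i for i in range(1000)]", number=1000)

# ...against an explicit append loop (passed as a callable)
def squares_loop():
    result = []
    for i in range(1000):
        result.append(i * i)
    return result

t_loop = timeit.timeit(squares_loop, number=1000)

print(f"comprehension: {t_comprehension:.4f}s, loop: {t_loop:.4f}s")
```

Both calls return the total elapsed seconds for the requested number of repetitions, so the two numbers are directly comparable.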

R also has tools that help pinpoint performance issues in your R code, such as microbenchmark. Common optimizations include modifying frequently used functions, optimizing name lookup, optimizing data frame value extraction, changing the R implementation, and changing the algorithm.

  • Use microbenchmark to profile R scripts

  • Look at the notebook caching concept

Optimizing R Scripts

Securing a notebook can be accomplished by several methods, such as managing authorization and securing notebook content.

  • Study issues with standard content in Jupyter

  • Look at the techniques to overcome security issues

Securing a Notebook

Python has several groups of processing functions that can tax computer system power. In this video, we will use NumPy, a Python package providing multidimensional arrays and routines for array processing.

  • Use NumPy functions in Jupyter

Heavy-Duty Data Processing Functions in Jupyter

pandas is an open source library of high-performance data analysis tools available in Python. We will see functions to read text files, read Excel files, read from SQL databases, and operate on data frames.

  • Use pandas to read text files and Excel files

  • Use pandas to work with data frames

Using Pandas in Jupyter
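A minimal sketch of the read-and-inspect workflow described above, using a small CSV held in a string in place of a file on disk (the column names and values are invented):

```python
import io
import pandas as pd

# A small CSV in a string stands in for a file on disk
csv_text = """name,score
alice,90
bob,85
carol,78
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)            # rows and columns read
print(df["score"].mean())  # a simple column aggregate
```

The same read_csv() call accepts a file path directly, so the io.StringIO wrapper is only needed for in-memory text.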

SciPy is an open source library for mathematics, science, and engineering. We will see many areas that can be explored using SciPy, such as integration, optimization, interpolation, Fourier transforms, and linear algebra.

Using SciPy in Jupyter

There are more functions built in for working with data frames than we have used so far. If we were to take one of the data frames, we could use additional functions to help portray and work with the dataset.

  • Use slicing to expand a pandas data frame

Expanding on Pandas DataFrames

Data frames automatically allow you to easily sort and filter the dataset involved, using existing functionality within the data frames themselves.

  • Implement filtering based on certain criteria

  • Implement sorting data frame by index

Sorting and Filtering DataFrames

scikit-learn is a machine learning toolset built using Python. In scikit-learn, an estimator provides two functions, fit() and predict(), providing mechanisms to classify data points and predict classes of other data points, respectively.

  • Implement prediction model using scikit learn

Making a Prediction Using scikit-learn
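The fit()/predict() convention can be illustrated with a tiny hand-rolled estimator. This is not scikit-learn itself, just a sketch of the same interface, using one-dimensional features and invented data:

```python
# A minimal estimator following scikit-learn's fit()/predict() convention
class NearestCentroid:
    def fit(self, X, y):
        # Store the mean of each class's feature values
        self.centroids_ = {}
        for label in set(y):
            points = [x for x, lbl in zip(X, y) if lbl == label]
            self.centroids_[label] = sum(points) / len(points)
        return self

    def predict(self, X):
        # Assign each point the label of the nearest class centroid
        return [min(self.centroids_, key=lambda c: abs(x - self.centroids_[c]))
                for x in X]

model = NearestCentroid().fit([1.0, 1.2, 4.0, 4.4], ["low", "low", "high", "high"])
print(model.predict([1.1, 4.2]))
```

scikit-learn estimators follow exactly this shape: fit() learns state from training data and returns self, and predict() maps new points to classes.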

In this video, we will make a prediction using R. The functions differ between the two languages, but the functionality is very similar.

  • Build prediction model using R

Making a Prediction Using R

There is a Python package, Bokeh, that can be used to generate a figure in your notebook where the user can interact and change the figure. In this video, I am using the same data from the gridplot example to display an interactive Bokeh gridplot.

  • Plot a graph using Plotly

  • Create a human density map

Interactive Visualization and Plotting

In this video, we will gather one of the datasets and produce a histogram from the data.

  • Plot 3D data using car dataset

Drawing a Histogram of Social Data

Spark is a fast, general engine for large-scale data processing. The SparkContext initializes all of Spark and sets up any access that may be needed to Hadoop, if you are using that as well.

  • Analyze number of lines in a file using spark

Using Spark to Analyze Data

Spark exposes many SQL-like actions that can be taken upon a data frame. In our example, we will start a Spark session, use the session to read a CSV-formatted file that contains a header record, and finally display the initial rows.

  • Use Spark SQL to determine product list

Using SparkSession and SQL

We will combine data frames, operate on the resulting set, import JSON data, and manipulate it with Spark.

  • Populate the data frames and move it to spark

Combining Datasets

Spark can also access JSON data for manipulation. In this video, we will also see pivot(), which allows you to translate rows into columns while performing aggregation on some of the columns.

  • Read and load the JSON in Spark

  • Use pivot() for translation from row to column

Loading JSON into Spark
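Outside of Spark, the same row-to-column translation can be sketched with pandas' pivot_table(), an analogue of Spark's pivot() (the sales records here are invented):

```python
import pandas as pd

# Long-format records: one row per (month, product) pair
sales = pd.DataFrame({
    "month":   ["Jan", "Jan", "Feb", "Feb"],
    "product": ["tea", "coffee", "tea", "coffee"],
    "amount":  [10, 20, 15, 25],
})

# Translate product row values into columns, aggregating the amounts
wide = sales.pivot_table(index="month", columns="product",
                         values="amount", aggfunc="sum")
print(wide)
```

Each distinct value of the pivot column becomes its own column in the result, with the aggregation applied wherever several rows collapse into one cell.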

To get a flavor of the resources available to R developers, we can look at the 2016 election data.

  • Setting up R for Jupyter

  • Display information about the data frame

Analyzing 2016 US Election Demographics

In this video, we will look at voter registration versus actual voting using census data.

  • Display information to visually check for accurate loading

  • Display the characteristics of the regression line

Analyzing 2016 Voter Registration and Voting

In this video, we can look at trends in college admissions acceptance rates over the last few years.

  • Create a vector of the average acceptance rates for colleges

  • Convert the vector points into a time series

Analyzing Changes in College Admissions

In this video, we will look at the airline arrival and departure times versus scheduled arrival and departure times.

  • Build our model of the arrival time

  • Use the testing set to make predictions

  • Plot the predicted versus actual data

Predicting Airplane Arrival Time

In this video, we will walk through the process of reading a CSV and adjusting the dataset to arrive at some conclusions about the data.

  • Change the column names to be more readable

  • Get rough statistics on the data

  • Plot some dominant data points

Reading a CSV File
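A minimal sketch of the read-and-summarize workflow using only the Python standard library; the player statistics below are invented stand-ins for a real file:

```python
import csv
import io
import statistics

# A small CSV in a string stands in for a file on disk
raw = """player,avg
ruth,0.342
gehrig,0.340
mays,0.302
"""

# DictReader maps each row to a dict keyed by the header names
rows = list(csv.DictReader(io.StringIO(raw)))
averages = [float(r["avg"]) for r in rows]

print(len(rows), statistics.mean(averages))
```

Renaming columns to be more readable then amounts to rewriting the header keys before further processing.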

In this video, we will use dplyr package against the baseball player statistics we used earlier.

  • Convert a data frame to a dplyr table

  • Filter rows in a data frame

  • Add a column to a data frame

Manipulating Data with dplyr

The tidyr package is available to clean up/tidy your dataset. In this video, we will rearrange our data to mix columns and rows with values.

  • Use the standard example of stock price for a date

  • Use the spread() function to separate out the values into multiple columns

  • Reorganize the data by listing all prices for a stock per row

Tidying Up Data with tidyr

In this video, we will look to display glyphs at different points in a graph rather than the standard dot as the glyph should provide more visual information to the viewer.

  • Display glyph data about the standard iris dataset

  • Derive some information from the glyph data

Visualizing Glyph Ready Data

You can publish a notebook/dashboard using markdown. Markdown involves adding annotations to cells in your notebook that are interpreted by Jupyter and converted into the more standard HTML representations you see in other published materials. In this video, we will see the different kinds of markdown.

  • Create cell with the markdown type

  • Generate table in HTML

Publishing a Notebook

Shiny is a web application framework for R. The Shiny server code deals with accessing data, computing results, obtaining direction from the user, and interacting with other server code to change results. In this video, we will learn how to create a Shiny dashboard.

  • Load the Shiny dashboard library

  • Publish your dashboard

Creating a Shiny Dashboard

Using Node.js, developers have come up with a way to host your dashboard/notebook without Jupyter on jupyter-dashboard-server.

  • Install conda

  • Install the layout extension

Building Standalone Dashboards

In this video, we will use the Yelp dataset from round 9 of the Yelp Dataset Challenge.

  • Download the JSON file and upload it on Jupyter notebook

  • Display the date and time of our system

  • Create a reviews.csv

Converting JSON to CSV

In this video, we will build a computed data frame with two columns and display the top-rated business dataset. Also, we will visualize the relationship between ratings and number of reviews for companies.

  • Find the top-rated firms

  • Build a model of reviews

Evaluating Yelp Reviews

Naive Bayes is an algorithm that uses probability to classify data according to Bayes' theorem, assuming strong independence between the features. Bayes' theorem estimates the probability of an event based on prior conditions. So, overall, we will use a set of feature values to estimate a value, assuming the same conditions hold true when those features have similar values. We will also implement naive Bayes using the R programming language.

  • Install the package and load the library

  • Measure the accuracy of the model

  • Determine the accuracy of the model

Naive Bayes
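The Bayes' theorem arithmetic underneath the algorithm can be shown in a few lines; the spam-filter probabilities below are invented for illustration:

```python
# Bayes' theorem by hand: P(spam | word) from a prior and two likelihoods
p_spam = 0.2                # prior P(spam)
p_word_given_spam = 0.6     # P(word appears | spam)
p_word_given_ham = 0.1      # P(word appears | not spam)

# Total probability of seeing the word at all
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: probability of spam given that the word appeared
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.6
```

Naive Bayes multiplies such per-feature likelihoods together, which is exactly where the independence assumption enters.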

Using the nearest neighbor, we will have an unclassified object and a set of objects that are classified. In this video, we will take the attributes of the unclassified object, compare them against the known classifications in place, and select the class that is closest to our unknown.

  • Reorder the data in ascending order

  • Split the data into a training set and a test set

  • Use the same example and implement the nearest neighbor using Python

Nearest Neighbor Estimator
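A minimal 1-nearest-neighbor sketch in pure Python (the 2-D training points and labels are made up for illustration):

```python
import math

# Labeled 2-D points: coordinates and class labels are invented
training = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]

def nearest_neighbor(point):
    # Pick the label of the closest classified point (1-NN)
    return min(training, key=lambda t: math.dist(point, t[0]))[1]

print(nearest_neighbor((2, 1)))  # closest to the "A" cluster
print(nearest_neighbor((5, 6)))  # closest to the "B" cluster
```

Splitting the data into training and test sets, as in the video, then just means evaluating this function on points held out of the training list.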

In this video, we will use decision trees to predict values. A decision tree has a logical flow: decisions based on attributes lead the user down the tree to a leaf node where a classification is provided.

  • Load the libraries to use rpart

  • Develop a model to predict mpg acceptability based on the other factors

  • Perform the same analysis in Python

Decision Trees

With a neural net, we end up with a graphical model that provides the weights to apply to each input to arrive at our housing price. We will also use the random forest algorithm, which attempts many random decision trees and provides the tree that works best within the parameters used to drive the model.

  • Calculate our neuralnet model

  • Include the packages of random forest in R

Neural Networks and Random Forests
Test Your Knowledge
4 questions
+ Jupyter Notebook for Data Science
20 lectures 03:11:05

This video provides an overview of the entire course.

Preview 03:56

In this video, we will show how to install a Jupyter Notebook environment on your machine.

  • Cover the ways of installing a Jupyter Notebook

  • Show how to install Docker

  • Show how to use the Jupyter Notebook Data Science Docker stack

Setting Up Jupyter Notebook

In this video, we will show you how to work with Jupyter Notebooks.

  • Show how to navigate cells

  • Show how the documentation is read and shell code accessed

  • Show how to work with a sample notebook for analyzing life expectancies

Using Jupyter Notebook

In this video, we explain how to publish finished Jupyter Notebooks.

  • Explain the different notebook formats

  • Show how some of these formats can be obtained

  • Export the example notebook

Publishing Notebooks

In this section, we examine the Chicago crime dataset and show how to download and import it using Pandas.

  • Explain and download the Chicago crime dataset

  • Examine the dataset format in Jupyter Notebook

  • Choose the necessary options to successfully read it as a Pandas DataFrame

Parsing the Crime Dataset

We will examine the core data structures available in Pandas.

  • Examine the 1D Series data structure

  • Examine the 2D DataFrame data structure

  • Explore the Pandas API for said data structures in a Jupyter Notebook

Pandas Data Structures

In this video, we will learn about Pandas hierarchical indexes and apply them to visually explore the crime dataset.

  • Examine the Pandas MultiIndex for hierarchically indexed data

  • Show a MultiIndex example in Jupyter Notebook

  • Use a MultiIndex to restructure the crime dataset and visualize it

Exploring and Visualising the Data

We explain how to add basic interactivity to a Jupyter Notebook.

  • Explain what interactive widgets are

  • Create an example interactive widget using our crimes dataset

  • Show where to find more examples of interactive widgets

Creating an Interactive Widget

In this video, we will learn what scraping is and why it's important.

  • Explain what unstructured data is

  • Explain the different data formats and their differences: CSV, Excel, REST APIs, plain websites, scanned PDFs...

Introduction to Data Scraping

This video will teach you how to scrape data from a REST API.

  • Explain the weather API

  • Show how to set up the API key to download the data

  • Cover user requests to fetch data from a REST API

Fetching Data from a REST API Using Requests

This video takes the last example further to import the downloaded REST data into pandas.

  • Show how to convert a Python dict provided to us by Requests into a pandas DataFrame

  • Show how to iterate over multiple API requests to download all the data chunks

  • Combine data chunks into a singular Chicago weather DataFrame

Importing API Data into Pandas

In this video, we will show a more difficult example of scraping data from an unstructured website.

  • Show the website we will be using to fetch the Chicago weather data

  • Show how to use BeautifulSoup to download the website and parse the HTML

  • Show how to convert the parsed HTML object into a pandas DataFrame

Scraping Websites Using BeautifulSoup

In this video, we will learn what information-dense visualisations are.

  • Explain data visualisation as visual storytelling

  • Talk about Edward Tufte's books and website

  • Explain Charles Joseph Minard's excellent map

Introduction to Information-Dense Visualisations

This section explains how to visualise scatter plots for examining data correlation.

  • Explain time series components

  • Show how to plot a scatter plot

  • Explain data correlation a bit better

Visualising Data Correlation

This video explains linear regression and shows how to build a linear model in Python.

  • Explain what linear regression is

  • Explain how modeling real-world behavior relates to general scientific research

  • Show how to create a linear model using linear regression in Python

Linear Regression
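A minimal sketch of fitting a linear model in Python, here using NumPy's polyfit on noise-free synthetic points so that the known slope and intercept are recovered exactly:

```python
import numpy as np

# Noise-free points on y = 2x + 1; the fit should recover slope 2, intercept 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 6), round(intercept, 6))
```

With real, noisy data the fitted coefficients only approximate the underlying relationship, which is where examining residuals becomes useful.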

In this video, we will see why correlation matrices are useful and how to create one in Python.

  • Explain why correlation matrices are useful

  • Show how to create a correlation matrix in Python

Correlation Matrix
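A correlation matrix can be sketched with NumPy's corrcoef; the three series below are synthetic, constructed so the pairwise correlations are exactly ±1:

```python
import numpy as np

# Three series: b rises linearly with a, c falls as a rises
a = np.array([1.0, 2.0, 3.0, 4.0])
b = 2 * a + 1    # perfectly correlated with a
c = -a           # perfectly anti-correlated with a

# corrcoef treats each row as one variable and returns a 3x3 matrix
corr = np.corrcoef([a, b, c])
print(np.round(corr, 2))
```

Off-diagonal entries near ±1 flag strongly related columns, which is what makes the matrix a quick first look at a new dataset.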

See why maps are helpful.

  • Talk about spatial data

  • Talk about John Snow's London cholera outbreak map and how it was helpful

  • Mention I Quant NY's visual storytelling

Maps in Data Science

See how we can build a map from our dataset.

  • Explain how data layers can be overlaid on maps

  • Show how to use Basemap to tile a map

  • Show how to zoom into a specific area of the map and overlay the data there

Plotting Crime Locations

In this section, we talk about adding interactivity to our map using Plotly.

  • Set up Plotly/Mapbox API keys

  • Show how to draw points on the Plotly map

  • Show how to render roads on a Plotly map

Interactive Maps Using Plotly

Closing words for the course.

  • Summarize what was learned

  • Suggest some possible next steps for the viewer

  • Instructions for feedback

Final Remarks
Test Your Knowledge
5 questions
+ Interactive Computing with Jupyter Notebook
29 lectures 02:17:36

This video gives you a glimpse of what this course offers to you.

Preview 04:27

To get well versed with a new tool, it is good practice to start with a basic tour and perform some basic, frequently used operations. This video is that quick first step in using the Jupyter Notebook and IPython commands.

  • Create a new Jupyter notebook using an IPython kernel

  • Perform basic mathematical operations

  • Learn to use magic commands

Introducing IPython and the Jupyter Notebook

This video will give you an introduction to IPython and Jupyter for data analysis.

  • Import the scientific packages: NumPy, pandas, and Matplotlib

  • Use the groupby() method to group the table elements by weekday

  • Plot a smoothed version of the track attendance as a function of time

Getting Started with Exploratory Data Analysis in the Jupyter Notebook

NumPy is the main foundation of the scientific Python ecosystem. This library offers a specific data structure for high-performance numerical computing: the multidimensional array. This video will illustrate the basic concepts of the multidimensional array.

  • Import the built-in random Python module and NumPy

  • Generate two Python lists, x and y, each one containing 1 million random numbers between 0 and 1

  • Compute the arithmetic distance between any pair of numbers in our two lists

Introducing the Multidimensional Array in NumPy for Fast Array Computations
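The list-versus-array contrast described above can be sketched as follows, using a smaller n than the 1 million in the video to keep it quick:

```python
import random
import numpy as np

# Two Python lists of random numbers between 0 and 1
n = 10_000
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]

# Element-wise distance with a Python-level loop...
d_list = [abs(a - b) for a, b in zip(x, y)]

# ...versus the vectorized NumPy version, which loops in C
d_arr = np.abs(np.array(x) - np.array(y))

print(np.allclose(d_list, d_arr))
```

The two results agree; the difference is that the NumPy version scales to millions of elements without paying per-element interpreter overhead.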

Although IPython comes with a wide variety of magic commands, there are cases where we need to implement custom functionality in new magic commands. In this video, we will show how to create line magics and cell magics, and how to integrate them into an IPython extension.

  • Create a function that accepts the contents of the line and decorate this function with @register_line_magic

  • Create %%csv cell magic that parses a CSV string and returns a pandas DataFrame object

  • Create an extension module and import it into the IPython session with the %load_ext magic command

Creating an IPython Extension with Custom Magic Commands

This video will let you take a step ahead and explore the architecture of Jupyter Notebook and also show you how to connect a new client (such as a Qt console) to the underlying kernel.

  • Connect multiple clients to one kernel

Architecture of the Jupyter Notebook

This video will show you how to manipulate the contents of a notebook (which is just a plain-text JSON file) directly with Python, and how to convert it to other formats with nbconvert.

  • Load the notebook in a string and parse it with the json module

  • Count the number of Markdown and code cells

  • Convert the text notebook to HTML using nbconvert and display this document in an <iframe>

Converting a Jupyter Notebook to Other Formats with nbconvert
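Counting Markdown and code cells with the json module can be sketched like this; a minimal notebook document is built inline here rather than loaded from disk, but it has the same JSON shape as a real .ipynb file:

```python
import json

# A minimal notebook document (nbformat 4 layout) serialized to a string
nb_json = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('hi')"],
         "outputs": [], "execution_count": None},
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
    ],
})

# Parse the string and tally cells by type
nb = json.loads(nb_json)
counts = {}
for cell in nb["cells"]:
    counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
print(counts)
```

For a real notebook, json.loads(open("notebook.ipynb").read()) yields the same structure, which is what nbconvert then renders into HTML and other formats.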

The ipywidgets package provides many common user interface controls for exploring code and data interactively. These controls can be assembled and customized to create complex graphical user interfaces. In this video, we introduce the various ways we can create user interfaces with ipywidgets.

  • Create a simple user interface for controlling four parameters of a function that displays a plot

  • Create a slider for selecting pairs of numbers

  • Create the Tab instance, set the titles of the tabs, and add the plot button below the tabs

Mastering Widgets in the Jupyter Notebook

The ipywidgets package provides many built-in control widgets to interact with code and data in the Jupyter Notebook. This video will walk you through the steps to build a custom interactive widget from scratch, using Python on the kernel side and HTML/JavaScript on the client side (frontend).

  • Create a CounterWidget class deriving from DOMWidget

  • Display the widget

Creating Custom Jupyter Notebook Widgets in Python, HTML, and JavaScript

Many aspects of the Jupyter Notebook can be configured. In this video, we will show you how to configure the Jupyter application and the Jupyter Notebook frontend.

  • Check whether the Jupyter Notebook configuration file already exists

  • Inspect the contents of the notebook configuration JSON file

  • Get and change the frontend options from Python

Configuring the Jupyter Notebook

The %timeit magic and the %%timeit cell magic allow us to quickly evaluate the time taken by one or several Python statements. Let us take a step ahead to see the methods for more extensive profiling. In this video, we will estimate the time taken to calculate the sum of the inverse squares of all positive integer numbers up to a given n.

  • Define a variable and time the computation in pure Python

  • Use the %%timeit cell magic to time the same computation

  • Time the NumPy version of this computation

Evaluating the Time Taken by a Command in IPython

How could you break down the execution time into the contributions of all called functions? cProfile is a solution to this problem. This video will walk you through the simple steps to use this profiler.

  • Create a function generating random +1 and -1 values in an array

  • Write simulation code with %%prun

Profiling Your Code Easily with cProfile and IPython
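A sketch of profiling a random ±1 simulation with cProfile's programmatic API; the %%prun magic wraps this same machinery, and the simulation itself is illustrative:

```python
import cProfile
import io
import pstats
import random

def simulate(n):
    # Generate n random +1/-1 steps and accumulate them
    return sum(1 if random.random() < 0.5 else -1 for _ in range(n))

profiler = cProfile.Profile()
profiler.enable()
result = simulate(100_000)
profiler.disable()

# Render the per-function report into a string, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report[:200])
```

The report lists each called function with its call count and cumulative time, which is exactly the function-by-function breakdown the video describes.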

Python's native cProfile module and the corresponding %prun magic break down the execution time of code function by function. Sometimes, we may need an even more fine-grained analysis of code performance, with a line-by-line report. Let's see how to do this.

  • Import NumPy and the line_profiler IPython extension module

  • Write the code in a Python script using the %%writefile cell magic

  • Execute the function under the control of the line profiler

Profiling Your Code Line-by-Line with line_profiler

In this video, we will look at a simple memory profiler unsurprisingly named memory_profiler. Its usage is very similar to line_profiler, and it can be conveniently used from IPython.

  • Load the memory_profiler IPython extension and define a function that allocates big objects

  • Run the code under the control of the memory profiler

  • Display the result

Profiling the Memory Usage of Your Code with memory_profiler

In this video, we will show you how to avoid unnecessary array copies in order to save memory. In that respect, we will need to dig into the internals of NumPy.

  • Check whether two arrays share the same underlying data buffer in memory

  • Use the flatten() and the ravel() methods

Understanding the Internals of NumPy to Avoid Unnecessary Array Copying

Sometimes, we need to deal with NumPy arrays that are too big to fit in the system memory. A common solution is to use memory mapping and implement out-of-core computations. Let’s see how to implement these in our code.

  • Create a memory-mapped array in write mode

  • Feed the array with random values

  • Save the last column of the array and flush memory changes to disk

Processing Large NumPy Arrays with Memory Mapping
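The memory-mapping steps above can be sketched with np.memmap; the file path here is generated in a temporary directory rather than fixed, and the array is kept small:

```python
import os
import tempfile
import numpy as np

# Create a disk-backed array in write mode in a temporary file
path = os.path.join(tempfile.mkdtemp(), "mapped.dat")
arr = np.memmap(path, dtype="float64", mode="w+", shape=(1000, 100))

arr[:] = np.random.rand(1000, 100)  # feed the array with random values
arr.flush()                         # flush memory changes to disk

# Reopen read-only to confirm the data survived the round trip
reloaded = np.memmap(path, dtype="float64", mode="r", shape=(1000, 100))
print(np.allclose(arr, reloaded))
```

Because slices of a memmap touch only the pages they need, the same pattern works for arrays far larger than system memory.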

The first way to make Python code run faster is to know all features of the language. This video will show you how badly-written Python code can be significantly improved when using all the features of the language.

  • Define a list of normally-distributed random variables, using the random built-in module

  • Write a function that computes the sum of all numbers in that list.

  • Write a slightly improved version of this code

Using Python to Write Faster Code
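A small example of the idea: the same sum computed with an index-based loop and with the built-in sum(), which runs its loop in C (the data is random and purely illustrative):

```python
import random
import timeit

values = [random.random() for _ in range(100_000)]

def slow_sum(xs):
    # Index-based loop: the "naive" version
    total = 0.0
    for i in range(len(xs)):
        total += xs[i]
    return total

def fast_sum(xs):
    # The built-in sum() performs the same loop in C
    return sum(xs)

t_slow = timeit.timeit(lambda: slow_sum(values), number=20)
t_fast = timeit.timeit(lambda: fast_sum(values), number=20)
print(f"loop: {t_slow:.4f}s, builtin: {t_fast:.4f}s")
```

The two functions return the same result; knowing which built-ins exist is often the cheapest optimization available.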

This video will show you how to accelerate pure Python code generating a Mandelbrot fractal. Let’s go ahead and do it right now!

  • Create a function that generates a fractal in pure Python

  • Run the simulation and display the fractal

  • Accelerate this function using Numba

Accelerating Pure Python Code with Numba and Just-In-Time Compilation

NumExpr evaluates algebraic expressions involving arrays, parses them, compiles them, and finally executes them, possibly on multiple processors. We will see how that works in this video.

  • Import NumPy and NumExpr and generate three large vectors

  • Evaluate the time taken by NumPy to calculate a complex algebraic expression involving our vectors

  • Perform the same calculation with NumExpr

Accelerating Array Computations with NumExpr

Performance gains are most significant in CPU-bound programs, notably in tight Python loops. By contrast, I/O bound programs are not expected to benefit much from a Cython implementation. In this video, we will see how to accelerate the Mandelbrot code example with Cython.

  • Import the Cython Jupyter extension

  • Add the %%cython magic before the definition of the mandelbrot() function

  • Add type information using typed memory views for NumPy arrays

Accelerating Python Code with Cython

With Cython, we have a way to release the GIL temporarily in a portion of the code in order to enable multi-core computing. This is done with OpenMP, a multiprocessing API that is supported by most C compilers. In this video, we will see how to parallelize the previous code on multiple cores.

  • Import the prange() function

  • Add nogil after each function definition in order to remove the GIL

  • Run a loop in parallel over the cores with OpenMP, using prange()

Releasing the GIL to Take Advantage of Multi-Core Processors

Let’s take the next step to implement the embarrassingly parallel computation of the Mandelbrot fractal in CUDA using Numba.

  • Import the packages and check whether Numba correctly identified our GPU

  • Execute the GPU function, passing the empty array

  • Send the NumPy array to the GPU with the cuda.to_device() function

Writing Massively Parallel Code for NVIDIA Graphics Cards (GPUs)

This video will walk you through the usage of ipyparallel, which offers an even simpler interface that brings powerful parallel computing features in an interactive environment.

  • Launch four IPython engines in separate processes

  • Create a client that will act as a proxy to the IPython engines and check the number of running engines

Distributing Python Code Across Multiple Cores with IPython

This video will show you how to interact with asynchronous tasks running in parallel with ipyparallel.

  • Create a client and a load-balanced view on the IPython engines

  • Create a simple progress bar for our asynchronous tasks

Interacting with Asynchronous Parallel Tasks in IPython

How can we have data structures resembling NumPy arrays (dask.array) and pandas DataFrames (dask.dataframe) that efficiently scale to huge datasets? How can you split a large array into smaller arrays (chunks)? This video will provide a solution to these problems.

  • Initialize a large 10,000 x 10,000 array with random values using dask

  • Profile the memory usage and time of the same computation using either NumPy or dask.array

  • Use multiple cores to perform computations on large arrays and create a client using dask.distributed

Performing Out-of-Core Computations on Large Arrays with Dask
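The chunking idea that dask automates can be sketched by hand with plain NumPy (this is not dask itself; the array is sized so every partial sum is exact in float64):

```python
import numpy as np

# Simulate out-of-core processing: reduce a big array one chunk at a time,
# which is the kind of chunked scheduling dask.array automates
data = np.arange(1_000_000, dtype="float64")

chunk_size = 100_000
total = 0.0
for start in range(0, data.size, chunk_size):
    # Only one chunk needs to be "in flight" at any moment
    total += data[start:start + chunk_size].sum()

print(total == data.sum())
```

dask generalizes this pattern: it builds a task graph of per-chunk operations and runs them, potentially across multiple cores or machines.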

Recent versions of Matplotlib have significantly improved the default style of its figures. Today, Matplotlib comes with a set of high-quality predefined styles along with a styling system that lets one customize all aspects of these styles. This video will let you explore these styles.

  • Import the libraries and check the list of all available styles

  • Create a plot and set a style with

  • Change the style for a given plot using the context manager syntax

Using Matplotlib Styles

Seaborn is a library that builds on top of Matplotlib and Pandas to provide easy-to-use statistical plotting routines. In this video, we give a few examples, adapted from the official documentation, of the types of statistical plot that can be created with seaborn.

  • Import NumPy, Matplotlib, and seaborn

  • Plot the histogram, Kernel Density Estimation (KDE), and a gamma distribution fit for the dataset

  • Display a bar plot, a violin plot, and a swarm plot that show an increasing amount of details

Creating Statistical Plots Easily with Seaborn

In this video, we will give a few short examples of interactive Bokeh figures in the Jupyter Notebook. We will also introduce HoloViews, which provides a high-level API for Bokeh and other plotting libraries.

  • Import the packages NumPy, Bokeh, and HoloViews

  • Create a scatter plot of random data

  • Use Pandas to plot the hourly average temperature

Creating Interactive Web Visualizations with Bokeh and HoloViews

This video will show you the use of Vega, a declarative format for designing static and interactive visualizations. Along with it, we will learn to use Altair, which provides a simple API to define and display Vega-Lite visualizations.

  • Load the flights-5k dataset

  • Create a scatter plot showing the delay as a function of the date

  • Create a bar plot with the average delay of all flights departing from Los Angeles

Creating Plots with Altair and the Vega-Lite Specification
Test Your Knowledge
4 questions
Requirements

  • Some programming experience with R or Python and some basic understanding of Jupyter is all you need to get started on this course.

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. It is a powerful tool for interactive data exploration and visualization, and it has become the standard tool among data scientists.

This course is a step-by-step guide to exploring the possibilities of Jupyter. You will first get started with data science, performing tasks from data exploration to visualization using the popular Jupyter Notebook; you will also learn how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. Then you will carry out data analysis tasks in Jupyter Notebook and work your way up to common scientific Python tools such as pandas, matplotlib, and plotly, working with real datasets along the way. You will also learn to create insightful visualizations showing time-stamped and spatial data. Finally, you will master relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.

By the end of this course, you will comfortably leverage the power of Jupyter to perform various data science tasks efficiently.

Contents and Overview

This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Jupyter for Data Science gets you started with data science using the popular Jupyter Notebook. If you are familiar with Jupyter Notebook and want to learn how to use its capabilities to perform various data science tasks, this video course is for you! From data exploration to visualization, this course will take you every step of the way in implementing an effective data science pipeline using Jupyter. You will also see how you can utilize Jupyter's features to share your documents and codes with your colleagues. The course also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this course, you will comfortably leverage the power of Jupyter to perform various tasks in data science successfully.

The second course, Jupyter Notebook for Data Science will help you get familiar with Jupyter Notebook and all of its features to perform various data science tasks in Python. Jupyter Notebook is a powerful tool for interactive data exploration and visualization and has become the standard tool among data scientists. In the course, we will start with basic data analysis tasks in Jupyter Notebook and work our way up to learn some common scientific Python tools such as pandas, matplotlib, and plotly. We will work with real datasets, such as crime and traffic accidents in New York City, to explore common issues such as data scraping and cleaning. We will create insightful visualizations, showing time-stamped and spatial data. By the end of the course, you will feel confident about approaching a new dataset, cleaning it up, exploring it, and analyzing it in Jupyter Notebook to extract useful information in the form of interactive reports and information-dense data visualizations.

The third course, Interactive Computing with Jupyter Notebook covers programming techniques: code quality and reproducibility, code optimization, high-performance computing through just-in-time compilation, parallel computing, and graphics card programming. In short, you will master relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.

About the Authors:    

  • Dan Toomey has been developing applications for over 20 years. He has worked in a variety of industries and companies of all sizes, in roles from sole contributor to VP/CTO level. For the last 10 years or so, he has been contracting for companies in the eastern Massachusetts area through Dan Toomey Software Corp. Dan has also written the R for Data Science and Learning Jupyter books for Packt Publishing.

  • Dražen Lučanin is a developer, data analyst, and the founder of Punk Rock Dev, an indie web development studio. He's been building web applications and doing data analysis in Python, JavaScript, and other technologies professionally since 2009. In the past, Dražen worked as a research assistant and did a Ph.D. in computer science at the Vienna University of Technology. There he studied the energy efficiency of geographically distributed data centers and worked on optimizing VM scheduling based on real-time electricity prices and weather conditions. He also worked as an external associate at the Ruđer Bošković Institute, researching machine learning methods for forecasting financial crises. During Dražen's scientific work Python, Jupyter Notebook (back then still IPython Notebook), Matplotlib, and Pandas were his best friends over many nights of interactive manipulation of all sorts of time series and spatial data. Dražen also did a Master's degree in computer science at the University of Zagreb.

  • Cyrille Rossant, Ph.D., is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he gained experience in numerical computing, parallel computing, and high-performance data visualization.

He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing.

Who this course is for:
  • This course is aimed at data analysts, developers, students, and professionals keen to master the use of Jupyter to perform a variety of data science tasks. Some programming experience with Python and a basic understanding of Jupyter is required.