Data Science with Jupyter: 2-in-1

Rating: 3.5 (12 reviews)

Get the most out of Jupyter to perform various data science tasks

Created by Packt Publishing

Last updated 4/2018

English

What you'll learn

Get the most out of your Jupyter Notebook to complete the trickiest of tasks in data science
Learn all the tasks in the data science pipeline from data acquisition to visualization and implement them using Jupyter
Create custom extensions and build data widgets using Jupyter Notebook
Perform scientific computing and data analysis tasks with Jupyter
Create interactive dashboards and dynamic presentations
Master the best coding practices and deploy your Jupyter Notebooks efficiently

Course content

2 sections • 60 lectures • 4h 43m total length

The Course Overview 4:34
This video gives you an overview of the course.
Jupyter User Interface 5:04
Jupyter is available as a web application for a wide variety of platforms. This video covers the details of the Jupyter user interface: what objects it works with and what actions can be taken in Jupyter.

Look at the Jupyter user interface
Perform actions with Jupyter
Jupyter’s Menu Choice 8:27
In this video, we will see different menu choices on the menu bar.
Study the menu choices
Real Life Examples – Finance and Gambling 3:48
In this video, we will take several examples from current industry focus and apply them in Jupyter to demonstrate its utility. This video explains European call option valuation and Monte Carlo pricing. We will also look at gambling for betting analysis.

Implement Monte Carlo pricing
Determine the probability of a series in coin flips
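
As a rough illustration of the Monte Carlo pricing idea mentioned above, the sketch below estimates a European call price by simulating terminal prices and averaging the discounted payoffs, then compares the estimate with the closed-form Black-Scholes value. It is not the course's own code: the function names, the geometric Brownian motion assumption, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call (reference value)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * normal_cdf(d1) - k * math.exp(-r * t) * normal_cdf(d2)


def monte_carlo_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Estimate the call price by averaging discounted simulated payoffs."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        # Terminal price under risk-neutral geometric Brownian motion.
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths


# Illustrative parameters only (spot, strike, rate, volatility, maturity).
mc_price = monte_carlo_call(100.0, 105.0, 0.05, 0.2, 1.0, n_paths=100_000)
bs_price = black_scholes_call(100.0, 105.0, 0.05, 0.2, 1.0)
print(f"Monte Carlo: {mc_price:.3f}  Black-Scholes: {bs_price:.3f}")
```

With more simulated paths, the Monte Carlo estimate should move closer to the closed-form value, at the cost of longer run time.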
Real Life Examples – Insurance and Consumer Products 4:53
This video covers real-life examples from insurance and consumer products. We use R to look at pricing for non-life insurance products and at marketing effectiveness.

Implement non-life insurance pricing
Look at the effectiveness of different ad campaigns for grapefruit juice
Installing JupyterHub 5:03
The predominant Jupyter hosting product currently is JupyterHub. It provides multi-user access to your notebooks. In this video, we will install JupyterHub and look at Jupyter hosting.

Install JupyterHub
Access JupyterHub installation
Optimizing Python Script 2:54
Optimizations cover a gamut of options running from language-specific issues to deploying your notebook in a highly available environment. Optimizations are script language dependent.

Use timeit() to determine execution time
Use a profiler to give a complete rundown of the execution
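
The two steps above can be sketched with Python's standard library. This is a minimal example, assuming a made-up workload function (sum_of_squares) that is not from the course; timeit measures total execution time over repeated runs, and cProfile gives a per-function rundown of a single run.

```python
import cProfile
import io
import pstats
import timeit


def sum_of_squares(n):
    """Hypothetical workload: sum the squares of 0..n-1."""
    return sum(i * i for i in range(n))


# timeit.timeit calls the function `number` times and returns total seconds.
elapsed = timeit.timeit(lambda: sum_of_squares(10_000), number=100)
print(f"100 runs took {elapsed:.4f} s")

# cProfile records call counts and timings for each function in one run.
profiler = cProfile.Profile()
profiler.enable()
sum_of_squares(100_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In a notebook, the %timeit and %prun magics offer similar measurements without the boilerplate.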
Optimizing R Scripts 5:08
R also has tools, such as microbenchmark, that help pinpoint performance issues in your R code. Techniques covered include modifying a frequently used function, optimizing name lookup, optimizing data frame value extraction, changing the R implementation, and changing the algorithm.

Use microbenchmark to profile R script
Look at the concept of caching the notebook
Securing a Notebook 5:19
Securing a notebook can be accomplished by several methods, such as managing authorization and securing notebook content.

Study issues with standard content in Jupyter
Look at the techniques to overcome security issues
Heavy-Duty Data Processing Functions in Jupyter 3:02
Python has several groups of processing functions that can tax computer system power. In this video, we will use NumPy, a Python package that provides multidimensional arrays and routines for array processing.

Use NumPy functions in Jupyter
Using Pandas in Jupyter 5:19
Pandas is an open source library of high-performance data analysis tools available in Python. We will see functions that read text files, read Excel files, read from a SQL database, and operate on data frames.

Use pandas to read text files and Excel files
Use pandas to work with data frames
Using SciPy in Jupyter 5:32
SciPy is an open source library for mathematics, science, and engineering. We will see many areas that can be explored using SciPy, such as integration, optimization, interpolation, Fourier transforms, and linear algebra.
Expanding on Pandas DataFrames 2:24
There are more built-in functions for working with data frames than we have used so far. If we were to take one of the data frames, we could use additional functions to help portray and work with the dataset.

Use slicing to expand on the pandas data frame
Sorting and Filtering DataFrames 2:49
Data frames let you sort and filter the dataset easily, using functionality built into the data frames themselves.

Implement filtering based on certain criteria
Implement sorting of a data frame by index
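
A minimal sketch of the filtering and sorting described above, assuming pandas is installed. The data frame contents and column names are invented for illustration and are not the course's dataset.

```python
import pandas as pd

# Hypothetical dataset for illustration only.
df = pd.DataFrame(
    {"item": ["a", "b", "c", "d"], "price": [12.5, 3.0, 8.25, 20.0]},
    index=[3, 1, 2, 0],
)

# Filter: keep only the rows that satisfy a boolean condition.
expensive = df[df["price"] > 5.0]

# Sort by index, then by a column in descending order.
by_index = df.sort_index()
by_price = df.sort_values("price", ascending=False)

print(expensive)
print(by_index)
print(by_price)
```

Each call returns a new data frame, so the original df is left unchanged unless the result is assigned back to it.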
Making a Prediction Using scikit-learn 4:29
scikit-learn is a machine learning toolset built using Python. In scikit-learn, an estimator provides two functions, fit() and predict(), providing mechanisms to classify data points and predict classes of other data points, respectively.

Implement a prediction model using scikit-learn
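
The fit() and predict() pattern described above looks roughly like this, assuming scikit-learn is installed. The choice of DecisionTreeClassifier and the toy data are assumptions for illustration; the course may use a different estimator and dataset.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one feature per sample, two classes.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)  # learn from the labeled data points

# Predict the classes of data points the model has not seen.
predictions = model.predict([[0.1], [2.9]])
print(predictions)
```

Other scikit-learn estimators follow the same two-step interface, so swapping in a different model usually changes only the import and the constructor call.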
Making a Prediction Using R 3:00
In this video, we will make a prediction using R. The functions differ between the languages, but the functionality is very close.

Build prediction model using R
Interactive Visualization and Plotting 6:03
There is a Python package, Bokeh, that can be used to generate a figure in your notebook where the user can interact and change the figure. In this video, I am using the same data from the gridplot example to display an interactive Bokeh gridplot.

Plot a graph using Plotly
Create a human density map
Drawing a Histogram of Social Data 4:55
In this video, we will gather one of the datasets and produce a histogram from the data.
Plot 3D data using car dataset
Using Spark to Analyze Data 4:07
Spark is a fast, general engine for large-scale data processing. The SparkContext initializes all of Spark and sets up any access that may be needed to Hadoop, if you are using that as well.

Analyze the number of lines in a file using Spark
Using SparkSession and SQL 2:58
Spark exposes many SQL-like actions that can be taken upon a data frame. In our example, we will start a Spark session, use the session to read a CSV-formatted file that contains a header record, and finally display the initial rows.

Use Spark SQL to determine product list
Combining Datasets 2:47
We will combine data frames, operate on the resulting set, import JSON data, and manipulate it with Spark.
Populate the data frames and move them to Spark
Loading JSON into Spark 3:36
Spark can also access JSON data for manipulation. In this video, we will also see pivot(), which allows you to translate rows into columns while performing aggregation on some of the columns.

Read and load the JSON in Spark
Use pivot() for translation from row to column
Analyzing 2016 US Election Demographics 4:14
To get a flavor of the resources available to R developers, we can look at the 2016 election data.
Setting up R for Jupyter
Display information about the data frame
Analyzing 2016 Voter Registration and Voting 6:01
In this video, we will look at voter registration versus actual voting using census data.
Display information to visually check for accurate loading
Display the characteristics of the regression line
Analyzing Changes in College Admissions 6:15
In this video, we can look at trends in college admissions acceptance rates over the last few years.

Create a vector of the average acceptance rates for colleges
Convert the vector points into a time series
Predicting Airplane Arrival Time 3:58
In this video, we will look at the airline arrival and departure times versus scheduled arrival and departure times.

Build our model of the arrival time
Use the testing set to make predictions
Plot the predicted versus actual data
Reading a CSV File 5:50
In this video, we will walk through the process of reading a CSV and adjusting the dataset to arrive at some conclusions about the data.

Change the column names to be more readable
Get rough statistics on the data
Plot some dominant data points
Manipulating Data with dplyr 7:19
In this video, we will use the dplyr package against the baseball player statistics we used earlier.
Convert a data frame to a dplyr table
Filter rows in a data frame
Add a column to a data frame
Tidying Up Data with tidyr 3:31
The tidyr package is available to clean up/tidy your dataset. In this video, we will rearrange our data to mix columns and rows with values.

Use the standard example of stock price for a date
Use the spread() function to separate out the values into multiple columns
Reorganize the data by listing all prices for a stock per row
Visualizing Glyph Ready Data 4:57
In this video, we will display glyphs at different points in a graph rather than the standard dot, as a glyph should provide more visual information to the viewer.

Display glyph data about the standard iris dataset
Derive some information from the glyph data
Publishing a Notebook 4:09
You can publish a notebook/dashboard using Markdown. Markdown involves adding annotations to cells in your notebook that are interpreted by Jupyter and converted into the more standard HTML representations that you see in other published materials. In this video, we will see the different kinds of Markdown.

Create cell with the markdown type
Generate table in HTML
Creating a Shiny Dashboard 3:58
Shiny is a web application framework for R. The Shiny server code set deals with accessing data, computing results, obtaining direction from the user, and interacting with other server code sets to change results. In this video, we will learn how to create a Shiny dashboard.

Load the Shiny dashboard library
Publish your dashboard
Building Standalone Dashboards 2:58
Using Node.js, developers have come up with a way to host your dashboard/notebook without Jupyter on jupyter-dashboard-server.

Install conda
Install the layout extension
Converting JSON to CSV 1:44
In this video, we will use the Yelp data and use the dataset from round 9 of the challenge.
Download the JSON file and upload it on Jupyter notebook
Display the date and time of our system
Create a reviews.csv
Evaluating Yelp Reviews 5:53
In this video, we will build a computed data frame with two columns and display the top-rated business dataset. Also, we will visualize the relationship between ratings and number of reviews for companies.

Find the top-rated firms
Build a model of reviews
Naive Bayes 4:56
Naive Bayes is an algorithm that uses probability to classify data by applying Bayes' theorem with a strong assumption of independence between the features. Bayes' theorem estimates the probability of an event based on prior conditions. So, overall, we will use a set of feature values to estimate a value, assuming the same conditions hold true when those features have similar values. We will also implement naive Bayes using the R programming language.

Install the package and load the library
Measure the accuracy of the model
Determine the accuracy of the model
Nearest Neighbor Estimator 6:45
Using the nearest neighbor, we will have an unclassified object and a set of objects that are classified. In this video, we will take the attributes of the unclassified object, compare them against the known classifications in place, and select the class that is closest to our unknown.

Reorder the data in ascending order
Split the data into a training set and a test set
Use the same example and implement the nearest neighbor using Python
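
The nearest neighbor idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the function name, the Euclidean distance measure, the majority vote over k neighbors, and the sample points are all invented for the example.

```python
import math
from collections import Counter


def nearest_neighbor_predict(train_points, train_labels, query, k=1):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Vote among the k nearest classified objects.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical classified objects with two attributes each.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]

near_a = nearest_neighbor_predict(points, labels, (1.2, 1.4))
near_b = nearest_neighbor_predict(points, labels, (5.5, 5.0), k=3)
print(near_a, near_b)
```

In practice the classified data would first be split into a training set and a test set, so that predictions on the test set can be compared with the known classes.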
Decision Trees 5:35
In this video, we will use decision trees to predict values. A decision tree has a logical flow where the user makes decisions based on attributes, following the tree down to a leaf level where a classification is provided.

Load the libraries to use rpart
Develop a model to predict mpg acceptability based on the other factors
Perform the same analysis in Python
Neural Networks and Random Forests 5:43
With a neural net, we will end up with a graphical model that provides the factors to apply to each input to arrive at our housing price. We will also use the random forest algorithm, which tries many random decision trees and provides the tree that works best within the parameters used to drive the model.

Calculate our neuralnet model
Include the packages of random forest in R
Test your knowledge

The Course Overview 1:38
This video provides an overview of the entire course.
Setting Up 5:26
In this video, we will get the environment running and store configurations for restoration.
Install and configure virtual environments
Install the application and documentation
Run the package tests from the console or Notebook
Jupyter CLI Introduction 4:38
In this video, we will see how to run Jupyter operations from the command line.
Find the core configuration directories
Find available kernel types
Perform general maintenance
The Jupyter Core Module 3:37
In this video, we will see how to explore the Jupyter core package.
Using the package like command line
Inspecting tests
Understanding the core technology
The Jupyter Client 5:43
In this video, we will see how to explore the Jupyter client package.
Locating and utilizing Kernel configurations
Creating a client instance
Interacting with remote Kernels
The Jupyter Console 4:13
In this video, we will see how to explore the Jupyter console.
Magics and Magic documentation
Discuss Notebook data
Connecting to existing Kernels from the console
Generating Configurations from the CLI 5:23
This video shows how to break out the configuration values and interact with them using ConfigParser and Traitlets config objects.

Generate configuration files
Load a configuration using ConfigParser
Load a configuration using Traitlets
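
As a small illustration of the ConfigParser step above, the sketch below parses INI-style settings with Python's standard configparser module and reads typed values back. The section name, keys, and values are hypothetical and are not taken from the course or from a real Jupyter configuration file.

```python
import configparser

# Hypothetical INI-style settings; section and key names are illustrative.
ini_text = """
[NotebookApp]
port = 8888
open_browser = false
notebook_dir = /tmp/notebooks
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

# Typed getters convert the stored strings to int and bool.
port = parser.getint("NotebookApp", "port")
open_browser = parser.getboolean("NotebookApp", "open_browser")
notebook_dir = parser.get("NotebookApp", "notebook_dir")
print(port, open_browser, notebook_dir)
```

Once the values are broken out like this, they can be passed on to other tools or stored for later restoration.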
Storing Configurations 5:44
The aim of this video is to show how to quickly and easily store configurations in a local or remote database using Pandas and SQLite.

Load Configurations from a config object to Pandas
Build Python from source for SQLite and test it
Transfer data from Pandas to SQLite
Configuration Extras 3:56
In this video, we will see overriding configurations and file system monitoring in Jupyter with Python.
Pass config parameters from the CLI at startup
Create a documentation tool for reference
Monitor the file system for changes automatically with Watchdog
Ipyleaflet 7:51
In this video, we will create simple maps with the Jupyter widget Ipyleaflet.
Extract data to visualize
Plot basic maps
Automate widget interactions and add supplemental data
More Fun with Ipywidgets 6:44
In this video, we will do a sample experiment with audio files in Jupyter to showcase Ipywidgets.
Convert arrays to wav files and back again
Altering wav files with NumPy and graphing the results
Link together all the earlier steps with widgets
Using the GitHub API 7:11
This video gives a brief tour of the capabilities of the GitHub REST API and GraphQL.
Search for Notebooks by the Jupyter steering council
Search for open issues and graph data with NetworkX
Making use of GraphQL
Utilizing Twitter 6:04
The aim of this video is to obtain actionable intelligence from the Twitter REST API.
Search for tweets with a common query
Search for locations and trends by location
Record data to MongoDB
The Notebook Package 3:58
In this video, we will see how to get started with the Notebook package and which tools are included. We will have a quick look at the workings of an included script to set up SSL in the Jupyter Notebook and at the available Notebooks in the documentation.

Inspect the secure_notebook script
Implement a modified script to personalize data
Review the included package Notebooks
Gdrive Custom Content Managers 4:42
The Jupyter Drive module allows you to mount a Google drive as a local content source. Using the API Quick Start, a client application is coded for the user to interact with the service from a running notebook.

Install the extension
Set up your Google developer account
Interact with a fully built service object
Custom Bundler Extensions 2:57
With a custom extension coded from open source libraries, you can securely back up your research.

Enable the Tar and Zip file bundler extensions
Write a simple encrypted message handler
Install and test the bundler application
Custom File Save Hook 4:56
How can we automatically back up our work so that we do not lose our code by accident? We can add file save hooks so that our bundler extension is automatically run when the file is saved.

Write a pre-save hook
Write a post-save hook
Install and test the hooks
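
To give a feel for what a save hook looks like, here is a minimal pre-save hook sketch. It clears code-cell outputs before a notebook is written, which is a different task from running the course's bundler extension and is used here only because it is easy to test. The demo model at the end is a hand-built, hypothetical stand-in for what the Notebook server would pass to the hook.

```python
def scrub_output_pre_save(model, **kwargs):
    """Clear code-cell outputs from a notebook model before it is saved."""
    # Only act on notebooks in nbformat version 4.
    if model.get("type") != "notebook":
        return
    if model["content"].get("nbformat") != 4:
        return
    for cell in model["content"]["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None


# In jupyter_notebook_config.py the hook would be registered with:
# c.FileContentsManager.pre_save_hook = scrub_output_pre_save

# Quick check against a hand-built notebook model (hypothetical contents).
demo_model = {
    "type": "notebook",
    "content": {
        "nbformat": 4,
        "cells": [
            {"cell_type": "code", "source": "1 + 1",
             "outputs": [{"output_type": "execute_result"}],
             "execution_count": 7},
            {"cell_type": "markdown", "source": "Some notes"},
        ],
    },
}
scrub_output_pre_save(demo_model)
print(demo_model["content"]["cells"][0])
```

A post-save hook is registered in a similar way through c.FileContentsManager.post_save_hook and receives the saved file's path, which is where a backup step could be triggered.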
Custom Request Handlers 5:23
We have a really neat widget; can we serve it using the Notebook server? With request handlers, you can link code to URL and host patterns for dynamic content delivery.

Develop a content generator or widget
Define the route and host pattern
Register and test the server extensions
Crafting a Dashboard 3:49
This video shows how to convert Notebooks to dashboards and display them.
Create a Notebook
Mark up the Notebook and arrange the grid or report view
Enable the dashboard view through the Jupyter web interface
The Dashboard Server 3:16
In this video, we will see what is required to run a dashboard.
Install the Kernel gateway
Install the dashboard server
Deploy the dashboard via the web API
Bokeh Dashboards 6:10
In this video, we will understand how Bokeh data applications are ported to the dashboard server.
Create a Bokeh visualization
Test the application
Deploy to the dashboard server
Test your knowledge

Requirements

Some programming experience with Python or R is required

Description

Jupyter has emerged as a popular tool for code exposition and the sharing of research artefacts. It is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and more. To perform a variety of data science tasks with Jupyter, you'll need some prior programming experience in either Python or R and a basic understanding of Jupyter.

This comprehensive 2-in-1 course teaches you how to perform your day-to-day data science tasks with Jupyter. It’s a perfect blend of concepts and practical examples which makes it easy to understand and implement. It follows a logical flow where you will be able to build on your understanding of the different Jupyter features with every section.

This training program includes 2 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Jupyter for Data Science, starts off with an introduction to Jupyter concepts and installation of Jupyter Notebook. You will then learn to perform various data science tasks such as data analysis, data visualization, and data mining with Jupyter. You will also learn how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. Next, you will perform statistical modelling with Jupyter. You will understand various machine learning concepts and their implementation in Jupyter.

The second course, Jupyter In Depth, will walk you through the core modules and standard capabilities of the console, client, and notebook server. By exploring the Python language, you will be able to get starter projects for configuration management, file system monitoring, and encrypted backup solutions for safeguarding your data. You will learn to build dashboards in a Jupyter notebook to report back information about the project and the status of various Jupyter components.

By the end of this training program, you’ll comfortably leverage the power of Jupyter to perform various data science tasks efficiently.

Meet Your Expert(s):

We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:

● Dan Toomey has been developing applications for over 20 years. He has worked in a variety of industries and companies of all sizes, in roles from sole contributor to VP/CTO level. For the last 10 years or so, he has been contracting for companies in the eastern Massachusetts area under Dan Toomey Software Corp. Dan has also written R for Data Science and Learning Jupyter with Packt Publishing.

● Jesse Bacon is a hobbyist programmer who lives and works in the northern Virginia area. His interest in Jupyter started academically while working through books available from Packt Publishing. Jesse has over 10 years of technical professional services experience and has worked primarily in logging and event management.

Who this course is for:

This Learning Path targets students and professionals keen to master the use of Jupyter to perform a variety of data science tasks.

Course content

Jupyter for Data Science: 39 lectures • 3hr

Jupyter In Depth: 21 lectures • 1hr 43min