Data Science & Real World Computing with Jupyter Notebook
Rating: 3.7 out of 5 (13 ratings)
177 students

Gain hands-on experience in data analysis and visualization with Jupyter Notebook
Last updated 12/2018
English

What you'll learn

  • Understand why Jupyter Notebooks are a perfect fit for your data science, data manipulation and visualization tasks
  • Perform scientific computing and data analysis tasks with Jupyter
  • Combine the power of R and Python 3 with Jupyter to create dynamic notebooks
  • Create interactive dashboards and dynamic presentations
  • Visualize data and create interactive plots in Jupyter Notebook
  • Work with the most widely used libraries for data analysis: matplotlib, Seaborn, Bokeh, Altair

Course content

3 sections • 88 lectures • 8h 28m total length
  • The Course Overview (4:34)

    This video will give you an overview of the course.

  • Jupyter User Interface (5:04)

    Jupyter is available as a web application for a wide variety of platforms. This video covers the details of the Jupyter user interface: what objects it works with and what actions can be taken by Jupyter.

    • Look at the Jupyter user interface

    • Perform actions with Jupyter

  • Jupyter’s Menu Choice (8:27)

    In this video, we will see different menu choices on the menu bar.

    • Study the menu choices

  • Real Life Examples – Finance and Gambling (3:48)

    In this video, we will take several examples from current industry practice and work through them in Jupyter to demonstrate its utility. This video explains European call option valuation and Monte Carlo pricing. We will also look at betting analysis in gambling.

    • Implement Monte Carlo pricing

    • Determine probability of series in coin flip
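
As an illustration of Monte Carlo pricing for a European call option, here is a minimal sketch using only the Python standard library. It is not the course's notebook code: the function names and all parameter values (spot 100, strike 100, 5% rate, 20% volatility, one year) are assumptions chosen for the example, and the closed-form Black-Scholes price is included only as a sanity check on the simulation.

```python
import math
import random


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call (sanity check only)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def norm_cdf(x):
        # Standard normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def monte_carlo_call(s0, k, r, sigma, t, n_paths=100_000, seed=42):
    """Estimate the call price by averaging discounted simulated payoffs."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total_payoff = 0.0
    for _ in range(n_paths):
        # Terminal price under geometric Brownian motion.
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total_payoff += max(s_t - k, 0.0)
    return math.exp(-r * t) * total_payoff / n_paths


# Illustrative (assumed) inputs: spot, strike, rate, volatility, maturity in years.
mc_price = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0)
bs_price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"Monte Carlo: {mc_price:.3f}  Black-Scholes: {bs_price:.3f}")
```

The two numbers should agree to within the sampling error of the simulation, which shrinks roughly with the square root of the number of paths.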

  • Real Life Examples – Insurance and Consumer Products (4:53)

    This video covers real-world examples from insurance and consumer products. We use R to look at pricing for non-life products and at marketing effectiveness.

    • Implement non-life insurance pricing

    • Look at the effectiveness of different ad campaigns for grapefruit juice

  • Installing JupyterHub (5:03)

    The predominant Jupyter hosting product currently is JupyterHub. It provides multi-user access to your notebooks. In this video, we will install JupyterHub. We will also look at Jupyter hosting.

    • Install JupyterHub

    • Access JupyterHub installation

  • Optimizing Python Script (2:54)

    Optimizations cover a gamut of options running from language-specific issues to deploying your notebook in a highly available environment. Optimizations are script language dependent.

    • Use timeit() to determine execution time

    • Use a profiler to get a complete rundown of the execution
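
The two steps above can be sketched with the standard library's `timeit` and `cProfile` modules. The function `sum_of_squares` is a hypothetical workload invented for this example, not something taken from the course.

```python
import cProfile
import io
import pstats
import timeit


def sum_of_squares(n):
    """Hypothetical workload used only to have something to measure."""
    return sum(i * i for i in range(n))


# timeit runs the callable `number` times and returns the total elapsed seconds.
elapsed = timeit.timeit(lambda: sum_of_squares(10_000), number=100)
print(f"100 calls took {elapsed:.4f} s")

# cProfile records how much time is spent in each function call.
profiler = cProfile.Profile()
profiler.enable()
sum_of_squares(100_000)
profiler.disable()

# Collect the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Inside a notebook, the `%timeit` and `%prun` magics offer similar measurements without the boilerplate.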

  • Optimizing R Scripts (5:08)

    R also has tools that help pinpoint performance issues in your R code, such as microbenchmark. Optimization options include modifying a frequently used function, optimizing name lookup, optimizing data frame value extraction, changing the R implementation, and changing the algorithm.

    • Use microbenchmark to profile R script

    • Look at caching the notebook concept

  • Securing a Notebook (5:19)

    A notebook can be secured in several ways, such as managing authorization and securing notebook content.

    • Study issues with standard content in Jupyter

    • Look at the techniques to overcome security issues

  • Heavy-Duty Data Processing Functions in Jupyter (3:02)

    Python has several groups of processing functions that can tax computer system power. In this video, we will use functions from NumPy, a Python package that provides multidimensional arrays and routines for array processing.

    • Use NumPy functions in Jupyter
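
A minimal sketch of the kind of array processing NumPy provides, assuming NumPy is installed. The array contents are arbitrary values chosen for the example.

```python
import numpy as np

# A 3x4 array holding the integers 0..11.
grid = np.arange(12).reshape(3, 4)

# Vectorized arithmetic applies to every element without an explicit loop.
squared = grid ** 2

# Aggregations can run along one axis: axis=0 collapses rows, axis=1 collapses columns.
column_sums = grid.sum(axis=0)
row_means = grid.mean(axis=1)

# Matrix product of the 3x4 array with its 4x3 transpose gives a 3x3 result.
gram = grid @ grid.T

print(squared)
print(column_sums, row_means)
print(gram.shape)
```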

  • Using Pandas in Jupyter (5:19)

    Pandas is an open source library of high-performance data analysis tools available in Python. We will see functions that read text files, read Excel files, read from a SQL database, and operate on data frames.

    • Use pandas to read text files and Excel files

    • Use pandas to work with data frames
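
Reading a text file and operating on the resulting data frame could look roughly like this, assuming pandas is installed. The column names and values are made up for the example, and an in-memory buffer stands in for a file on disk. `pd.read_excel` and `pd.read_sql` follow the same pattern for Excel files and SQL databases.

```python
import io

import pandas as pd

# Stand-in for a text file; pd.read_csv accepts a path or a file-like object.
csv_text = io.StringIO(
    "name,team,hits,at_bats\n"
    "Ann,Red,30,100\n"
    "Bob,Blue,45,150\n"
    "Cid,Red,20,80\n"
)
players = pd.read_csv(csv_text)

# Derive a new column from existing ones.
players["average"] = players["hits"] / players["at_bats"]

# Group rows by team and total the hits in each group.
hits_by_team = players.groupby("team")["hits"].sum()

print(players.head())
print(hits_by_team)
```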

  • Using SciPy in Jupyter (5:32)

    SciPy is an open source library for mathematics, science, and engineering. We will see many areas that can be explored using SciPy, such as integration, optimization, interpolation, Fourier transforms, and linear algebra.

  • Expanding on Pandas DataFrames (2:24)

    There are more built-in functions for working with data frames than we have used so far. If we were to take one of the data frames, we could use additional functions to help portray and work with the dataset.

    • Use slicing to expand the pandas data frame

  • Sorting and Filtering DataFrames (2:49)

    Data frames let you easily sort and filter a dataset using functionality built into the data frames themselves.

    • Implement filtering based on certain criteria

    • Implement sorting data frame by index
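
Filtering on a criterion and sorting by index might be sketched as follows, assuming pandas is installed. The data frame contents are hypothetical.

```python
import pandas as pd

# Hypothetical data with a deliberately unordered index.
scores = pd.DataFrame(
    {"student": ["Ann", "Bob", "Cid", "Dee"], "score": [82, 67, 91, 74]},
    index=[3, 1, 4, 2],
)

# Filter: a boolean mask keeps only the rows that meet the criterion.
passed = scores[scores["score"] >= 75]

# Sort by the index labels, and separately by a column in descending order.
by_index = scores.sort_index()
by_score = scores.sort_values("score", ascending=False)

print(passed)
print(by_index)
print(by_score)
```

Both `sort_index` and `sort_values` return new data frames by default, so `scores` itself is left unchanged.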

  • Making a Prediction Using scikit-learn (4:29)

    scikit-learn is a machine learning toolset built using Python. In scikit-learn, an estimator provides two functions, fit() and predict(), providing mechanisms to classify data points and predict classes of other data points, respectively.

    • Implement prediction model using scikit-learn
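
To show the shape of the fit()/predict() convention without depending on scikit-learn itself, here is a toy pure-Python estimator. The class `ToyCentroidClassifier` and its data are hypothetical and written only for this illustration. It is not scikit-learn code, although scikit-learn estimators are used in the same two-step way.

```python
from collections import defaultdict


class ToyCentroidClassifier:
    """Hypothetical estimator that mimics the fit()/predict() convention."""

    def fit(self, X, y):
        # Learn one centroid per class by averaging that class's feature vectors.
        grouped = defaultdict(list)
        for features, label in zip(X, y):
            grouped[label].append(features)
        self.centroids_ = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        # Assign each point to the class whose centroid is closest.
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(
                self.centroids_,
                key=lambda label: squared_distance(x, self.centroids_[label]),
            )
            for x in X
        ]


# Made-up training data: two features per point, two classes.
X_train = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.2, 3.9]]
y_train = ["small", "small", "large", "large"]

model = ToyCentroidClassifier().fit(X_train, y_train)
predictions = model.predict([[1.1, 0.9], [3.8, 4.0]])
print(predictions)
```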

  • Making a Prediction Using R (3:00)

    In this video, we will make a prediction using R. The functions differ between the two languages, but the functionality is very close.

    • Build prediction model using R

  • Interactive Visualization and Plotting (6:03)

    There is a Python package, Bokeh, that can be used to generate a figure in your notebook where the user can interact and change the figure. In this video, I am using the same data from the gridplot example to display an interactive Bokeh gridplot.

    • Plot a graph using Plotly

    • Create a human density map

  • Drawing a Histogram of Social Data (4:55)

    In this video, we will gather one of the datasets and produce a histogram from the data.

    • Plot 3D data using car dataset

  • Using Spark to Analyze Data (4:07)

    Spark is a fast, general engine for large-scale data processing. The SparkContext initializes all of Spark and sets up any access that may be needed to Hadoop, if you are using that as well.

    • Analyze the number of lines in a file using Spark

  • Using SparkSession and SQL (2:58)

    Spark exposes many SQL-like actions that can be taken upon a data frame. In our example, we will start a SparkSession, use it to read a CSV-formatted file that contains a header record, and finally display the initial rows.

    • Use Spark SQL to determine product list

  • Combining Datasets (2:47)

    We will combine data frames, operate on the resulting set, import JSON data, and manipulate it with Spark.

    • Populate the data frames and move them to Spark

  • Loading JSON into Spark (3:36)

    Spark can also access JSON data for manipulation. In this video, we will also see pivot(), which allows you to translate rows into columns while performing aggregation on some of the columns.

    • Read and load the JSON in Spark

    • Use pivot() for translation from row to column

  • Analyzing 2016 US Election Demographics (4:14)

    To get a flavor of the resources available to R developers, we can look at the 2016 election data.

    • Setting up R for Jupyter

    • Display information about the data frame

  • Analyzing 2016 Voter Registration and Voting (6:01)

    In this video, we will look at voter registration versus actual voting using census data.

    • Display information to visually check for accurate loading

    • Display the characteristics of the regression line

  • Analyzing Changes in College Admissions (6:15)

    In this video, we can look at trends in college admissions acceptance rates over the last few years.

    • Create a vector of the average acceptance rates for colleges

    • Convert the vector points into a time series

  • Predicting Airplane Arrival Time (3:58)

    In this video, we will look at the airline arrival and departure times versus scheduled arrival and departure times.

    • Build our model of the arrival time

    • Use the testing set to make predictions

    • Plot the predicted versus actual data

  • Reading a CSV File (5:50)

    In this video, we will walk through the process of reading a CSV and adjusting the dataset to arrive at some conclusions about the data.

    • Change the column names to be more readable

    • Get rough statistics on the data

    • Plot some dominant data points

  • Manipulating Data with dplyr (7:19)

    In this video, we will use the dplyr package against the baseball player statistics we used earlier.

    • Convert a data frame to a dplyr table

    • Filter rows in a data frame

    • Add a column to a data frame

  • Tidying Up Data with tidyr (3:31)

    The tidyr package is available to clean up/tidy your dataset. In this video, we will rearrange our data to mix columns and rows with values.

    • Use the standard example of stock price for a date

    • Use the spread() function to separate out the values into multiple columns

    • Reorganize the data by listing all prices for a stock per row

  • Visualizing Glyph Ready Data (4:57)

    In this video, we will display glyphs at different points in a graph instead of the standard dot, since a glyph can convey more visual information to the viewer.

    • Display glyph data about the standard iris dataset

    • Derive some information from the glyph data

  • Publishing a Notebook (4:09)

    You can publish a notebook/dashboard using markdown. Markdown involves adding annotations to cells in your notebook that are interpreted by Jupyter and converted into the more standard HTML representations that you see in other published materials. In this video, we will see the different kinds of markdown.

    • Create cell with the markdown type

    • Generate table in HTML

  • Creating a Shiny Dashboard (3:58)

    Shiny is a web application framework for R. The Shiny server code deals with accessing data, computing results, obtaining direction from the user, and interacting with other server code to change results. In this video, we will learn how to create a Shiny dashboard.

    • Load the Shiny dashboard library

    • Publish your dashboard

  • Building Standalone Dashboards (2:58)

    Using Node.js, developers have come up with a way to host your dashboard/notebook without Jupyter on jupyter-dashboard-server.

    • Install conda

    • Install the layout extension

  • Converting JSON to CSV (1:44)

    In this video, we will use the Yelp dataset from round 9 of the challenge.

    • Download the JSON file and upload it on Jupyter notebook

    • Display the date and time of our system

    • Create a reviews.csv
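
A JSON-to-CSV conversion can be sketched with the standard library alone. This example assumes the input holds one JSON object per line. The field names (`business_id`, `stars`, `text`) and the sample records are assumptions for illustration, and in-memory buffers stand in for the downloaded file and for `reviews.csv`.

```python
import csv
import io
import json

# Stand-in for a JSON-lines input file; the records here are made up.
json_lines = io.StringIO(
    '{"business_id": "b1", "stars": 5, "text": "Great food"}\n'
    '{"business_id": "b2", "stars": 2, "text": "Slow service, cold fries"}\n'
)

fields = ["business_id", "stars", "text"]

# Swap this buffer for open("reviews.csv", "w", newline="") to write a real file.
csv_out = io.StringIO()
writer = csv.DictWriter(csv_out, fieldnames=fields, extrasaction="ignore")
writer.writeheader()
for line in json_lines:
    line = line.strip()
    if line:
        # Each non-empty line is parsed as one record and written as one CSV row.
        writer.writerow(json.loads(line))

csv_text = csv_out.getvalue()
print(csv_text)
```

Using `csv.DictWriter` rather than joining strings by hand means values that contain commas or quotes are escaped correctly.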

  • Evaluating Yelp Reviews (5:53)

    In this video, we will build a computed data frame with two columns and display the top-rated business dataset. Also, we will visualize the relationship between ratings and number of reviews for companies.

    • Find the top-rated firms

    • Build a model of reviews

  • Naive Bayes (4:56)

    Naive Bayes is an algorithm that classifies data using Bayes' theorem under a strong assumption that the features are independent of one another. Bayes' theorem estimates the probability of an event based on prior conditions. So, overall, we will use a set of feature values to estimate a value, assuming the same conditions hold true when those features have similar values. We will also implement naive Bayes using the R programming language.

    • Install the package and load the library

    • Measure the accuracy of the model

    • Determine the accuracy of the model
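
The course implements naive Bayes in R. Purely as an illustration of the idea, here is a small Gaussian naive Bayes sketch in standard-library Python. The function names, the choice of a Gaussian likelihood, and the toy data are all assumptions made for this example, and the accuracy is measured on the training rows themselves only to keep the sketch short.

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Return, per class, its prior and a (mean, variance) pair for each feature."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small floor keeps the variance strictly positive.
            variance = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, variance))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior (up to a shared constant)."""

    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, variance) in zip(row, stats):
            # Log of the normal density; summing treats features as independent.
            score += -0.5 * math.log(2 * math.pi * variance)
            score -= (value - mean) ** 2 / (2 * variance)
        return score

    return max(model, key=log_posterior)


# Hypothetical toy data: two numeric features and a made-up class label.
rows = [[150, 50], [155, 54], [160, 58], [180, 82], [185, 88], [178, 80]]
labels = ["A", "A", "A", "B", "B", "B"]

model = train_gaussian_nb(rows, labels)
predicted = [predict_gaussian_nb(model, row) for row in rows]
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
print(predicted, accuracy)
```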

  • Nearest Neighbor Estimator (6:45)

    Using the nearest neighbor, we will have an unclassified object and a set of objects that are classified. In this video, we will take the attributes of the unclassified object, compare them against the known classifications in place, and select the class that is closest to our unknown.

    • Reorder the data in ascending order

    • Split the data into a training set and a test set

    • Use the same example and implement the nearest neighbor using Python
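
A nearest neighbor classifier can be sketched in a few lines of standard-library Python. This is an illustrative k-nearest-neighbors version with majority voting, not the course's implementation. The function name `knn_predict`, the value `k=3`, and the toy data are assumptions for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, split by hand into a training set and a test set.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["low", "low", "low", "high", "high", "high"]
test_points = [(1.1, 1.0), (5.1, 5.0)]
test_labels = ["low", "high"]

predicted = [knn_predict(train_points, train_labels, p) for p in test_points]
accuracy = sum(p == t for p, t in zip(predicted, test_labels)) / len(test_labels)
print(predicted, accuracy)
```

`math.dist` computes Euclidean distance and requires Python 3.8 or later.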

  • Decision Trees (5:35)

    In this video, we will use decision trees to predict values. A decision tree has a logical flow in which the user makes decisions based on attributes, following the tree down to a leaf level where a classification is provided.

    • Load the libraries to use rpart

    • Develop a model to predict mpg acceptability based on the other factors

    • Perform the same analysis in Python

  • Neural Networks and Random Forests (5:43)

    With a neural net, we will end up with a graphical model that provides the factors to apply to each input to arrive at our housing price. Also, we will use the random forest algorithm, which attempts many random decision trees and provides the tree that works best within the parameters used to drive the model.

    • Calculate our neuralnet model

    • Include the packages of random forest in R

  • Test Your Knowledge

Requirements

  • Some programming experience with R or Python and some basic understanding of Jupyter is all you need to get started on this course.

Description

Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create documents that contain live code, equations, and visualizations. It is also a powerful tool for interactive data exploration and visualization, and has become the standard tool among data scientists.

This course is a step-by-step guide to exploring what Jupyter can do. You will first get started with data science, performing tasks that range from data exploration to visualization in the popular Jupyter Notebook, and learn how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. You will then learn data analysis tasks in Jupyter Notebook and work your way up to common scientific Python tools such as pandas, matplotlib, and plotly, working with some real datasets. You will also learn to create insightful visualizations showing time-stamped and spatial data. Finally, you will master relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.

By the end of this course, you will comfortably leverage the power of Jupyter to perform various data science tasks efficiently.

Contents and Overview

This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Jupyter for Data Science, gets you started with data science using the popular Jupyter Notebook. If you are familiar with Jupyter Notebook and want to learn how to use its capabilities to perform various data science tasks, this video course is for you! From data exploration to visualization, this course will take you every step of the way in implementing an effective data science pipeline using Jupyter. You will also see how you can utilize Jupyter's features to share your documents and code with your colleagues. The course also explains how Python 3, R, and Julia can be integrated with Jupyter for various data science tasks. By the end of this course, you will comfortably leverage the power of Jupyter to perform various tasks in data science successfully.

The second course, Jupyter Notebook for Data Science, will help you get familiar with Jupyter Notebook and all of its features to perform various data science tasks in Python. Jupyter Notebook is a powerful tool for interactive data exploration and visualization and has become the standard tool among data scientists. In the course, we will start with basic data analysis tasks in Jupyter Notebook and work our way up to learn some common scientific Python tools such as pandas, matplotlib, and plotly. We will work with real datasets, such as crime and traffic accidents in New York City, to explore common issues such as data scraping and cleaning. We will create insightful visualizations, showing time-stamped and spatial data. By the end of the course, you will feel confident about approaching a new dataset, cleaning it up, exploring it, and analyzing it in Jupyter Notebook to extract useful information in the form of interactive reports and information-dense data visualizations.

The third course, Interactive Computing with Jupyter Notebook, covers programming techniques: code quality and reproducibility, code optimization, high-performance computing through just-in-time compilation, parallel computing, and graphics card programming. In short, you will master relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.

About the Authors:    

  • Dan Toomey has been developing applications for over 20 years. He has worked in a variety of industries and companies of all sizes, in roles from sole contributor to VP/CTO level. For the last 10 years or so, he has been contracting companies in the eastern Massachusetts area under Dan Toomey Software Corp. Dan has also written the R for Data Science and Learning Jupyter books for Packt Publishing.

  • Dražen Lučanin is a developer, data analyst, and the founder of Punk Rock Dev, an indie web development studio. He's been building web applications and doing data analysis in Python, JavaScript, and other technologies professionally since 2009. In the past, Dražen worked as a research assistant and did a Ph.D. in computer science at the Vienna University of Technology. There he studied the energy efficiency of geographically distributed data centers and worked on optimizing VM scheduling based on real-time electricity prices and weather conditions. He also worked as an external associate at the Ruđer Bošković Institute, researching machine learning methods for forecasting financial crises. During Dražen's scientific work Python, Jupyter Notebook (back then still IPython Notebook), Matplotlib, and Pandas were his best friends over many nights of interactive manipulation of all sorts of time series and spatial data. Dražen also did a Master's degree in computer science at the University of Zagreb.

  • Cyrille Rossant, Ph.D., is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he gained experience in numerical computing, parallel computing, and high-performance data visualization.

He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing.

Who this course is for:

  • This course is aimed at data analysts, developers, students, and professionals keen to master the use of Jupyter to perform a variety of data science tasks. Some programming experience with Python and a basic understanding of Jupyter is required.