Beyond Jupyter Notebooks

Name: Beyond Jupyter Notebooks
Rating: 4.4 (184 reviews)

Build your own data science platform with Docker & Python

Created by Joshua Görner

Last updated 4/2019

English

What you'll learn

Docker
Data Science
Jupyter
Python
Data Analysis
Data Visualization
Open Source

Course content

8 sections • 45 lectures • 1h 27m total length

Watch Me First :-) • 2:13
Watch the introduction - it will help you understand the course structure and make the most of it!

Introduction • 2:00
What is Jupyter and where did it come from?
How to install Docker? • 0:47
How to run Docker on your machine?
Starting Jupyter • 1:59
How to start your first instance of a Dockerized Jupyter Notebook server?
Mapping Ports • 1:59
How to map ports in order to access your Jupyter Notebook from the browser?
Running in Detached Mode • 2:53
How to run your Docker container in the background to make it more failure-proof?
Facing a Persistence Problem • 2:21
What happens if your container crashes?
Solving the Persistence Problem • 2:33
How to create persistent volumes?
Course Project - Task • 2:06
Can you find some interesting relationships in the data?
Course Project - Solution • 2:43
A simple approach to dig into the data set

Introduction • 1:09
What is Superset?
Starting Superset • 1:38
What Docker images are available and how to start a Dockerized Superset?
Prepare Data • 3:48
How to mark data inside of Superset to properly visualize it?
Charts and Dashboards • 3:22
What visualizations are possible within Superset and how can they be combined into a dashboard?
Course Project - Task • 1:03
Can you "rework" the charts from the previous lecture's course project and make them fancy-schmancy interactive?
Course Project - Solution • 1:24
Build an interactive visualization and filter for a dashboard using Superset by uploading the data file, configuring the table, and creating a filter chart that refreshes the view.

Introduction • 1:08
What is Postgres?
Starting Postgres • 2:23
How to properly start a Dockerized Postgres instance?
Facing an Access Problem • 0:42
What is not working in the current architecture?
Docker-Compose (I/II) • 1:07
What is docker-compose and how does it differ from Docker?
Docker-Compose (II/II) • 2:24
How to actually launch containers via docker-compose?
Solving the Access Problem • 3:26
How to utilize the new docker-compose architecture?
Create a Custom User • 2:51
How to create custom Postgres users and databases upfront?
Course Project - Task • 1:04
Can you apply your new knowledge about docker-compose to improve your course project?
Course Project - Solution • 2:03
Configure a docker-compose workflow with Jupyter Notebook, Superset, and Postgres, mount scripts and volumes, then connect a data frame to a database and visualize the results.

Introduction • 1:03
What is Minio?
Starting Minio • 2:34
How to start a Minio container?
GUI Interaction • 2:09
How to upload a file via the web browser?
Programmatic Interaction • 2:22
How to upload the file via Python?
Course Project - Task • 1:13
Can you attach a new Minio component to your current architecture in order to store machine learning models?
Course Project - Solution • 2:11
Load data from the Postgres database, train an arbitrary model, save the trained model as a file, and deploy the classifier as a prediction service with an object store.

Introduction • 1:17
Learn to build a RESTful API with API Star to let others interact with your model and return predictions, enabling a prediction-as-a-service API with automated API documentation generation.
Starting API-Star • 2:27
Explore API Star, a web API framework, by building an HTTP endpoint with a handler, routes, and an app. Run it locally on port 5000 and test it with a name parameter.
API-Star and Docker • 3:12
Build a custom API Star image with Docker on a slim Python 3 base image. Expose port 8000 and map host port 5000 to it to access the API Star endpoint.
Docker Enhancements • 1:16
Build and customize an API Star Docker image to power a prediction service, installing scikit-learn, pandas, and numpy, then run docker build -t apistar:latest to build it.
Course Project - Task • 1:19
Extend your custom wine prediction project by exposing two REST endpoints, predict and retrain, to accept input features and return probabilities, using a secret key and the provided Dockerfile.
Course Project - Solution • 3:17
Create a secured retrain endpoint that validates a token, loads data, trains and persists the classifier with a timestamp, and expose a predict endpoint returning probabilities as JSON.
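
The retrain/predict pair described in the project solution above could be sketched roughly as follows. This is a minimal, framework-free illustration, not the course's actual code: the names `SECRET_KEY`, `MODEL_DIR`, `train`, `retrain`, and `predict` are hypothetical, a temporary directory stands in for the object store, and a simple label-frequency table stands in for a real scikit-learn classifier.

```python
import hmac
import json
import pickle
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

SECRET_KEY = "change-me"              # hypothetical shared secret
MODEL_DIR = Path(tempfile.mkdtemp())  # stand-in for the object store


def train(labels):
    """Toy 'model': class frequencies (a real classifier would use features)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def retrain(token, labels):
    """Validate the token, train, and persist the model under a timestamp."""
    # Constant-time comparison avoids leaking the key through timing.
    if not hmac.compare_digest(token.encode(), SECRET_KEY.encode()):
        return json.dumps({"error": "invalid token"})
    model = train(labels)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    path = MODEL_DIR / f"model-{stamp}.pkl"
    path.write_bytes(pickle.dumps(model))
    return json.dumps({"saved": path.name})


def predict(features):
    """Load the newest persisted model and return probabilities as JSON."""
    # Fixed-width timestamps sort chronologically as plain strings.
    candidates = sorted(MODEL_DIR.glob("model-*.pkl"))
    if not candidates:
        return json.dumps({"error": "no model trained yet"})
    model = pickle.loads(candidates[-1].read_bytes())
    # The toy model ignores the features; a real one would not.
    return json.dumps(model)
```

In a web framework, each function would be registered as a route handler, with the token taken from the request rather than passed as an argument.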

Introduction • 1:29
Explore how scheduling tasks at fixed intervals enables automated execution, from cron jobs to fetching external data and retraining models to keep predictions up to date.
Basic Concepts • 1:34
Explore Apache Airflow, a scheduling framework for orchestrating tasks in workflows, and learn core concepts like tasks, directed graphs, data fetch, feature derivation, storage, and model training.
Starting Airflow • 2:39
Launch Apache Airflow with a community image and run a hello world task via docker-compose. Explore the web UI at localhost:1888 and review basic task instances and DAG resources.
DAG Creation • 2:26
Explore creating a DAG in Airflow by configuring default arguments, setting a 30-minute schedule, and building a PythonOperator-based workflow that logs the time, sleeps, and prints hello world.
Course Project - Task • 0:47
Leverage Airflow to implement a scheduled model retrain. Fetch the last model, retrain on a random subset of the database, and save the updated model with a timestamp using the boilerplate infrastructure.
Course Project - Solution • 1:20
Build the course project solution by wrapping the retraining function in a PythonOperator and scheduling it with Airflow.
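
As a rough illustration of the DAG Creation lecture above, a hello-world DAG might look like the sketch below. It assumes Airflow 2.x import paths and the `schedule_interval` argument (the course itself dates from 2019 and may use older paths); the names `say_hello`, `hello_world`, and the values in `default_args` are hypothetical. The import is guarded so the task function can still be tried without Airflow installed.

```python
import logging
import time
from datetime import datetime, timedelta

try:  # Airflow 2.x import paths (assumption); skipped if Airflow is absent
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = PythonOperator = None

# Defaults applied to every task in the DAG (illustrative values).
default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}


def say_hello(sleep_seconds=5):
    """Log the current time, sleep, then print a greeting."""
    logging.info("Task started at %s", datetime.now().isoformat())
    time.sleep(sleep_seconds)
    print("hello world")
    return "hello world"


if DAG is not None:
    # Run the task every 30 minutes; catchup=False skips missed past runs.
    with DAG(
        dag_id="hello_world",
        default_args=default_args,
        start_date=datetime(2019, 1, 1),
        schedule_interval=timedelta(minutes=30),
        catchup=False,
    ) as dag:
        hello_task = PythonOperator(
            task_id="say_hello",
            python_callable=say_hello,
        )
```

For the scheduled retrain in the course project, the same pattern would apply with the retraining function passed as `python_callable`.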

Requirements

Minimal Python Knowledge
Running Docker Installation
Fun exploring new topics

Description

Interactive notebooks like Jupyter have become increasingly popular in recent years and form the core of many data scientists' workplaces. Accessed via a web browser, they allow scientists to easily structure their work by combining code and documentation. Yet notebooks often lead to isolated and disposable analysis artefacts. Keeping the computation inside those notebooks does not allow for convenient concurrent model training, model exposure, or scheduled model retraining.

Those issues can be addressed by taking advantage of recent developments in software engineering. Over the past years, containerization has become the technology of choice for crafting and deploying applications. Building a data science platform that allows for easy access (via notebooks) as well as flexibility and reproducibility (via containerization) combines the best of both worlds and addresses data scientists' hidden needs.

Who this course is for:

Data scientists of any level who want to accelerate their capabilities
Open source lovers ❤️
Pythonistas interested in Docker

Course content

Introduction • 1 lecture • 2min

Analyze your Data (Jupyter/Docker) • 9 lectures • 19min

Visualize your Data (Superset) • 6 lectures • 12min

Store your structured Data (Postgres) • 9 lectures • 17min

Store your unstructured Data (Minio) • 6 lectures • 12min

Expose your Model (API-Star) • 6 lectures • 13min

Automate your Analysis (Airflow) • 6 lectures • 10min

Wrap Up • 2 lectures • 2min
