
Watch the introduction - it will help you understand the course structure and make the most of it!
What is Jupyter and where did it come from?
How to run Docker on your machine?
How to start your first instance of a Dockerized Jupyter Notebook server?
How to map ports in order to access your Jupyter Notebook from the browser?
How to run your Docker container in the background to make it more failure proof?
What happens if your container crashes?
How to create persistent volumes?
Can you find some interesting relationships in the data?
A simple approach to dig into the data set
What is Superset?
What Docker images are available and how to start a Dockerized Superset?
How to mark data inside of Superset to properly visualize it?
What visualizations are possible within Superset and how can they be combined into a dashboard?
Can you "rework" the charts from the previous lecture's course project and make them fancy-schmancy interactive?
Build an interactive visualization and filter for a dashboard using Superset, by uploading the data file, configuring the table, and creating a filter chart that refreshes the view.
What is Postgres?
How to properly start a Dockerized Postgres instance?
What is not working in the current architecture?
What is docker-compose and how does it differ from Docker?
How to actually launch containers via docker-compose?
How to utilize the new docker-compose architecture?
How to create a custom Postgres user and databases upfront?
Can you apply your new knowledge about docker-compose to improve your course project?
Configure a docker-compose workflow with Jupyter Notebook, Superset, and Postgres, mount scripts and volumes, then connect a data frame to a database and visualize results.
What is Minio?
How to start a Minio container?
How to upload a file via the web browser?
How to upload the file via Python?
Can you attach a new Minio component to your current architecture in order to store machine learning models?
Load data from a Postgres database, train an arbitrary model, save the trained model to a file, and deploy the classifier as a prediction service with an object store.
Learn to build a RESTful API with API Star to let others interact with your model and return predictions, enabling a prediction-as-a-service API with automated API documentation generation.
Explore API Star, a web API framework, by building an HTTP endpoint with a handler, routes, and an app. Run locally on port 5000 and test with a name parameter.
Build a custom API Star image with Docker on a slim Python 3 base. Expose port 8000 in the container and map host port 5000 to it to access the API Star endpoint.
Build and customize an API Star Docker image to power a prediction service, installing scikit-learn, pandas, and numpy, then run docker build -t apistar:latest to build it.
Extend your custom wine prediction project by exposing two REST endpoints, predict and retrain, to serve input features and return probabilities, with a secret key and the provided Dockerfile.
Create a secured retrain endpoint that validates a token, loads data, trains and persists the classifier with a timestamp, and expose a predict endpoint returning probabilities as JSON.
Explore how scheduling tasks at fixed intervals enables automated execution, from cron jobs to fetching external data and retraining models to keep predictions up to date.
Explore Apache Airflow, a scheduling framework for orchestrating tasks in workflows, and learn core concepts like tasks, directed graphs, data fetch, feature derivation, storage, and model training.
Launch Apache Airflow with a community image and run a hello world task via docker-compose. Explore the web UI at localhost:1888 and review basic task instances and DAG resources.
Explore creating a DAG in Airflow by configuring default arguments, setting a 30-minute schedule, and building a PythonOperator-based workflow that logs time, sleeps, and prints hello world.
Leverage Airflow to implement a scheduled model retrain. Fetch the last model, retrain on a random subset of the database, and save the updated model with a timestamp using boilerplate infrastructure.
Learn how to build a course project solution by retraining a model, wrapping the retraining function in a Python operator, and scheduling with Airflow.
Master the course content beyond Jupyter notebooks, using Jupyter for computation, dashboards, storage, and deployment, and explore the opinionated stack with cache, graph database, nginx, Prometheus, and Solr.
Interactive notebooks like Jupyter have become increasingly popular in recent years and form the core of many data scientists’ workplaces. Accessed via a web browser, they allow scientists to structure their work easily by combining code and documentation. Yet notebooks often lead to isolated and disposable analysis artefacts. Keeping the computation inside those notebooks does not allow for convenient concurrent model training, model exposure or scheduled model retraining.
Those issues can be addressed by taking advantage of recent developments in software engineering. Over the past years, containerization has become the technology of choice for crafting and deploying applications. Building a data science platform that allows for easy access (via notebooks), flexibility and reproducibility (via containerization) combines the best of both worlds and addresses data scientists’ hidden needs.
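To give a flavour of the prediction service built in the course, here is a minimal, framework-free sketch of the logic behind the secured retrain endpoint and the predict endpoint. It is not the course's actual code: the names SECRET_KEY, is_authorized, model_filename, retrain and predict are hypothetical, the secret value is a placeholder, and the training and prediction steps are passed in as plain callables rather than wired to scikit-learn or API Star.

```python
import hmac
import json
from datetime import datetime, timezone

# Hypothetical placeholder; a real service would read this from the environment.
SECRET_KEY = "change-me"


def is_authorized(token: str) -> bool:
    """Check the caller's token against the secret in constant time."""
    return hmac.compare_digest(token.encode(), SECRET_KEY.encode())


def model_filename(now: datetime) -> str:
    """Build a timestamped file name so each retrained model is kept separately."""
    return f"model-{now.strftime('%Y%m%dT%H%M%S')}.pkl"


def retrain(token: str, train_and_persist, now=None) -> dict:
    """Validate the token, then train and persist a model under a timestamped name.

    train_and_persist is an assumed callable that takes the target file name,
    loads the data, fits the classifier and stores it.
    """
    if not is_authorized(token):
        return {"status": "forbidden"}
    now = now or datetime.now(timezone.utc)
    filename = model_filename(now)
    train_and_persist(filename)
    return {"status": "ok", "model": filename}


def predict(features, predict_proba) -> str:
    """Return class probabilities for the given features as a JSON string.

    predict_proba is an assumed callable that maps features to probabilities.
    """
    probabilities = [float(p) for p in predict_proba(features)]
    return json.dumps({"probabilities": probabilities})
```

In a real deployment these functions would sit behind the web framework's route handlers, and the persisted model would go to the object store rather than a local file.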