Learn CUDA with Docker!

Rating: 3.8 (30 reviews)

Learn to code in CUDA with GPGPU simulators and Docker, and kickstart your computing and data science career!

Created by Scientific Programmer™ Team, Scientific Programming School

Last updated 4/2021

English

What you'll learn

How to code with CUDA, but without a GPU!
Basic knowledge of CUDA programming
Ability to design and implement CUDA parallel algorithms

Course content

10 sections • 44 lectures • 2h 58m total length

Welcome! • 0:54
Explore CUDA with Docker in a master class on CUDA programming, featuring C++ and advanced features, with GPU experts guiding Linux, DevOps, high-performance computing, and data science practitioners.
Why Get this Course? • 1:11
Instructor • 0:29
Practice C++ with Interactive Shell • 0:36
Introduction • 2:35
Create a virtual machine (droplet) • 5:19
Learn to create a DigitalOcean droplet, install Docker (via the marketplace image or a fresh install), and prepare a CUDA development environment on a base Ubuntu image with GPU simulation.
Access droplet and setup with docker • 6:17
Access a DigitalOcean droplet via its public IP, log in as root, change the temporary password, verify Ubuntu 18.04, and install Docker to run CUDA on Docker.
What is the GPGPU-SIM (Simulator)? • 3:25
Discover how to use the GPGPU-Sim simulator with Docker to run CUDA code on a laptop, including downloading a prebuilt Docker image, prerequisites, and running in a VM.
Setup the GPU Simulator with Docker • 8:20
Access the virtual machine and install the gpusim Docker image to set up CUDA tooling inside a container, then run a container and open a bash shell to use nvcc.
Run first CUDA code on Docker container without any GPUs! • 6:21
Learn how to compile and run a CUDA program inside a Docker container without GPUs, using a sample vector addition and a prepared image.
Useful docker commands • 0:22

What the hell is CUDA? • 4:30
Relation between GPUs and CUDA • 2:38
Explore how GPUs execute CUDA kernels through a hierarchy of streaming multiprocessors, each containing cores, registers, local memory, shared memory, and texture memory, with threads organized into blocks that run concurrently.
How does CUDA work? • 4:22
Learn CUDA's heterogeneous computing model: coordinating host memory and device memory, transferring data to the device, executing host and device code, and returning results to the host.

Introduction • 1:37
Foundation of CUDA Threads, blocks and grid • 4:47
Explore how CUDA threads map to blocks and grids, and how streaming multiprocessors execute blocks of threads, sharing registers and L1 cache while handling kernels.
How to index a CUDA thread in 1D • 2:02
Compute the index of a CUDA thread in 1D using blockIdx.x * blockDim.x + threadIdx.x. With blockDim.x set to 256, pick out a specific thread (block 2, thread 3) to do the work.
Index a CUDA thread in 2D • 1:35
Learn how CUDA maps a grid of blocks to two-dimensional indices and breaks each block into threads, enabling two-dimensional addressing across streaming multiprocessors.
Thread Syncs • 0:49
CUDA Warps • 1:06
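
As a rough illustration of the 1D indexing lecture in this section, the sketch below computes blockIdx.x * blockDim.x + threadIdx.x with blockDim.x = 256 and lets only block 2, thread 3 do the work. The kernel name show_index, the single-integer output buffer, and the choice of 4 blocks are assumptions made for this example; they are not taken from the course material.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread computes its global 1D index,
// but only block 2, thread 3 stores it in the output buffer.
__global__ void show_index(int *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (blockIdx.x == 2 && threadIdx.x == 3) {
        *out = idx;
    }
}

int main()
{
    int h_out = -1;
    int *d_out = NULL;
    cudaMalloc((void **)&d_out, sizeof(int));

    // 4 blocks (assumed) of 256 threads each, so blockDim.x == 256.
    show_index<<<4, 256>>>(d_out);

    // The blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("block 2, thread 3 -> global index %d\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

With these launch parameters the selected thread has global index 2 * 256 + 3 = 515.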

CUDA Hello World! • 5:22
Learn how to write a CUDA hello world program, starting from a simple C hello world, adding a __global__ kernel called my_kernel, and running it with one thread in one block.
CUDA simple addition (theory) • 8:11
CUDA simple addition (demonstration) • 6:25
Learn how CUDA performs a simple addition by implementing a GPU kernel to add two integers, with host-to-device memory copies, a kernel launch, and a device-to-host copy to print 10.
CUDA addition (Multiple Blocks) • 6:41
CUDA addition (Multiple Threads) • 3:47
CUDA vector addition (demonstration) • 5:38
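
The lectures in this section build up from a single addition to a vector addition spread over multiple blocks and threads. A minimal sketch of that final step might look like the following. The kernel name vec_add, the vector length of 1024, the 256 threads per block, and the test data are all assumptions chosen for illustration, not the course's own code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 1024        // vector length (assumed)
#define THREADS 256   // threads per block (assumed)

// Each thread adds one pair of elements; the guard handles a partial last block.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main()
{
    size_t bytes = N * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < N; ++i) {
        h_a[i] = (float)i;
        h_b[i] = 2.0f * (float)i;
    }

    // Allocate device memory and copy the inputs host -> device.
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all N elements.
    int blocks = (N + THREADS - 1) / THREADS;
    vec_add<<<blocks, THREADS>>>(d_a, d_b, d_c, N);

    // Copy the result device -> host and compare with a CPU reference.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (h_c[i] != h_a[i] + h_b[i]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The same allocate, copy, launch, copy-back pattern applies to the two-integer addition; only the sizes and the launch configuration change.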

Introduction • 3:59
CUDA matrix multiplication (Theory) • 4:30
Transform the two loops into a CUDA kernel using threadIdx.x and threadIdx.y, with blocks handling chunks in shared memory to compute the output matrix.
CUDA code (matrix multiplication) • 8:05
Walk through CUDA code for matrix multiplication: loading matrices A and B, transferring them to the device, and executing a kernel with 2D thread blocks and shared memory.
Execute the matrix multiplication code • 3:10
Execute a matrix multiplication using CUDA in Visual Studio, detailing building, linking, and running on a GPU device, with notes on compute architecture and driver toolkit setup.
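
To make the idea of 2D thread blocks working on shared-memory chunks more concrete, here is one possible sketch of a tiled matrix multiplication. It assumes square, row-major n x n matrices and a 16 x 16 tile; the kernel name matmul_tiled, the matrix size, and the fill values are hypothetical and may differ from the code shown in the lectures.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define TILE 16   // tile width (assumed); each block has TILE x TILE threads

// Each block computes one TILE x TILE tile of C, staging tiles of A and B
// in shared memory. Out-of-range elements are treated as zero.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] =
            (row < n && a_col < n) ? A[row * n + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (b_row < n && col < n) ? B[b_row * n + col] : 0.0f;
        __syncthreads();   // both tiles fully loaded before use

        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();   // finish with the tiles before overwriting them
    }

    if (row < n && col < n) {
        C[row * n + col] = acc;
    }
}

int main()
{
    const int n = 32;   // matrix dimension (assumed)
    size_t bytes = n * n * sizeof(float);
    float *h_A = (float *)malloc(bytes);
    float *h_B = (float *)malloc(bytes);
    float *h_C = (float *)malloc(bytes);
    for (int i = 0; i < n * n; ++i) {
        h_A[i] = 1.0f;
        h_B[i] = 2.0f;
    }

    float *d_A = NULL, *d_B = NULL, *d_C = NULL;
    cudaMalloc((void **)&d_A, bytes);
    cudaMalloc((void **)&d_B, bytes);
    cudaMalloc((void **)&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matmul_tiled<<<grid, block>>>(d_A, d_B, d_C, n);

    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %.1f\n", h_C[0]);

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    free(h_A); free(h_B); free(h_C);
    return 0;
}
```

With A filled with 1.0 and B with 2.0, every element of C is a sum of 32 products of 1.0 and 2.0, which makes the result easy to check by hand.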

CUDA High Level Concepts • 9:25
Explore CUDA high-level concepts, focusing on GPU versus CPU throughput, data parallelism, kernels, threads, blocks, grids, and the host-device model of heterogeneous parallel programming.
Programming Model • 8:00
The host controls the program flow, launching kernels with grid and block dimensions and using explicit barriers for synchronization. Allocate device memory and copy data between host and device.
Parallel for-loop • 7:10
Learn kernel syntax with the __global__ declaration, manage host-device memory, and map threads using threadIdx.x for parallel execution, including converting a for loop to parallel threads and vector addition.
Indexing • 6:00
Master computing a unique CUDA thread index by combining blockIdx.x, gridDim.x, blockDim.x, and threadIdx.x, then apply the same pattern across dimensions using the .x, .y, and .z members.
Memory • 15:05
Understand the CUDA memory model and thread hierarchy, detailing local, shared, and global memory, plus registers and constant memory, and how data placement impacts performance.
Synchronization • 7:35
Learn how data races arise between CUDA threads and how barriers synchronize threads within a block. See explicit barriers with __syncthreads() and understand host-device synchronization and implicit barriers between kernel launches.
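
The role of a block-level barrier can be sketched with a small example that reverses an array through shared memory. Without the barrier, a thread could read a shared-memory slot before the thread responsible for it has written it. The kernel name reverse_in_block and the array length of 64 are assumptions made for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 64   // array length and block size (assumed)

// Reverse an N-element array using one block of N threads.
__global__ void reverse_in_block(int *data)
{
    __shared__ int tmp[N];
    int t = threadIdx.x;

    tmp[t] = data[t];
    __syncthreads();            // all writes to tmp finish before any read below
    data[t] = tmp[N - 1 - t];
}

int main()
{
    int h[N];
    for (int i = 0; i < N; ++i) {
        h[i] = i;
    }

    int *d = NULL;
    cudaMalloc((void **)&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    reverse_in_block<<<1, N>>>(d);

    // The blocking copy on the default stream waits for the kernel,
    // which acts as the host-device synchronization point here.
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    printf("h[0] = %d, h[%d] = %d\n", h[0], N - 1, h[N - 1]);

    cudaFree(d);
    return 0;
}
```

After the kernel runs, h[0] holds 63 and h[63] holds 0. Note that __syncthreads() only synchronizes threads within one block, which is why this sketch uses a single block.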

Requirements

Basic C or C++ programming knowledge

Description

WELCOME!

We present the long-awaited way to learn CUDA WITHOUT NVIDIA GPUs! Finally, you can learn CUDA on just your laptop, tablet, or even your phone, and that's it! CUDA provides a general-purpose programming model that gives you access to the tremendous computational power of modern GPUs, as well as powerful libraries for machine learning, image processing, linear algebra, and parallel algorithms.

WHAT DO YOU LEARN?

We will demonstrate how you can learn CUDA using two tools: Docker, which uses OS-level virtualization to deliver software in packages called containers, and GPGPU-Sim, a cycle-level simulator that models contemporary graphics processing units (GPUs) running GPU computing workloads written in CUDA or OpenCL. This course aims to introduce you to NVIDIA's CUDA parallel architecture and programming model in an easy-to-understand way. We plan to update the lessons and add more lessons and exercises every month!

Virtualization basics
Docker Essentials
GPU Basics
CUDA Installation
CUDA Toolkit
CUDA Threads and Blocks in various combinations
CUDA Coding Examples

Based on your earlier feedback, we are introducing a Zoom live class lecture series for this course, in which we will explain different aspects of parallel and distributed computing and the High Performance Computing (HPC) systems software stack: Slurm, PBS Pro, OpenMP, MPI, and CUDA! Live classes will be delivered through the Scientific Programming School, an interactive and advanced e-learning platform for learning scientific coding. Students purchasing this course will receive free access to the interactive version of this course (with scientific code playgrounds) from the Scientific Programming School (SCIENTIFIC PROGRAMMING IO). Instructions to join are given in the bonus content section.

DISCLAIMER

Some of the images used in this course are copyrighted by NVIDIA.

Who this course is for:

Anyone who wants to learn CUDA programming but does NOT have access to expensive GPUs

Course content

Introduction • 11 lectures • 36min

CUDA foundation • 3 lectures • 12min

CUDA threads, blocks and grid • 6 lectures • 12min

CUDA memory models • 2 lectures • 3min

CUDA vector addition • 6 lectures • 36min

CUDA matrix multiplication • 4 lectures • 20min

CUDA streams • 2 lectures • 1min

NVIDIA Docker Container Toolkit • 1 lecture • 4min

CUDA for Dummies • 6 lectures • 53min

Additional Contents • 3 lectures • 2min