
Explore CUDA with Docker in a master class on CUDA programming, featuring C++ and advanced features, with GPU experts guiding Linux users, DevOps engineers, high-performance computing practitioners, and data scientists.
Learn to create a DigitalOcean droplet, install Docker (from the Marketplace or as a fresh install), and prepare a CUDA development environment on a base Ubuntu image with GPU simulation.
Access a DigitalOcean droplet via its public IP, log in as root, secure the temporary password, verify Ubuntu 18.04, and install Docker to run CUDA on Docker.
Discover how to use the GPGPU-Sim simulator with Docker to run CUDA code on a laptop, including downloading a prebuilt Docker image, checking prerequisites, and running in a VM.
Access the virtual machine and install the gpusim Docker image to set up CUDA tooling inside a container, then run a container and open a bash shell to use nvcc.
Learn how to compile and run a CUDA program inside a Docker container without GPUs, using a sample vector addition and a prepared image.
Explore how GPUs execute CUDA kernels through a hierarchy of streaming multiprocessors containing cores, registers, local memory, shared memory, and texture memory, with threads organized into blocks running concurrently.
Learn CUDA's heterogeneous computing model: coordinating host memory and device memory, transferring data to the device, executing host and device code, and returning results to the host.
Explore how CUDA threads map to blocks and grids, and how streaming multiprocessors execute blocks of threads, sharing registers and L1 cache while handling kernels.
Compute the index of a CUDA thread in 1D using blockIdx.x * blockDim.x + threadIdx.x. Set blockDim.x to 256 and assign work to a specific thread (block 2, thread 3).
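The arithmetic behind this index can be checked without a GPU. The following is a minimal host-only C++ sketch (plain C++, not CUDA device code) that evaluates the same formula; the helper name global_index_1d is hypothetical and is not part of the CUDA API.

```cpp
#include <cassert>

// Host-side emulation of the 1D CUDA index formula:
//   blockIdx.x * blockDim.x + threadIdx.x
// global_index_1d is an illustrative helper, not a CUDA function.
int global_index_1d(int block_idx, int block_dim, int thread_idx) {
    // A thread index always lies inside its own block.
    assert(thread_idx >= 0 && thread_idx < block_dim);
    return block_idx * block_dim + thread_idx;
}
```

With blockDim.x set to 256, calling global_index_1d(2, 256, 3) for block 2, thread 3 gives 2 * 256 + 3 = 515.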
Learn how CUDA maps a grid of blocks to two-dimensional indices and breaks each block into threads, enabling two-dimensional addressing across streaming multiprocessors.
Explore the CUDA memory hierarchy from device memory to caches and registers, and learn how the unified address space maps local and shared memory onto global memory.
Learn how to write a CUDA hello world program, starting from a simple C hello world, adding a __global__ kernel called my_kernel, and running it with one thread in one block.
Learn how CUDA performs a simple addition by implementing a GPU kernel that adds two integers, with host-to-device memory copies, a kernel launch, and a device-to-host copy to print the result, 10.
Transform the two loops into a CUDA kernel using threadIdx.x and threadIdx.y, with blocks handling chunks in shared memory to compute the output matrix.
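To make the loop-to-thread transformation concrete, here is a hedged host-only C++ sketch. It assumes square n x n row-major matrices and a single block, and it leaves out the shared-memory tiling the lecture covers. The function names matmul_one_thread and matmul_emulated are hypothetical; on a GPU, the two outer loops would disappear and row and col would come from threadIdx.y and threadIdx.x.

```cpp
#include <vector>

// Work of one emulated thread: a single element of C = A * B.
// In a real kernel, row and col would be threadIdx.y and threadIdx.x.
void matmul_one_thread(int row, int col, int n,
                       const std::vector<float>& a,
                       const std::vector<float>& b,
                       std::vector<float>& c) {
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) {
        sum += a[row * n + k] * b[k * n + col];
    }
    c[row * n + col] = sum;
}

// Host stand-in for a single-block launch: the two loops play the role
// of the n x n threads that would run concurrently on the device.
std::vector<float> matmul_emulated(int n,
                                   const std::vector<float>& a,
                                   const std::vector<float>& b) {
    std::vector<float> c(n * n, 0.0f);
    for (int ty = 0; ty < n; ++ty) {
        for (int tx = 0; tx < n; ++tx) {
            matmul_one_thread(ty, tx, n, a, b, c);
        }
    }
    return c;
}
```

For the 2 x 2 inputs A = {1, 2, 3, 4} and B = {5, 6, 7, 8}, matmul_emulated(2, A, B) returns {19, 22, 43, 50}.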
See CUDA code for matrix multiplication that loads matrices A and B, transfers them to the device, and executes a kernel with 2D thread blocks and shared memory.
Execute a matrix multiplication using CUDA in Visual Studio, covering building, linking, and running on a GPU device, with notes on compute architecture and driver and toolkit setup.
Install the NVIDIA Container Toolkit on a multi-GPU server to allocate GPUs to Docker containers using device IDs and enable GPU access for each user.
Explore high-level CUDA concepts, focusing on GPU versus CPU throughput, data parallelism, kernels, threads, blocks, grids, and the host-device model of heterogeneous parallel programming.
The host controls the program flow, launching kernels with grid and block dimensions and using explicit barriers for synchronization. Allocate device memory and copy data between host and device.
Learn kernel syntax with the __global__ declaration, manage host-device memory, and map threads using threadIdx.x for parallel execution, including converting a for loop to parallel threads and vector addition.
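The for-loop conversion can be previewed on the host. This is a minimal C++ sketch under stated assumptions: both vectors have the same length, and the example goes one step beyond threadIdx.x alone by also splitting the work into blocks. The names vec_add_thread and vec_add_emulated are hypothetical.

```cpp
#include <vector>

// Body of the former for loop, run once per emulated thread.
// The index mirrors blockIdx.x * blockDim.x + threadIdx.x.
void vec_add_thread(int block_idx, int block_dim, int thread_idx, int n,
                    const std::vector<int>& a,
                    const std::vector<int>& b,
                    std::vector<int>& out) {
    int i = block_idx * block_dim + thread_idx;
    if (i < n) {  // threads past the end of the data do nothing
        out[i] = a[i] + b[i];
    }
}

// Host stand-in for a launch with enough blocks to cover all elements.
// Assumes a and b have the same length and block_dim > 0.
std::vector<int> vec_add_emulated(const std::vector<int>& a,
                                  const std::vector<int>& b,
                                  int block_dim) {
    int n = static_cast<int>(a.size());
    int grid_dim = (n + block_dim - 1) / block_dim;  // round up
    std::vector<int> out(n, 0);
    for (int blk = 0; blk < grid_dim; ++blk) {
        for (int thr = 0; thr < block_dim; ++thr) {
            vec_add_thread(blk, block_dim, thr, n, a, b, out);
        }
    }
    return out;
}
```

With five elements and a block size of 4, two blocks are emulated, and the bounds check keeps the three surplus threads in the second block from writing past the end of the output.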
Master computing a unique CUDA thread index by combining blockIdx.x, gridDim.x, blockDim.x, and threadIdx.x, then apply this across dimensions using dot notation for x, y, and z.
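As an illustration of combining these values, the host-only C++ sketch below flattens a 2D grid of 2D blocks into one unique thread number. The Dim3 struct is a stand-in for CUDA's built-in dim3 and uint3 types, the function name unique_thread_id_2d is hypothetical, and the row-major flattening shown is one common convention, not necessarily the one used in the lecture.

```cpp
// Stand-in for CUDA's built-in index types; members are read with
// dot notation (.x, .y, .z) just as in device code.
struct Dim3 {
    int x;
    int y;
    int z;
};

// Flatten a 2D grid of 2D blocks into one unique thread number.
// The z members are ignored in this 2D sketch.
int unique_thread_id_2d(Dim3 block_idx, Dim3 grid_dim,
                        Dim3 thread_idx, Dim3 block_dim) {
    int block_id = block_idx.y * grid_dim.x + block_idx.x;      // block number within the grid
    int threads_per_block = block_dim.x * block_dim.y;          // size of one block
    int local_id = thread_idx.y * block_dim.x + thread_idx.x;   // thread number within its block
    return block_id * threads_per_block + local_id;
}
```

For a 2 x 2 grid of 4 x 4 blocks, the last thread (block (1, 1), thread (3, 3)) gets 3 * 16 + 15 = 63, so the 64 threads are numbered 0 through 63.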
Understand the CUDA memory model and thread hierarchy, detailing local, shared, and global memory, plus registers and constant memory, and how data placement impacts performance.
Learn how data races arise between CUDA threads and how barriers synchronize threads within a block. See explicit barriers with __syncthreads() and understand host-device synchronization and implicit barriers between kernel launches.
Explore how to access the live class and code playgrounds in your browser with a free interactive version, demo included, and no registration required.
WELCOME!
We present the long-awaited approach to learning CUDA WITHOUT NVIDIA GPUs! Finally, you can learn CUDA on just your laptop, tablet, or even your mobile phone, and that's it! CUDA provides a general-purpose programming model that gives you access to the tremendous computational power of modern GPUs, as well as powerful libraries for machine learning, image processing, linear algebra, and parallel algorithms.
WHAT WILL YOU LEARN?
We will demonstrate how you can learn CUDA using two tools: Docker, which uses OS-level virtualization to deliver software in packages called containers, and GPGPU-Sim, a cycle-level simulator that models contemporary graphics processing units (GPUs) running GPU computing workloads written in CUDA or OpenCL. This course aims to introduce you to NVIDIA's CUDA parallel architecture and programming model in an easy-to-understand way. We plan to update the lessons and add more lessons and exercises every month!
Virtualization basics
Docker Essentials
GPU Basics
CUDA Installation
CUDA Toolkit
CUDA Threads and Blocks in various combinations
CUDA Coding Examples
Based on your earlier feedback, we are introducing a Zoom live class lecture series for this course, in which we will explain different aspects of parallel and distributed computing and the High Performance Computing (HPC) systems software stack: Slurm, PBS Pro, OpenMP, MPI, and CUDA! Live classes will be delivered through the Scientific Programming School, an interactive and advanced e-learning platform for learning scientific coding. Students purchasing this course will receive free access to the interactive version of this course (with scientific code playgrounds) from the Scientific Programming School (SCIENTIFIC PROGRAMMING IO). Instructions to join are given in the bonus content section.
DISCLAIMER
Some of the images used in this course are copyrighted by NVIDIA.