High Performance Scientific Computing with C

Use algorithm design, hardware features, and parallelism to build fast, accurate, and efficient scientific code
3.9 (38 ratings)
206 students enrolled
Created by Packt Publishing
Last updated 8/2018
English [Auto-generated]
This course includes
  • 2.5 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Use the C programming language to write numerical code
  • Get to know core algorithms used in scientific computing
  • See how CPU design limits program performance
  • Control the speed and accuracy of your programs
  • Understand the limits of accuracy and performance, and the trade-offs between them
  • Use modern parallel architectures, distributed systems, and GPGPU accelerators to speed up your programs
  • Optimize and extend your code to use multiple cores with OpenMP and across multiple networked machines using MPI
Course content
17 lectures 02:16:12
+ Core Algorithms of Scientific Computing
6 lectures 48:39

This video gives you an overview of the course.

Preview 04:21

Why is the history of computation so tied with mathematics? How are computers used today to solve mathematical problems?

   •  Understand the need for computers to solve mathematical problems

   •  Understand the problems for which computers are used

   •  Understand what errors and problems computers might face while solving problems

Introduction – Why Use Computers for Math?

How can we "fill in" the data points between discrete data? How can we extend beyond our data points?

   •  Learn linear interpolation

   •  Learn polynomial interpolation

   •  See the dangers of extrapolation

Interpolation and Extrapolation

How can we calculate integrals with a computer? How can we solve differential equations?

   •  Calculate integrals with the trapezoid and Simpson’s rule

   •  See how the error terms scale with different algorithms

   •  Solve differential equations with the Verlet algorithm

Numerical Integration

How can we invert a matrix? How fast can we do it?

   •  Learn why matrix inversion is useful

   •  See how to invert a matrix numerically

   •  Learn how expensive matrix inversion is

Linear Equations and Matrix Methods

How can we use random numbers to solve problems?

   •  Learn why randomness is useful

   •  See how we can solve integrals with random numbers

   •  See how the accuracy of Monte Carlo methods scales

Monte Carlo Methods
+ Optimizing Scientific Code for Performance and Accuracy
4 lectures 29:46

How are real numbers stored and manipulated? How does this affect our program’s performance?

   •  Learn how the IEEE-754 standard defines floating-point storage

   •  Learn the range of floating-point numbers

   •  See how we can go beyond this with subnormal numbers, underflows, and overflows

Preview 07:09

How does the design of our programs affect their speed and accuracy?

   •  Learn about floating-point round-off error

   •  Learn about computational complexity

   •  Examine the divide-and-conquer design approach

Algorithm Complexity and Performance

How can we deal with spectral data? How can we apply divide-and-conquer approaches to numerical algorithms?

   •  See how simple the DFT is to implement

   •  Learn how to apply the divide-and-conquer approach for the FFT

   •  See the huge speed-up that the FFT provides over the DFT

Discrete versus Fast Fourier Transform

How can we use the compiler to automatically speed up our programs?

   •  Learn about the basic -O optimizations

   •  See how -Ofast and -ffast-math can squeeze out even better optimizations

   •  See why -ffast-math isn’t enabled by default

Compiler Optimizations
+ Optimizing for the CPU
3 lectures 25:45

What features do modern CPUs have that make them faster?

   •  See what parts make up a modern CPU

   •  Learn about caching and pipelining

   •  Learn how long different operations take

How the CPU Works

How can we design our programs to take the most advantage of modern CPU design?

   •  Learn about caching

   •  Learn about branch prediction and speculative execution

   •  Stay out of the way of the CPU and compiler!

Pipelining and Hardware-Oriented Design

How can we use automatic vectorization to speed up our code? What vectorization options currently exist?

   •  Learn about the history of vector instructions

   •  Learn how to compile with AVX/AVX2

   •  Learn about the latest AVX-512 instruction set

Vectorizing with AVX
+ Accelerating Code with Parallel and Distributed Computing
4 lectures 32:02

How can we extend our programs to use multiple cores? Why would we want to? What limitations might exist?

   •  Learn the different kinds of parallel architectures

   •  Learn about strong scaling and Amdahl’s law

   •  Learn about weak scaling and Gustafson’s law

Parallel Architectures, Amdahl’s Law, and Gustafson’s Law

How can we easily use our multi-core systems with more than one thread?

   •  Learn how to parallelize loops with OpenMP

   •  Learn how to change the number of threads

   •  Learn how to use reductions to finalize calculations

Shared Memory Parallelism with OpenMP

How can we extend our codes across multiple machines? What do we need to use MPI?

   •  Learn how to add MPI communication to our code

   •  Learn how to launch MPI programs with mpirun

   •  See how to use mpirun to distribute our code across multiple machines

Distributed Memory Parallelism with MPI

How can we accelerate our codes using modern GPUs? What is CUDA?

   •  Learn how to add CUDA kernels to our code

   •  See how to use nvcc to compile CUDA code

   •  Learn about tuning CUDA code for performance

Requirements
  • This course is for scientists, engineers, or programmers who already know at least one programming language or have some basic knowledge of C.

In this course, you’ll learn to develop scientific and numerical programs that solve problems. It’s ideal for scientists, engineers, and programmers who need to model mathematical or physical systems. You’ll get a core toolkit of algorithms that can be used in a wide variety of applications, using the low-level capabilities of the C programming language.

The close-to-the-metal approach means you’ll learn to optimize your programs to get the absolute best performance your hardware can provide. You’ll see how the design of algorithms affects their performance and accuracy, learn the tools that can be used to optimize your code, and develop your intuition about numerical problems. Finally, you’ll examine the growing array of parallel solutions that enable you to take advantage of multi-core CPUs, distributed compute clusters, and GPU accelerators.

By the end of this course, you’ll know how to write fast, accurate code that can run on many different platforms and solve many different scientific problems. 

About the Author

Benjamin Keller is a postdoctoral researcher in the MUSTANG group at Universität Heidelberg's Astronomisches Rechen-Institut. He obtained his Ph.D. at McMaster University and his BSc in Physics with a minor in Computer Science from the University of Calgary in 2011. His current research involves numerical modeling of the interstellar medium over cosmological timescales.

He has experience writing scientific code in C, FORTRAN, and Python. He also works as a Python consultant for data science startups, building visualization and data science pipelines.

At McMaster, he worked with Dr. James Wadsley in the Physics & Astronomy department. His current research involves numerical simulations of galaxy formation on supercomputers with 10,000+ cores. He has also been a key contributor to multiple scientific computing projects, from simulation codes to visualization libraries.

Who this course is for:
  • It’s ideal for those who want to learn to use numerical solutions for complex mathematical problems. It’s for those who need to develop code to simulate physical systems, deal with continuous data, or squeeze extra performance out of their existing hardware.