
Master GPU-accelerated AI workflows with NVIDIA RAPIDS, cuDF, CuPy, and cuML, comparing their performance to pandas and scikit-learn, and building end-to-end projects in Google Colab.
Install RAPIDS on Google Colab via a single command or a repository script, selecting the GPU, CUDA, and Python versions. Save a copy of the notebook to Google Drive when customizing it.
Apply user-defined functions (UDFs) to cuDF DataFrames on the GPU, using df.apply, apply_rows, and apply_chunks for GPU acceleration and missing-value handling. Create UDFs such as an add function for row-wise operations.
Compare the performance of RAPIDS on the GPU with pandas on the CPU, testing value counts, concatenation, group by, merge, and string operations on datasets with 10 million rows.
Apply hyperparameter tuning with GridSearchCV in scikit-learn to find the best alpha for Ridge regression, improving the R² score from 0.37 to 0.49 on a small dataset.
Distribute a 100,000 x 100 matrix across GPUs or CPUs using Dask array, with CuPy random state and chunking, then perform SVD and persist results to GPU memory.
Learn to integrate Dask and cuDF to partition data across GPUs and CPUs, compute across partitions, and export results to CSV.
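As a rough illustration of the hyperparameter tuning step listed above, the following minimal sketch searches over alpha for Ridge regression with scikit-learn's GridSearchCV. The synthetic data, the alpha grid, and the 5-fold setting are assumptions made for this example, not the dataset or settings used in the course, so the scores it prints will not match the 0.37 and 0.49 figures quoted above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem (an assumption; the course dataset is not shown here).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = rng.normal(size=10)
y = X @ true_coef + rng.normal(scale=2.0, size=200)

# Candidate regularization strengths (hypothetical grid chosen for illustration).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validated search, scoring each candidate by R^2.
search = GridSearchCV(Ridge(), param_grid, scoring="r2", cv=5)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
best_cv_r2 = search.best_score_
print(f"best alpha: {best_alpha}, mean cross-validated R^2: {best_cv_r2:.3f}")
```

After fitting, search.best_estimator_ holds a Ridge model refit on the full data with the selected alpha, which can be used directly for predictions.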
This course is independently developed and is not affiliated with, endorsed, or sponsored by NVIDIA Corporation. RAPIDS is an open-source project originally developed by NVIDIA.
Data science and machine learning are among the largest computational sectors in the world, where modest improvements in the accuracy of analytical models can translate into billions of dollars of impact on the bottom line. Data scientists are constantly striving to train, evaluate, iterate on, and optimize models to achieve highly accurate results and exceptional performance. With NVIDIA's powerful RAPIDS platform, work that used to take days can be accomplished in a matter of minutes, making the construction and deployment of high-value models easier and more agile. In data science, additional computational power means faster and more effective insights. RAPIDS harnesses the power of NVIDIA CUDA to accelerate the entire data science and model training workflow by running it on graphics processing units (GPUs).
In this course, you will learn everything you need to take your machine learning applications to the next level! Check out some of the topics that will be covered below:
Using the cuDF, CuPy, and cuML libraries instead of pandas, NumPy, and scikit-learn, so that data is processed and machine learning algorithms are executed with high performance on the GPU.
Comparing the performance of classic Python libraries with RAPIDS. In some experiments conducted during the classes, we achieved speedups exceeding 900x. This indicates that, with certain datasets and algorithms, RAPIDS can be more than 900 times faster!
Creating a complete, step-by-step machine learning project using RAPIDS, from data loading to predictions.
Using Dask for task parallelism on multiple GPUs or CPUs, integrated with RAPIDS for superior performance.
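To give a sense of how cuDF stands in for pandas and how a CPU versus GPU comparison might be timed, here is a minimal sketch. It assumes that cuDF mirrors the pandas calls used (value_counts and groupby) and falls back to a CPU-only run when RAPIDS is not installed. The column names, the row count, and the time_ops helper are hypothetical choices for this example, and a single un-warmed run is only a rough indicator of speedup.

```python
import time

import numpy as np
import pandas as pd

try:
    import cudf  # available only in a RAPIDS environment with an NVIDIA GPU
except Exception:  # cuDF missing or unusable on this machine
    cudf = None

N_ROWS = 1_000_000  # assumption: reduced from the course's 10 million rows to keep the sketch quick

# Build the same data once on the CPU so both libraries see identical input.
rng = np.random.default_rng(42)
pdf = pd.DataFrame({
    "key": rng.integers(0, 100, size=N_ROWS),
    "value": rng.random(N_ROWS),
})


def time_ops(df):
    """Run value_counts and a group-by mean; return (elapsed seconds, counts, means)."""
    start = time.perf_counter()
    counts = df["key"].value_counts()
    means = df.groupby("key")["value"].mean()
    elapsed = time.perf_counter() - start
    return elapsed, counts, means


cpu_time, cpu_counts, cpu_means = time_ops(pdf)
print(f"pandas (CPU): {cpu_time:.4f} s")

if cudf is not None:
    gdf = cudf.from_pandas(pdf)  # copy the DataFrame into GPU memory
    gpu_time, gpu_counts, gpu_means = time_ops(gdf)
    print(f"cuDF (GPU):   {gpu_time:.4f} s")
else:
    print("cuDF not available; skipping the GPU run")
```

Because the same function runs against both DataFrame types, the comparison stays like for like. For more reliable numbers, repeat each measurement several times and discard the first GPU run, which can include one-time startup costs.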
Throughout the course, we will use the Python programming language and the online Google Colab environment. This way, you don't need a local GPU to follow the classes, as we will use the free hardware provided by Google.