Audio Classification using Convolutional Neural Net

Rating: 4.1 (6 reviews)

Audio Classification using Convolutional Neural Net with Raspberry Pi 5 AI Model Deployment

Created by Ahmed Qusay

Last updated 8/2025

English

What you'll learn

Define the real Audio Environments to record clips for Machine Learning Model Prediction.
Compose Negative and Positive Audio Clips for Machine Learning use.
Inject an “Audio Keyword” into Positive Audio Clips to trigger a Raspberry Pi 5 action.
Slice an Audio file into several Audio Clips for Neural Net feeding.
Apply the 5 Stages of Raw Audio Preparation for Neural Net use (load, time domain, frequency domain, spectrogram, and resize).
Use Librosa to turn raw Audio into Decibel-scaled Spectrograms for Neural Net use.
Label Audio Clips as Positive Audio and Negative Audio for Neural Net Training.
Slice, Label, and Batch Audio Clips for Neural Net use.
Use Google Colab to create a Python Program with Convolutional Neural Networks for Audio Classification and Prediction, saved as an H5 AI Model.
Assemble the Raspberry Pi 5 and other Hardware Devices for Audio Prediction use.
Install Raspberry Pi 5 Software Requirements, including the Operating System, VNC viewer, Librosa, Tensorflow, and more.
Use FileZilla to transfer the H5 AI Model to be saved in Raspberry Pi 5.
Deploy and Run the H5 AI Model inside the Raspberry Pi 5 and make the Audio Prediction.
Use the Raspberry Pi 5 Audio Prediction to control the movement of the Servo Motor.

Course content

8 sections • 39 lectures • 6h 7m total length

Introduction • 6:40
Classify audio with a convolutional neural network, including data preparation, labeling, and training. Deploy the trained H5 model on Raspberry Pi to control hardware such as a servo motor.

Define Audio Environments and Audio Keyword • 4:39
Recording First Environment (Silent) • 1:59
Recording Second Environment (Family Talk) • 1:57
Recording Third Environment (Family Talk + Glass Down) • 1:18
Audio Convert to WAV • 6:22
Demonstrates converting mobile audio recordings from M4A to WAV for machine learning, using CloudConvert and basic file organization, with optional Python automation.
Compose Positive Audio • 6:09
Compose Positive Audio - Family Talk Environment • 3:57
Slicing Single Audio File into 3 Seconds Multiple Clips in Python • 11:39
Slicing Audio, Balancing Positive and Negative • 8:34
Train-Test Audio Dataset Split using Copy-Paste • 9:37
Create a train and test folder split for audio clips, allocating about 90% to training and 10% to testing, while balancing across environments to evaluate the model.
Train-Test Audio Dataset Split using Python - Part 1 • 12:33
Train-Test Audio Dataset Split using Python - Part 2 • 10:49
Implement a Python-driven train-test split by copying or moving negative and positive clips into train and test folders using a for loop and shutil commands, with counters tracking totals.
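
The split described above can be sketched roughly as follows. This is a minimal illustration, not the course's own script: the function name `split_clips`, the folder arguments, and the fixed random seed are all assumptions made for the example. It copies clips with `shutil.copy2` inside a for loop and keeps counters for both folders.

```python
import random
import shutil
from pathlib import Path


def split_clips(source_dir, train_dir, test_dir, test_fraction=0.1, seed=0):
    """Copy WAV clips from source_dir into train_dir and test_dir.

    Returns (train_count, test_count). All names here are hypothetical.
    """
    clips = sorted(Path(source_dir).glob("*.wav"))
    random.Random(seed).shuffle(clips)  # reproducible shuffle before splitting

    test_total = round(len(clips) * test_fraction)
    train_count = 0
    test_count = 0

    Path(train_dir).mkdir(parents=True, exist_ok=True)
    Path(test_dir).mkdir(parents=True, exist_ok=True)

    for index, clip in enumerate(clips):
        if index < test_total:
            # The first test_total shuffled clips go to the test folder.
            shutil.copy2(clip, Path(test_dir) / clip.name)
            test_count += 1
        else:
            # The remaining clips go to the train folder.
            shutil.copy2(clip, Path(train_dir) / clip.name)
            train_count += 1

    return train_count, test_count
```

Running the function once per environment and once per class (positive, negative) is one way to keep the split balanced across environments; replacing `shutil.copy2` with `shutil.move` would move the clips instead of copying them.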

Sorting Audio Clips • 10:30
Load and Play Audio Files in Python • 7:51
Load and play audio files in Python by sorting the training files into negative (label 0) and positive (label 1), then loading them with Librosa and playing them with IPython display.
Spectrogram • 11:05
Explore four stages to convert audio clips into a spectrogram: load audio with Librosa, transform to the frequency domain, convert amplitude to decibels, and expand to a 3D tensor for the CNN.
Five Stages Audio Processing • 11:25
Process the sorted training WAV clips through five stages: load audio with Librosa, apply the Fourier transform, convert to a decibel spectrogram, resize to a 3D tensor, and assemble into audio_clips for TensorFlow.
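
To make the five stages concrete, here is a rough NumPy-only sketch. The course uses Librosa for loading, the Fourier transform, and the decibel conversion; the helpers below (`stft_magnitude`, `amplitude_to_decibel`, `fit_to_shape`, `clip_to_tensor`) are hypothetical stand-ins written for this example, the synthetic tone replaces a loaded WAV clip, and the crop-or-pad step is a simplification of the resize stage. The 128 x 128 target shape and the 16 kHz sample rate are assumptions as well.

```python
import numpy as np


def stft_magnitude(signal, frame_length=512, hop_length=256):
    """Magnitude of a short-time Fourier transform using a Hann window."""
    window = np.hanning(frame_length)
    frame_count = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(frame_count)
    ])
    # One FFT per frame, then transpose to (frequency bins, time frames).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def amplitude_to_decibel(magnitude, floor=1e-10):
    """Convert magnitudes to decibels relative to the loudest bin."""
    reference = max(magnitude.max(), floor)
    return 20.0 * np.log10(np.maximum(magnitude, floor) / reference)


def fit_to_shape(spectrogram, rows, cols):
    """Crop or pad a 2D array to (rows, cols), padding with its minimum."""
    out = np.full((rows, cols), spectrogram.min(), dtype=spectrogram.dtype)
    r = min(rows, spectrogram.shape[0])
    c = min(cols, spectrogram.shape[1])
    out[:r, :c] = spectrogram[:r, :c]
    return out


def clip_to_tensor(signal, rows=128, cols=128):
    """Frequency domain, decibel scale, fixed size, then a channel axis."""
    spectrogram_db = amplitude_to_decibel(stft_magnitude(signal))
    resized = fit_to_shape(spectrogram_db, rows, cols)
    return np.expand_dims(resized, axis=-1)  # shape (rows, cols, 1)


# Stage 1 stand-in: a synthetic 3-second tone instead of a loaded WAV clip.
sample_rate = 16000
t = np.arange(3 * sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

tensor = clip_to_tensor(signal)
audio_clips = np.stack([tensor, tensor])  # batch of two identical clips
```

Every clip ends up with the same 3D shape, which is what allows the clips to be stacked into a single `audio_clips` array for the network.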

CNN Libraries, Layers, Compile and Summary • 10:54
Learn to assemble a sequential CNN with convolutional and max pooling layers, flatten and dense layers, then compile with Adam and binary cross-entropy, and review the model summary.
CNN train-test Audio Files Split and Model Fit • 10:20
CNN Make a Prediction on Audio Files • 9:15
CNN Save as H5 and Test the Model • 7:09
Save the trained CNN as an H5 file, set the target directory for Raspberry Pi deployment, load the saved model, and run predictions to validate performance on the dataset.
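
The binary cross-entropy loss named in the compile step can be written out in a few lines of NumPy. This is only an illustrative sketch of the standard formula, not the Keras implementation; the function name and the sample probabilities are made up for the example.

```python
import numpy as np


def binary_cross_entropy(labels, predictions, epsilon=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    labels = np.asarray(labels, dtype=float)
    # Clip so that log(0) never occurs.
    predictions = np.clip(np.asarray(predictions, dtype=float), epsilon, 1.0 - epsilon)
    losses = -(labels * np.log(predictions)
               + (1.0 - labels) * np.log(1.0 - predictions))
    return float(losses.mean())


# One positive clip (label 1) and one negative clip (label 0).
confident = binary_cross_entropy([1, 0], [0.95, 0.05])
unsure = binary_cross_entropy([1, 0], [0.55, 0.45])
```

The loss is smaller for the confident, correct predictions than for the unsure ones, which is the quantity the optimizer drives down during training.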

Raspberry Pi 5 and other Hardware Devices • 5:52
Assembly of Raspberry Pi 5 (Thermal Pads, Active Cooler, and Power Supply) • 11:51
Raspberry Pi 5 Operating System OS Installation • 7:11
Raspberry Pi 5 Connection and VNC Installation • 9:33
Connect to the Raspberry Pi 5 via SSH, update and upgrade packages, enable the VNC server with raspi-config, and access the Pi remotely with the VNC viewer, while monitoring temperature.

Raspberry Pi 5 Virtual Environment and Install of TensorFlow • 9:41
Raspberry Pi 5 Librosa Install and AI Model Copy with FileZilla • 11:18
Raspberry Pi 5 Audio Recording • 7:25
Record three-second audio clips on a Raspberry Pi 5 using a USB microphone, save them as WAV files in the project folder, and feed each clip to a convolutional neural net to determine whether it contains the word Ahmed.
Raspberry Pi 5 Audio Prediction • 9:41
Record a three-second audio clip with a USB mic on Raspberry Pi 5, preprocess it with Librosa, FFT, and a decibel spectrogram, then predict with the H5 model and drive hardware via GPIO.
Raspberry Pi 5 and Servo Motor Connection • 7:55
Run and Test AI Audio Model in Raspberry Pi 5 • 7:40
Test a convolutional neural network-based audio classification model on Raspberry Pi 5, using a mic and servo to act when predictions exceed 0.9, even in noise.
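
The decision rule in the last lecture (act only when the prediction exceeds 0.9) might look like the sketch below. The names `should_trigger` and `handle_prediction` are hypothetical, and `move_servo` is a placeholder callback: on the Pi it would wrap whatever GPIO code drives the servo, while here it only records an event so the logic can be run anywhere.

```python
def should_trigger(probability, threshold=0.9):
    """Return True only when the predicted probability exceeds the threshold."""
    return probability > threshold


def handle_prediction(probability, move_servo, threshold=0.9):
    """Call move_servo() when the prediction is confident enough."""
    if should_trigger(probability, threshold):
        move_servo()
        return True
    return False


# Stand-in for real hardware: record each time the servo would have moved.
events = []
for probability in [0.12, 0.95, 0.90, 0.99]:
    handle_prediction(probability, move_servo=lambda: events.append("servo moved"))
```

A high threshold such as 0.9 trades a few missed keywords for fewer false activations, which matters when background noise produces mid-range predictions.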

Groans Clips Dataset Connection Python Colab • 14:57
Audio Processing for Machine Learning • 15:42
Learn to prepare an audio dataset for convolutional neural nets by loading audio with Librosa, applying short-time Fourier transform, converting to decibel spectrograms, and expanding dimensions for neural network input.
Audio Labeling (Positive and Negative) • 18:56
CNN • 24:26
RNN • 16:20
The lecture demonstrates adapting a CNN-based audio classification workflow to an RNN with LSTM using Keras, detailing architecture changes, activation choices, and binary classification performance.

Requirements

Basic knowledge of Python and basic knowledge of AI.

Description

This course provides a practical understanding of handling audio files in machine learning, walking through audio processing from A to Z in Python. It explains how to use Convolutional Neural Networks to generate an H5 AI model for audio classification, and it covers Raspberry Pi 5 assembly, programming, AI model deployment, and audio prediction.

You will learn how to identify audio environments for machine learning, record audio files, and slice them into positive and negative clips. You will process the raw audio clips and inject the “keyword” to be detected by the neural network, then apply clip labeling, slicing, and batching to prepare the clips for the Neural Net. The required stages (load, time domain, frequency domain, spectrogram, and resize) turn raw audio clips into model input. Finally, you will use Python to generate an H5 AI model, deploy and run it on the Raspberry Pi 5 to control the movement of a servo motor by audio command, and test the model with real-time audio prediction.
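
As a small illustration of the slicing step mentioned in the description, the sketch below cuts a recording into consecutive 3-second clips. It is an assumption-laden example rather than the course's code: the function name `slice_into_clips` is hypothetical, the 16 kHz sample rate is arbitrary, and a silent NumPy array stands in for a loaded recording.

```python
import numpy as np


def slice_into_clips(samples, sample_rate, clip_seconds=3.0):
    """Split a 1D sample array into consecutive full-length clips.

    A trailing remainder shorter than clip_seconds is dropped.
    """
    clip_length = int(round(sample_rate * clip_seconds))
    clip_count = len(samples) // clip_length
    return [
        samples[i * clip_length: (i + 1) * clip_length]
        for i in range(clip_count)
    ]


sample_rate = 16000
recording = np.zeros(10 * sample_rate, dtype=np.float32)  # 10 s silent stand-in
clips = slice_into_clips(recording, sample_rate)
```

A 10-second recording yields three full clips here, and the final second is discarded so that every clip has the same length.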

Who this course is for:

Anyone who needs a comprehensive understanding of audio file manipulation in machine learning.
Anyone who needs a comprehensive understanding of audio file manipulation in Python.
Anyone who needs a comprehensive understanding of audio file manipulation with Raspberry Pi 5.

Course content

Introduction • 1 lecture • 7min

Preparing Audio Files for Machine Learning use • 12 lectures • 1hr 20min

Handling Audio Files in Python • 4 lectures • 41min

Audio Labeling, Audio Pipeline, Audio Slicing, and Audio Batching • 3 lectures • 25min

Convolutional Neural Networks CNN for Audio Classification • 4 lectures • 38min

Assembly of Raspberry Pi 5 and installing Software and Python Libraries • 4 lectures • 34min

Deploy and Run AI Audio Model in Raspberry Pi 5 • 6 lectures • 54min

CNN vs RNN Audio Detection in Python for Patient Groans Dataset • 5 lectures • 1hr 30min