What you'll learn

Basic concepts in speaker diarization
Commonly used algorithms in speaker diarization
State-of-the-art academic advances in speaker diarization
Coding examples of speaker diarization
Hands-on projects with popular toolkits including SCTK, pyannote-metrics, pyannote-audio, and uisrnn

Course content

5 sections • 20 lectures • 4h 42m total length

Introduction to this tutorial (2:37)
Slides and video lecture captions (0:19)
Basic concepts and applications (7:09)
Basics of diarization
Brainstorm about applications of speaker diarization
Scoring and metrics 1: Diarization errors (8:18)
Scoring DER with SCTK
Evaluating diarization with pyannote.metrics
Permutation-invariant metrics from scratch
The collar value in evaluation tools (1:04)
Scoring and metrics 2: Speaker attributed ASR (7:54)
Metrics and datasets

[ICASSP 2018] Speaker Diarization with LSTM (29:05)
[ICASSP 2019] Fully supervised speaker diarization (28:09)
[SLT 2021] Discriminative Neural Clustering (15:12)
[ICASSP 2022] Google's Turn-to-Diarize system (18:55)
[Interspeech 2024] Word-Level End-to-End Neural Speaker Diarization (15:35)
Explore word-level end-to-end neural speaker diarization, adding an auxiliary encoder and joint network to a frozen ASR model with shared blank logits.
[SANE 2024] Speaker diarization at Google: From modularized systems to LLMs (48:20)

Requirements

Basic knowledge of audio and speech processing
Basic knowledge of machine learning and neural networks
Basic programming skills in Python
Experience with speaker recognition (it's recommended to take the Speaker Recognition course by Dr. Quan Wang first)

Description

This course is a tutorial on speaker diarization techniques.

Speaker diarization is an advanced topic in speech processing. It answers the question "who spoke when", or "who spoke what". It is closely related to many other techniques and fields, such as voice activity detection, speaker recognition, automatic speech recognition, speech separation, statistics, and deep learning. It is used in many applications, such as automatic meeting transcript generation, medical record analysis, media indexing and retrieval, and second-pass speech recognition.

In this course, we will first go through the basic concepts and applications of speaker diarization, followed by scoring and metrics. Then we will introduce the unsupervised methods in speaker diarization, starting with the commonly used modularized framework, followed by an introduction to clustering algorithms, with a focus on spectral clustering and its extensions. Next, we will discuss the problems with clustering algorithms and introduce the supervised methods in speaker diarization. We will mainly cover four supervised speaker diarization approaches: UIS-RNN, PIT/EEND, TS-VAD, and DNC. Finally, we will discuss the challenges and future research directions in speaker diarization.
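To give a flavor of the scoring and metrics part, here is a minimal sketch of a permutation-invariant error rate written from scratch. It is an illustration only, not the implementation used by SCTK or pyannote.metrics: it assumes both the reference and the hypothesis are given as one speaker label per fixed-length frame, it only counts speaker confusion (no missed speech, false alarms, overlapping speech, or collar), and the function name `frame_error_rate` is hypothetical. Because hypothesis labels are arbitrary, the score is taken under the best one-to-one mapping between hypothesis and reference speakers.

```python
from itertools import permutations


def frame_error_rate(reference, hypothesis):
    """Fraction of frames with a wrong speaker label under the best label mapping.

    Both arguments are equal-length sequences holding one speaker label per frame.
    Brute force over all mappings, so only suitable for a handful of speakers.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not reference:
        return 0.0

    ref_labels = list(set(reference))
    hyp_labels = list(set(hypothesis))

    # Pad the reference labels with unique placeholders so that every
    # hypothesis speaker can be mapped even when it has no reference partner.
    extra = max(0, len(hyp_labels) - len(ref_labels))
    candidates = ref_labels + [object() for _ in range(extra)]

    best_correct = 0
    for assignment in permutations(candidates, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, assignment))
        correct = sum(1 for r, h in zip(reference, hypothesis) if mapping[h] == r)
        best_correct = max(best_correct, correct)

    return 1.0 - best_correct / len(reference)


reference = ["A", "A", "B", "B", "B", "A"]
hypothesis = ["x", "x", "y", "y", "x", "x"]
print(frame_error_rate(reference, hypothesis))  # one of six frames is wrong
```

The brute-force search over permutations keeps the sketch short; for more than a few speakers, the same optimal mapping is usually found with the Hungarian algorithm instead.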

For those who want to dive deeper into speaker diarization, we also include the instructors' video lectures from top speech conferences such as ICASSP and SLT as additional learning materials.

Apart from the lecture videos, we have included a short quiz after each lecture to help you better understand the topics it covers.

Speaker diarization is also a very practical skill. We have therefore prepared a range of coding exercises and projects to familiarize you with the toolkits most widely used by researchers and scientists, including SCTK, pyannote-metrics, pyannote-audio, and uisrnn.

This course is a great fit for students, researchers, developers, and product managers who work on audio and speech processing.

Who this course is for:

College and graduate students interested in audio and speech processing
Researchers in computer science or signal processing domains
Developers, system architects, and product managers for intelligent speech systems
Enthusiasts for cool technology

Course content

Basics of speaker diarization (6 lectures • 27min)

Unsupervised methods (3 lectures • 29min)

Supervised methods (3 lectures • 28min)

Challenges and future work (2 lectures • 10min)

[Optional] Additional learning materials (6 lectures • 2hr 35min)
