Udemy
2022-07-05T03:42:09Z


A Tutorial on Speaker Diarization

Speaker diarization: A journey from unsupervised to supervised approaches
Rating: 4.4 out of 5 (14 ratings)
96 students
Created by Quan Wang, Chao Zhang
Last updated 7/2022
English
English [Auto]

What you'll learn

  • Basic concepts in speaker diarization
  • Commonly used algorithms in speaker diarization
  • State-of-the-art academic advances in speaker diarization
  • Coding examples of speaker diarization
  • Hands-on projects with popular toolkits including SCTK, pyannote-metrics, pyannote-audio, and uisrnn

Requirements

  • Basic knowledge of audio and speech processing
  • Basic knowledge of machine learning and neural networks
  • Basic Python programming skills
  • Experience with speaker recognition (it's recommended to take the Speaker Recognition course by Dr. Quan Wang first)

Description

This course is a tutorial on speaker diarization techniques.


Speaker diarization is an advanced topic in speech processing. It solves the problem of "who spoke when", or "who spoke what". It is closely related to many other areas, such as voice activity detection, speaker recognition, automatic speech recognition, speech separation, statistics, and deep learning. It is used in applications such as automatic meeting transcript generation, medical record analysis, media indexing and retrieval, and second-pass speech recognition.
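
To make the "who spoke when" output concrete, here is a minimal Python sketch. It assumes a diarization result can be stored as a list of (start, end, speaker) turns and prints them as RTTM `SPEAKER` lines, a plain-text format commonly used for diarization annotations. The `Turn` type, the `to_rttm` helper, the file ID, and the example times are hypothetical illustrations, not taken from the course materials.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One speaker turn: who spoke, and when (times in seconds)."""
    start: float
    end: float
    speaker: str


def to_rttm(turns: List[Turn], file_id: str) -> List[str]:
    """Format turns as RTTM SPEAKER lines (onset and duration in seconds)."""
    lines = []
    for t in sorted(turns):  # sorted by start time, then end time
        duration = t.end - t.start
        lines.append(
            f"SPEAKER {file_id} 1 {t.start:.3f} {duration:.3f} "
            f"<NA> <NA> {t.speaker} <NA> <NA>"
        )
    return lines


# Hypothetical two-speaker meeting excerpt (made-up times).
turns = [
    Turn(0.0, 3.2, "spk_A"),
    Turn(3.2, 5.0, "spk_B"),
    Turn(5.0, 9.5, "spk_A"),
]
rttm_lines = to_rttm(turns, file_id="meeting1")
print("\n".join(rttm_lines))
```

Each line answers "who" (the speaker name field) and "when" (the onset and duration fields). Attaching recognized words to each turn would give the "who spoke what" view.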


In this course, we will first go through the basic concepts and applications of speaker diarization, followed by scoring and metrics. Then we will introduce the unsupervised methods in speaker diarization, starting with the commonly used modularized framework, followed by an introduction to clustering algorithms, with a focus on spectral clustering and its extensions. Next, we will discuss the problems with clustering algorithms and introduce the supervised methods in speaker diarization. We will mainly cover four supervised speaker diarization approaches: UIS-RNN, PIT/EEND, TS-VAD, and DNC. Finally, we will discuss the challenges and future research directions in speaker diarization.
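
As an illustration of the scoring step mentioned above, the following Python sketch computes a simplified, frame-level diarization error rate (DER): missed speech, false alarm, and speaker confusion, divided by the amount of reference speech. It is a sketch under stated assumptions and not the procedure of any particular toolkit: each frame carries at most one speaker label (no overlapping speech), no forgiveness collar is applied, and the best one-to-one speaker mapping is found by brute force, which is only practical for a handful of speakers. The function name `frame_der` and the example labels are hypothetical.

```python
from itertools import permutations
from typing import List, Optional

# One label per fixed-length frame; None marks a non-speech frame.
Frames = List[Optional[str]]


def frame_der(reference: Frames, hypothesis: Frames) -> float:
    """Simplified DER = (missed + false alarm + confusion) / reference speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    ref_speech = sum(r is not None for r, _ in pairs)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both_speech = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_spk = sorted({r for r in reference if r is not None})
    hyp_spk = sorted({h for h in hypothesis if h is not None})
    # Pad with None so extra hypothesis speakers can stay unmapped.
    padded = ref_spk + [None] * max(0, len(hyp_spk) - len(ref_spk))

    # Brute-force search for the one-to-one mapping with the most matches
    # (an assumption made for brevity; fine only for a few speakers).
    best_correct = 0
    for perm in permutations(padded, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(mapping[h] == r for r, h in both_speech)
        best_correct = max(best_correct, correct)

    confusion = len(both_speech) - best_correct
    return (missed + false_alarm + confusion) / ref_speech


# Hypothetical 10-frame example: hypothesis labels are arbitrary cluster IDs.
reference = ["A", "A", "A", "B", "B", None, None, "A", "A", "B"]
hypothesis = ["x", "x", "y", "y", "y", None, "y", "x", None, "y"]
der = frame_der(reference, hypothesis)
print(f"DER = {der:.3f}")  # prints "DER = 0.375"
```

In this example there is one missed frame, one false-alarm frame, and one confused frame against eight frames of reference speech, giving 3/8. Because the hypothesis speaker names are arbitrary, the mapping step is what makes the score independent of how clusters happen to be labeled.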


For those who want to dive deeper into speaker diarization, we also include video lectures given by the instructors at top speech conferences such as ICASSP and SLT as additional learning materials.


In addition to the lecture videos, we have included a short quiz after each lecture to help you better understand the topics it covers.


Speaker diarization is also a very practical skill. We have therefore prepared a range of coding exercises and projects to familiarize you with popular toolkits used by researchers and scientists, including SCTK, pyannote-metrics, pyannote-audio, and uisrnn.


This course would be a great fit for students, researchers, developers, or product managers who work on audio and speech processing.

Who this course is for:

  • College and graduate students interested in audio and speech processing
  • Researchers in computer science or signal processing domains
  • Developers, system architects, and product managers for intelligent speech systems
  • Enthusiasts for cool technology

Instructors

Quan Wang
Speech Expert at Google
  • 4.4 Instructor Rating
  • 70 Reviews
  • 349 Students
  • 2 Courses

Dr. Quan Wang is currently a Staff Software Engineer at Google, managing the Speaker, Voice & Language team, and an IEEE Senior Member. He was formerly a Machine Learning Scientist on the Amazon Alexa team. Quan has led the efforts to deploy advanced speaker recognition technologies to various products at Google, making Google Home the first smart home speaker on the market to support multiple users.


Quan has authored 50+ impactful patents and papers in speaker recognition, speaker diarization, voice separation, speech detection, language recognition and speech synthesis, with 2900+ citations. Quan's work has received coverage by top tech media including VentureBeat, TechCrunch, Engage and CNET.


Quan is the author of the textbook "Voice Identity Techniques: From core algorithms to engineering practice", which was listed on a leaderboard of bestselling AI books in China and won the Distinguished Author of Year 2020 Award.

Chao Zhang
Research scientist in AI @ Google
  • 4.4 Instructor Rating
  • 14 Reviews
  • 96 Students
  • 1 Course

Dr. Chao Zhang received his B.E. and M.S. degrees in Computer Science & Technology from Tsinghua University and his PhD degree in Information Engineering from the University of Cambridge. Before joining Google, he was a Research Associate at Cambridge University and an Advisor and Speech Team Co-leader at JD AI Research. He has published 60 peer-reviewed papers in speech and language processing and received best student paper awards at ICASSP 2014, ASRU 2019, and SLT 2021. He is also a Visiting Fellow at Cambridge University and a member of multiple technical committees.
