Speaker Recognition | By Award Winning Textbook Author

Rating: 4.4 (304 reviews)

Audio processing, feature extraction, speaker recognition, machine learning, and neural networks with coding examples

Created by Quan Wang

Last updated 12/2022

English

What you'll learn

Basic concepts and core algorithms in speaker recognition
Audio processing and acoustics
Machine learning and deep learning basics
Coding practice and toolkits for audio and speech
Python and PyTorch for machine learning
Building a speaker recognition system from scratch

Coding Exercises

This course includes our updated coding exercises so you can practice your skills as you learn.

Course content

10 sections • 60 lectures • 7h 53m total length

Should I take this course? • 3:17
Hello fellow scholars, my name is Quan Wang. I'm currently a Staff Software Engineer at Google, leading the "Speaker, Voice and Language" team. Before that, I was a machine learning scientist on the Amazon Alexa team.

I will be your instructor for this course. I will share my knowledge and experience of speaker recognition techniques with you, and help you prepare for your academic and career goals.

I have been working for more than 8 years in the voice identity industry.

At Google, my team and I have developed many successful products. We have filed many valuable patents and published many impactful papers at top conferences, and our work frequently makes the headlines in tech news. I published a textbook about voice identity techniques, which became one of the bestselling books about AI in China. This book also won me the Distinguished Author of Year 2020 Award. So, many people consider me a kind of "successful" scientist.

However, when I look back at how I started this journey, it wasn't so pleasant at the very beginning.

Most of my undergraduate and Ph.D. research work focused on computer vision and image processing. When I first started working on speech and speaker recognition at Amazon, I was under huge pressure. This pressure did not come from my manager or anyone else. It came from myself, from realizing that my knowledge and expertise did not match what my projects required.

Every time I had meetings with different teams, or reviewed people's code and documents, I didn't know what people were talking about. I had never even heard of the terminology and the acronyms, while people naturally assumed I understood them. I felt like the dullest person in the company. That experience was really terrible.

And that was the time when I really wished someone could just teach me the basic concepts in audio processing, speech, and speaker recognition. I searched the internet, but unfortunately, there were no such online courses.

I bought lots of books, and read lots of papers, online articles, and tech blogs. However, nothing introduced speaker recognition systematically. Everything I could find was fragmented. Besides, most of the papers were obscure and too difficult to follow for someone new to the field. Many online articles and blogs were unprofessional, some with obvious mistakes. And most technical books were already outdated by the time they were published.

That is why I decided to spend several years developing this course: to help anyone interested in speaker recognition techniques start working in this domain with as little friction as possible, and avoid all the frustrations that I experienced myself. Don't waste your time on fragmented, unprofessional, or outdated information. In this course, I will systematically walk you through the basic concepts, from acoustics, audio processing, and deep learning to speaker recognition and its various applications.

To summarize, what I'm going to teach in this course is what I wish someone had taught me many years ago: the core algorithms and engineering practice of speaker recognition.
Expected outcome from this course • 2:31
What is the expected outcome from this course?

Well, that really depends on who you are. This course mainly targets three groups: group 1, students and researchers; group 2, industry audiences; and group 3, general audiences.

Group 1 includes senior college students, graduate students, postdocs, and technical staff members working at research institutes. For these audiences, even if you know nothing about speech technology right now, by the end of this course you should be able to talk confidently about topics like audio processing, speaker recognition, and deep learning, including the very latest work.

If you haven't done any research before, by the end of this course you should be comfortable deciding whether you want to do your thesis in speaker recognition. If you go to a top conference like ICASSP or Interspeech, you should be comfortable chatting with people and asking them questions without fear.

Group 2, the industry audiences, typically includes software engineers, system architects, and product managers who work on products and services related to voice identity techniques. For these audiences, taking this course will round out your existing knowledge of this domain and help you follow the latest trends in academia. This will make you more competitive in your current position and help take your career to the next level.

Group 3 is the general audience. For this group, the purpose of taking this course might be different from the other groups. Many of the lectures in this course could be too technical for a general audience who may not have the corresponding background in mathematics or computer science. For these audiences, it is OK to skip some lectures and the exercises, and only watch the lectures that cover history, applications, and high-level concepts. This will give you a clear big picture of the speaker recognition industry, expand your general knowledge, and maybe help you make better investment decisions. You will sound like a pro when you chat with your family and friends.
About this course
How to max your win from this course • 1:50
Max your win from this course
Syllabus • 1:44

Requirements

College level mathematics
Experience with machine learning or coding will be a plus

Description

This course is an introduction to speaker recognition techniques.

Speaker recognition lies at the intersection of audio processing, biometrics, and machine learning, and has various applications. You can find speaker recognition in your smartphones, smart home devices, and various commercial services.

In this course, we will start with an introduction to the history of speaker recognition techniques, to see how they evolved from simple human efforts to modern intelligent systems based on deep learning.

We will cover the basics of acoustics, perception, audio processing, signal processing, and feature extraction, so you don't need a background in these domains. We will also introduce popular machine learning approaches, such as Gaussian mixture models, support vector machines, factor analysis, and neural networks.

We will focus on how to build speaker recognition systems based on acoustic features and machine learning models, with an emphasis on modern speaker recognition with deep learning, including the different options for inference logic, loss functions, and neural network topologies.
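To give a rough idea of what "inference logic" can look like, here is a minimal sketch of one common option: averaging a speaker's enrollment embeddings into a profile and scoring a test embedding against it with cosine similarity. It is an illustration only, not code from the course. The function names, the toy 3-dimensional embeddings, and the 0.7 threshold are all assumptions; in a real system the embeddings would come from a trained neural network and the threshold would be tuned on evaluation data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_embedding(embeddings):
    """Element-wise mean of a non-empty list of equal-length embeddings."""
    count = len(embeddings)
    return [sum(dims) / count for dims in zip(*embeddings)]


def verify(enrollment_embeddings, test_embedding, threshold=0.7):
    """Return (score, accepted) for a speaker verification trial.

    The threshold value is a placeholder assumption, not a recommended setting.
    """
    profile = average_embedding(enrollment_embeddings)
    score = cosine_similarity(profile, test_embedding)
    return score, score >= threshold


if __name__ == "__main__":
    # Toy embeddings standing in for the output of a speaker encoder.
    enrolled = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
    score, accepted = verify(enrolled, [0.85, 0.15, 0.05])
    print(f"score={score:.3f} accepted={accepted}")
```

Raising the threshold rejects more impostors but also rejects more genuine speakers, so in practice it is chosen to balance those two kinds of error.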

We will also talk about data processing techniques such as data cleansing, data augmentation, and data fusion.
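As a simple illustration of data augmentation, the sketch below mixes Gaussian noise into a clean signal at a chosen signal-to-noise ratio (SNR). It is not taken from the course material. The function name `add_noise_at_snr`, the synthetic sine wave standing in for speech, and the choice of white Gaussian noise are all assumptions made for the example; real pipelines often mix in recorded noise or simulated room reverberation instead.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return a copy of `signal` with Gaussian noise added at `snr_db` dB SNR.

    `signal` is a non-empty list of float samples with non-zero power.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR in dB is 10 * log10(signal_power / noise_power), so solve for the
    # noise power that gives the requested SNR and rescale the noise to it.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]


if __name__ == "__main__":
    # A 440 Hz tone sampled at 16 kHz stands in for a clean speech signal.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noisy = add_noise_at_snr(clean, snr_db=10.0)
    print(len(noisy))
```

Training on several noisy copies of each utterance at different SNRs is one way to make a model less sensitive to recording conditions.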

The course includes many hands-on exercises and coding examples to help you really master the topics introduced here, and a final project that guides you through building your own speaker recognition system from scratch.

If you are a college student interested in AI or signal processing, or a software engineer, system architect or product manager working with related technologies, then this course is definitely for you!

Who this course is for:

College students or graduate students
Engineers, researchers, and program managers in universities or industry
General audience interested in AI
Fans of cool technology

Course content

Introduction to this course • 4 lectures • 9min

The History of Voice Identity Techniques • 4 lectures • 29min

Fundamental of Audio Processing • 6 lectures • 48min

Acoustic Feature Extraction • 4 lectures • 36min

Fundamentals of Speaker Recognition • 7 lectures • 54min

Early Speaker Recognition Approaches • 9 lectures • 57min

Deep Learning Basics • 7 lectures • 52min

Speaker Recognition with Deep Learning • 7 lectures • 57min

Data Processing in Speaker Recognition • 8 lectures • 1hr 2min

Final Project • 4 lectures • 31min
