Improving data quality in data analytics & machine learning
What you'll learn
- Strategies for increasing data quality
- Ways to assess data quality
- Interpreting data visualizations
- How to spot problems in data
Requirements
- Interest in working with data
- Interest in knowing more about data quality
- Some Python skills are useful for the optional coding videos
Description
All of our decisions are based on data. Our sense organs gather data, our memories are data, and our gut instincts are data. If you want to make good decisions, you need to have high-quality data.
This course is about data quality: what it means, why it's important, and how you can increase the quality of your data.
In this course, you will learn:
- High-level strategies for ensuring high data quality, including terminology, data documentation and management, and the different research phases in which you can check and increase data quality.
- Qualitative and quantitative methods for evaluating data quality, including visual inspection, error rates, and outliers. Python code is provided to show how to implement these visualizations and scoring methods using pandas, numpy, seaborn, and matplotlib.
- Specific methods and algorithms for cleaning data and rejecting bad or unusual data. As above, Python code is provided to show how to implement these procedures using pandas, numpy, seaborn, and matplotlib.
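To give a flavor of the quantitative checks mentioned above, here is a minimal sketch using pandas and numpy. It is not taken from the course materials: the toy `temperature` column, the helper names `missing_rate` and `iqr_outlier_mask`, and the choice of the common 1.5 x IQR rule are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_rate(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values in each column (hypothetical helper)."""
    return df.isna().mean()


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values lying more than k * IQR outside the quartiles.

    Missing values are ignored when computing the quartiles and are
    never flagged, because comparisons with NaN evaluate to False.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Toy data (made up): two missing readings and one implausible value.
df = pd.DataFrame(
    {"temperature": [20.1, 19.8, 20.5, np.nan, 20.0, 19.9, 85.0, 20.2, 20.3, np.nan]}
)

rates = missing_rate(df)                     # quality score: share of missing data
mask = iqr_outlier_mask(df["temperature"])   # quality check: unusual values
clean = df[~mask].dropna()                   # cleaning: reject flagged and missing rows

print(rates)
print(df[mask])
print(len(clean), "rows kept out of", len(df))
```

Whether a flagged value should be dropped, corrected, or kept is a judgment call that depends on the data and the question being asked; the sketch only shows one simple way to surface candidates.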
This course is for:
- Data practitioners who want to understand both the high-level strategies and the low-level procedures for evaluating and improving data quality.
- Managers, clients, and collaborators who want to understand the importance of data quality, even if they are not working directly with data.
Who this course is for:
- Data science practitioners
- Data science students
- Managers or colleagues who work with data practitioners
Instructor
I am a full-time educator and writer, and former professor of neuroscience. I "retired" from that position so I could focus my time and energy on creating high-quality educational material just for you.
I have 20 years of experience teaching programming, data analysis, signal processing, statistics, linear algebra, and experiment design. I've taught undergraduate students, PhD candidates, postdoctoral researchers, and full professors. I have taught in "traditional" university courses, special week-long intensive courses, and Nobel Prize-winning research labs. I have more than 100 hours of online lectures on neuroscience data analysis that you can find on my website and YouTube channel. And I've written several technical books about these topics, with a few more on the way.
I'm not trying to show off -- I'm trying to convince you that you've come to the right place to maximize your learning from an instructor who has spent two decades refining and perfecting his teaching style.
Over 200,000 students have watched over 15,000,000 minutes of my courses. Come find out why!
I have several free courses that you can enroll in. Try them out! You've got nothing to lose ;)
-------------------------
By popular request, here are suggested course progressions for various educational goals:
MATLAB programming: Get Started with MATLAB; Master MATLAB; Image Processing
Python programming: Master Python programming by solving scientific projects; Master Math by Coding in Python
Applied linear algebra: Complete Linear Algebra; Dimension Reduction
Signal processing: Understand the Fourier Transform; Generate and visualize data; Signal Processing; Neural signal processing