Udemy
30-Day Money-Back Guarantee

Feature Engineering for Machine Learning

Transform the variables in your data and build better performing machine learning models
Rating: 4.7 out of 5 (1,767 ratings)
11,447 students
Created by Soledad Galli
Last updated 1/2021
English
English [Auto]

What you'll learn

  • Learn multiple techniques for missing data imputation
  • Transform categorical variables into numbers while capturing meaningful information
  • Learn how to deal with infrequent, rare and unseen categories
  • Transform skewed variables into Gaussian
  • Convert numerical variables into discrete
  • Remove outliers from your variables
  • Extract meaningful features from dates and time variables
  • Learn techniques used in organisations worldwide and in data competitions
  • Increase your repertoire of techniques to preprocess data and build more powerful machine learning models
Curated for the Udemy for Business collection

Course content

14 sections • 134 lectures • 10h 27m total length

  • Preview
    05:16
  • Preview
    06:00
  • Preview
    03:08
  • How to approach this course
    01:09
  • Setting up your computer
    01:27
  • Course Material
    01:59
  • Download Jupyter notebooks
    00:15
  • Download datasets
    01:23
  • Download course presentations
    00:04
  • Moving Forward
    02:14
  • FAQ: Data Science, Python programming, datasets, presentations and more...
    00:42

  • Variables | Intro
    02:37
  • Numerical variables
    05:03
  • Categorical variables
    03:43
  • Date and time variables
    01:58
  • Mixed variables
    02:16
  • Quiz about variable types
    13 questions
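The mixed variables lecture above deals with variables that hold both numbers and strings. As a minimal sketch, assuming a hypothetical column in which each observation is either a number or a label, pandas can split it into a numerical and a categorical variable:

```python
import pandas as pd

# Hypothetical mixed variable: some observations are numbers, others are labels
df = pd.DataFrame({"mixed": ["1", "2", "A", "3", "B", "4"]})

# Numerical part: values that parse as numbers, NaN otherwise
df["mixed_numerical"] = pd.to_numeric(df["mixed"], errors="coerce")

# Categorical part: keep the original value only where it was not numeric
df["mixed_categorical"] = df["mixed"].where(df["mixed_numerical"].isna())

print(df)
```

Other mixed variables combine both parts in a single value (for example, a code such as "C85"); those need string parsing instead, which this sketch does not cover.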

  • Variable characteristics
    02:43
  • Missing data
    06:46
  • Cardinality - categorical variables
    05:03
  • Rare Labels - categorical variables
    04:54
  • Linear models assumptions
    09:13
  • Linear model assumptions - additional reading resources (optional)
    00:35
  • Variable distribution
    05:08
  • Outliers
    08:27
  • Variable magnitude
    03:08
  • Bonus: Machine learning algorithms overview
    00:10
  • Bonus: Additional reading resources
    00:38
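The variable characteristics lectures above cover missing data, cardinality and rare labels. A quick way to inspect these with pandas might look like the following; the toy data and the 15% rare-label threshold are illustrative assumptions (thresholds such as 5% are also common, and the right value depends on the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with one numerical and one categorical variable
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31, np.nan, 52, 47, 38, 29, 60],
    "city": ["London", "London", "Paris", "London", "Paris",
             "Paris", "London", "Rome", "London", "Paris"],
})

# Missing data: fraction of missing observations per variable
missing_fraction = df.isnull().mean()

# Cardinality: number of distinct labels in the categorical variable
cardinality = df["city"].nunique()

# Rare labels: categories present in fewer than 15% of the observations
label_freq = df["city"].value_counts(normalize=True)
rare_labels = label_freq[label_freq < 0.15].index.tolist()

print(missing_fraction, cardinality, rare_labels, sep="\n")
```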

  • Introduction to missing data imputation
    03:58
  • Complete Case Analysis
    06:46
  • Mean or median imputation
    07:53
  • Arbitrary value imputation
    06:42
  • End of distribution imputation
    04:53
  • Frequent category imputation
    06:56
  • Missing category imputation
    04:05
  • Random sample imputation
    14:17
  • Adding a missing indicator
    05:26
  • Mean or median imputation with Scikit-learn
    10:33
  • Arbitrary value imputation with Scikit-learn
    05:35
  • Frequent category imputation with Scikit-learn
    03:48
  • Missing category imputation with Scikit-learn
    02:46
  • Adding a missing indicator with Scikit-learn
    04:06
  • Automatic determination of imputation method with Sklearn
    08:24
  • Introduction to Feature-engine
    05:10
  • Mean or median imputation with Feature-engine
    04:51
  • Arbitrary value imputation with Feature-engine
    03:30
  • End of distribution imputation with Feature-engine
    04:46
  • Frequent category imputation with Feature-engine
    01:38
  • Missing category imputation with Feature-engine
    02:57
  • Random sample imputation with Feature-engine
    02:28
  • Adding a missing indicator with Feature-engine
    04:06
  • Overview of missing value imputation methods
    00:08
  • Conclusion: when to use each missing data imputation method
    01:27
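The imputation lectures above show these methods in Scikit-learn and Feature-engine. The sketch below uses only scikit-learn's `SimpleImputer` on toy data: median imputation with an added missing indicator, plus arbitrary value imputation with a hypothetical value of -999. In both cases the imputation value is learned from the training set and reused on the test set:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy training and test sets for one numerical variable (NaN = missing)
X_train = np.array([[1.0], [2.0], [np.nan], [4.0], [100.0]])
X_test = np.array([[np.nan], [3.0]])

# Median imputation plus a binary missing indicator column
median_imputer = SimpleImputer(strategy="median", add_indicator=True)
median_imputer.fit(X_train)              # learns the median of the training data
X_test_median = median_imputer.transform(X_test)

# Arbitrary value imputation: -999 is a hypothetical value outside the data range
arbitrary_imputer = SimpleImputer(strategy="constant", fill_value=-999)
arbitrary_imputer.fit(X_train)
X_test_arbitrary = arbitrary_imputer.transform(X_test)

print(median_imputer.statistics_)  # learned median
print(X_test_median)               # imputed value, then the missing indicator
print(X_test_arbitrary)
```

Here the learned median is 3.0, whereas the mean would be 26.75 because of the value 100, which is one reason median imputation is often preferred for skewed variables.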

  • Multivariate Imputation
    03:31
  • KNN Impute
    04:22
  • KNN Impute - Demo
    07:04
  • MICE
    07:07
  • missForest
    01:07
  • MICE and missForest - Demo
    03:58
  • Additional Reading resources (Optional)
    00:12
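For the multivariate methods above, scikit-learn provides `KNNImputer` and `IterativeImputer`. The latter is inspired by MICE but, by default, returns a single imputation rather than several imputed datasets, so treat this as an approximation of the technique, not as the course's own demo. The toy data are an assumption:

```python
import numpy as np
from sklearn.impute import KNNImputer
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy data: the second variable is twice the first, with one value missing
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [8.0, 16.0],
])

# KNN imputation: fill the gap with the mean of the 2 nearest rows,
# with distances computed on the variables that are not missing
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

# MICE-style imputation: model each variable with missing data as a
# function of the others, iterating round-robin
X_mice = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)

print(X_knn[2, 1], X_mice[2, 1])
```

Passing a random forest regressor as the `estimator` of `IterativeImputer` gives a missForest-like variant.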

  • Categorical encoding | Introduction
    06:49
  • One hot encoding
    06:09
  • Important: Feature-engine version 1.0.0
    00:22
  • One hot encoding | Demo
    14:12
  • One hot encoding of top categories
    03:06
  • One hot encoding of top categories | Demo
    08:35
  • Ordinal encoding | Label encoding
    01:50
  • Ordinal encoding | Demo
    08:08
  • Count or frequency encoding
    03:11
  • Count encoding | Demo
    04:33
  • Target guided ordinal encoding
    02:50
  • Target guided ordinal encoding | Demo
    08:30
  • Mean encoding
    02:22
  • Mean encoding | Demo
    05:31
  • Probability ratio encoding
    06:13
  • Weight of evidence (WoE)
    04:36
  • Weight of Evidence | Demo
    12:38
  • Comparison of categorical variable encoding
    10:36
  • Rare label encoding
    04:31
  • Rare label encoding | Demo
    10:25
  • Binary encoding and feature hashing
    06:12
  • Summary table of encoding techniques
    00:05
  • Bonus: Additional reading resources
    00:18
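Several of the encodings above reduce to a learned mapping from label to number. A minimal pandas sketch of count encoding, mean encoding and target guided ordinal encoding, assuming a hypothetical `colour` variable and a binary `target`; the mappings are learned on the training set only:

```python
import pandas as pd

train = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "red"],
    "target": [1, 0, 1, 0, 1, 0],
})
test = pd.DataFrame({"colour": ["blue", "green", "purple"]})

# Count encoding: number of times each label appears in the training set
count_map = train["colour"].value_counts().to_dict()

# Mean encoding: mean of the target for each label
mean_map = train.groupby("colour")["target"].mean().to_dict()

# Target guided ordinal encoding: rank labels by their mean target
ordered_labels = sorted(mean_map, key=mean_map.get)
ordinal_map = {label: rank for rank, label in enumerate(ordered_labels)}

# Apply the learned mappings to new data
test["colour_count"] = test["colour"].map(count_map)
test["colour_mean"] = test["colour"].map(mean_map)
test["colour_ordinal"] = test["colour"].map(ordinal_map)
print(test)
```

The label "purple" never appears in the training set, so all three encodings return NaN for it. This is the unseen-category problem listed among the course topics; grouping rare labels before encoding is one way to reduce it.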

  • Variable Transformation | Introduction
    04:48
  • Variable Transformation with Numpy and SciPy
    07:38
  • Variable Transformation with Scikit-learn
    07:03
  • Variable transformation with Feature-engine
    03:41
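The transformations in the NumPy and SciPy lecture can be sketched as follows. The simulated log-normal variable is an assumption chosen because it is strongly right-skewed; the log and Box-Cox transforms require strictly positive values, while Yeo-Johnson also accepts zero and negative values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated right-skewed, strictly positive variable
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

x_log = np.log(x)                      # requires x > 0
x_sqrt = np.sqrt(x)                    # requires x >= 0
x_boxcox, lmbda = stats.boxcox(x)      # requires x > 0; lambda fitted by maximum likelihood
x_yj, lmbda_yj = stats.yeojohnson(x)   # accepts any real values

print("skew original:", stats.skew(x))
print("skew log:", stats.skew(x_log))
print("skew Box-Cox:", stats.skew(x_boxcox))
print("skew Yeo-Johnson:", stats.skew(x_yj))
```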

  • Discretisation | Introduction
    03:01
  • Equal-width discretisation
    04:06
  • Important: Feature-engine v 1.0.0
    00:17
  • Equal-width discretisation | Demo
    11:18
  • Equal-frequency discretisation
    04:13
  • Equal-frequency discretisation | Demo
    07:16
  • K-means discretisation
    04:13
  • K-means discretisation | Demo
    02:43
  • Discretisation plus categorical encoding
    02:54
  • Discretisation plus encoding | Demo
    05:45
  • Discretisation with classification trees
    05:05
  • Discretisation with decision trees using Scikit-learn
    11:55
  • Discretisation with decision trees using Feature-engine
    03:48
  • Domain knowledge discretisation
    03:52
  • Bonus: Additional reading resources
    00:08
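Equal-width and equal-frequency discretisation map directly onto `pandas.cut` and `pandas.qcut`. A minimal sketch on a toy variable with one extreme value:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Equal-width: 3 intervals of identical width spanning min to max
equal_width, width_edges = pd.cut(x, bins=3, labels=False, retbins=True)

# Equal-frequency: 5 intervals with (roughly) the same number of observations
equal_freq, freq_edges = pd.qcut(x, q=5, labels=False, retbins=True)

print(np.bincount(np.asarray(equal_width), minlength=3))  # observations per interval
print(np.bincount(np.asarray(equal_freq)))
```

With one extreme value, equal-width discretisation places nine of the ten observations in the first interval and leaves the middle one empty, while equal-frequency discretisation puts two observations in each interval. In a real project the edges in `width_edges` and `freq_edges` would be learned on the training set and reused on new data. scikit-learn's `KBinsDiscretizer` offers `uniform`, `quantile` and `kmeans` strategies for the same purpose.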

  • Outlier Engineering | Intro
    07:42
  • Outlier trimming
    07:21
  • Outlier capping with IQR
    06:24
  • Outlier capping with mean and std
    04:44
  • Outlier capping with quantiles
    03:17
  • Arbitrary capping
    03:33
  • Important: Feature-engine v1.0.0
    00:07
  • Additional reading resources
    00:05
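Outlier capping with the IQR proximity rule, and the trimming alternative, can be sketched with pandas. The helper `iqr_limits` and its `fold` parameter are hypothetical names used for illustration; 1.5 is the conventional fold for the IQR rule:

```python
import pandas as pd

def iqr_limits(s: pd.Series, fold: float = 1.5):
    """Hypothetical helper: lower and upper limits of the IQR proximity rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - fold * iqr, q3 + fold * iqr

train = pd.Series([10, 12, 11, 13, 12, 14, 11, 95])
lower, upper = iqr_limits(train)

# Capping: values beyond the limits are replaced by the limits
capped = train.clip(lower=lower, upper=upper)

# Trimming: observations beyond the limits are removed instead
trimmed = train[(train >= lower) & (train <= upper)]

print(lower, upper)
print(capped.tolist())
print(trimmed.tolist())
```

For this toy data the limits are 7.625 and 16.625, so capping replaces 95 with 16.625 while trimming drops that observation. Capping with the mean and standard deviation, or with quantiles, only changes how `lower` and `upper` are computed.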

  • Feature scaling | Introduction
    03:43
  • Standardisation
    05:30
  • Standardisation | Demo
    04:38
  • Mean normalisation
    04:01
  • Mean normalisation | Demo
    05:20
  • Scaling to minimum and maximum values
    03:23
  • MinMaxScaling | Demo
    03:00
  • Maximum absolute scaling
    03:01
  • MaxAbsScaling | Demo
    03:44
  • Scaling to median and quantiles
    02:45
  • Robust Scaling | Demo
    02:03
  • Scaling to vector unit length
    05:50
  • Scaling to vector unit length | Demo
    05:17
  • Additional reading resources
    00:09
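The scaling methods above follow simple formulas. This sketch computes them with NumPy on a toy variable and, as a check, with the matching scikit-learn transformers (`StandardScaler`, `MinMaxScaler`, `MaxAbsScaler`, `RobustScaler`, `Normalizer`); for mean normalisation only the NumPy version is shown:

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler, MinMaxScaler, Normalizer, RobustScaler, StandardScaler,
)

# Toy variable (one column) with a large value
X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])
mean, xmin, xmax = X.mean(axis=0), X.min(axis=0), X.max(axis=0)

x_std = (X - mean) / X.std(axis=0)          # standardisation (population std)
x_meannorm = (X - mean) / (xmax - xmin)     # mean normalisation
x_minmax = (X - xmin) / (xmax - xmin)       # scaling to min and max, in [0, 1]
x_maxabs = X / np.abs(X).max(axis=0)        # maximum absolute scaling, in [-1, 1]
q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
x_robust = (X - median) / (q3 - q1)         # scaling to median and IQR

# Scaling to unit length acts on each observation (row), not on each variable
Z = np.array([[3.0, 4.0], [1.0, 0.0]])
z_unit = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# scikit-learn equivalents for comparison
sk = {
    "std": StandardScaler().fit_transform(X),
    "minmax": MinMaxScaler().fit_transform(X),
    "maxabs": MaxAbsScaler().fit_transform(X),
    "robust": RobustScaler().fit_transform(X),
    "unit": Normalizer().fit_transform(Z),
}
```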

Requirements

  • A Python installation
  • Jupyter notebook installation
  • Python coding skills
  • Some experience with Numpy and Pandas
  • Familiarity with Machine Learning algorithms
  • Familiarity with Scikit-Learn

Description

Welcome to Feature Engineering for Machine Learning, the most comprehensive course on feature engineering available online.

In this course, you will learn how to engineer features and build more powerful machine learning models.


Who is this course for?

So, you’ve taken your first steps into data science: you know the most commonly used prediction models, and you have probably built a linear regression or a classification tree model. At this stage, you’re probably starting to encounter some challenges. You realize that your dataset is dirty: lots of values are missing, some variables contain labels instead of numbers, others do not meet the assumptions of the models, and on top of everything you wonder whether this is the right way to code things up. To make things more complicated, you can’t find many consolidated resources about feature engineering, maybe only blogs. So you may start to wonder: how are things really done in tech companies?

This course will help you! This is the most comprehensive online course on feature engineering. You will learn a wide variety of techniques, used in organizations worldwide and in data science competitions, to clean and transform your data and variables.


What will you learn?

I have put together a fantastic collection of feature engineering techniques, based on scientific articles, white papers, data science competitions, and of course my own experience as a data scientist.

Specifically, you will learn:

  • How to impute your missing data

  • How to encode your categorical variables

  • How to transform your numerical variables so they meet ML model assumptions

  • How to convert your numerical variables into discrete intervals

  • How to remove outliers

  • How to handle date and time variables

  • How to work with different time zones

  • How to handle mixed variables which contain strings and numbers
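Date and time features and time zones can both be handled with pandas. A minimal sketch, assuming hypothetical timestamp strings recorded with different UTC offsets; they are first converted to a single time zone (UTC) so that the extracted features are comparable:

```python
import pandas as pd

# Hypothetical timestamps recorded with different UTC offsets
df = pd.DataFrame({
    "timestamp": ["2021-01-15 08:30:00+00:00", "2021-03-06 22:15:00+01:00"],
})

# Parse into timezone-aware datetimes, normalising all offsets to UTC
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Extract date and time features
df["month"] = df["timestamp"].dt.month
df["day_of_week"] = df["timestamp"].dt.dayofweek   # Monday=0, Sunday=6
df["is_weekend"] = df["day_of_week"].isin([5, 6])
df["hour"] = df["timestamp"].dt.hour

# Express the same instants in another time zone
df["timestamp_london"] = df["timestamp"].dt.tz_convert("Europe/London")
print(df)
```

The second timestamp, 22:15 at UTC+01:00, becomes 21:15 UTC, so its `hour` feature is 21. Whether to extract features in UTC or in local time is a modelling decision.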

Throughout the course, you will learn multiple techniques for each of these tasks, and how to implement them in an elegant, efficient and professional manner, using Python, NumPy, Scikit-learn, pandas and an open-source package that I created especially for this course: Feature-engine.


At the end of the course, you will be able to implement all your feature engineering steps in a single and elegant pipeline, which will allow you to put your predictive models into production with maximum efficiency.
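The course implements these steps with Scikit-learn and Feature-engine. As an illustration using scikit-learn alone, a `Pipeline` with a `ColumnTransformer` can chain imputation, encoding, scaling and the model, so that every step is learned on the training data and applied identically to new data. The toy data and column names are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data
X = pd.DataFrame({
    "age": [25, np.nan, 40, 31, 52, 47, np.nan, 38],
    "city": ["London", "Paris", np.nan, "London", "Rome", "Paris", "London", "Rome"],
})
y = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# Numerical steps: median imputation, then standardisation
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical steps: missing category imputation, then one hot encoding
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="Missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["city"]),
])

model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)

# New observation with a missing age and an unseen city
new = pd.DataFrame({"age": [np.nan], "city": ["Madrid"]})
print(model.predict(new))
```

`handle_unknown="ignore"` makes the unseen city "Madrid" encode as all zeros instead of raising an error.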


Want to know more? Read on...

In this course, you will first become acquainted with the most widely used techniques for variable engineering, followed by more advanced and tailored techniques that capture information while encoding or transforming your variables. You will also find detailed explanations of each technique, its advantages, limitations and underlying assumptions, and the best programming practices to implement it in Python.


This comprehensive feature engineering course includes over 100 lectures spanning about 10 hours of video, and ALL topics include hands-on Python code examples that you can use for reference and practice, and reuse in your own projects.


In addition, the code is updated regularly to keep up with new trends and new Python library releases.

So what are you waiting for? Enroll today, embrace the power of feature engineering and build better machine learning models.

Who this course is for:

  • Data Scientists who want to get started in pre-processing datasets to build machine learning models
  • Data Scientists who want to learn more techniques for feature engineering for machine learning
  • Data Scientists who want to improve their coding skills and best programming practices for feature engineering
  • Software engineers, mathematicians and academics switching careers into data science
  • Data Scientists who want to try different feature engineering techniques on data competitions
  • Software engineers who want to learn how to use Scikit-learn and other open-source packages for feature engineering

Featured review

Josep Maria Niubo Marti
22 courses
8 reviews
Rating: 5.0 out of 5, a year ago
It is an eye opener! This course tackles the task of feature engineering in a very exhaustive and precise way. It explores approaches I was not aware of, and it certainly helped me broaden my feature engineering toolkit and thus obtain better ML models. Thank you for such a great course!

Instructor

Soledad Galli
Lead Data Scientist
  • 4.6 Instructor Rating
  • 5,828 Reviews
  • 25,222 Students
  • 6 Courses

Soledad Galli is a lead data scientist and founder of Train in Data. She has experience in finance and insurance, received a Data Science Leaders Award in 2018 and was selected “LinkedIn’s voice” in data science and analytics in 2019. Sole is passionate about sharing knowledge and helping others succeed in data science.

As a data scientist in finance and insurance companies, Sole researched, developed and put into production machine learning models to assess credit risk and insurance claims and to prevent fraud, leading the adoption of machine learning in these organizations.

Sole is passionate about empowering people to step into and excel in data science. She mentors data scientists, writes articles online, speaks at data science meetings, and teaches online courses on machine learning.

Sole has recently created Train In Data, with the mission to facilitate and empower people and organizations worldwide to step into and excel in data science and analytics.

Sole has an MSc in Biology, a PhD in Biochemistry and 8+ years of experience as a research scientist in well-known institutions like University College London and the Max Planck Institute. She has scientific publications in various fields such as Cancer Research and Neuroscience, and her research was covered by the media on multiple occasions.

Soledad has 4+ years of experience as an instructor in Biochemistry at the University of Buenos Aires, taught seminars and tutorials at University College London, and mentored MSc and PhD students at different universities.

Feel free to contact her on LinkedIn.



© 2021 Udemy, Inc.