
Introduce the HR attrition case study and demonstrate loading the HR analytics dataset in R, exploring its 35 variables and preparing a classification model to predict employee quit risk.
Rename the age variable and encode the target attrition as a factor with levels no and yes. Remove unneeded variables with dplyr and inspect the data structure for factor conversions.
Explore employee attrition through EDA with ggplot visuals, computing the event rate and examining how age, travel, department, education, and environment satisfaction influence attrition.
Plot attrition across variables such as hourly rate, monthly salary, job involvement, job level, job satisfaction, marital status, monthly income, and overtime to identify significant factors.
Analyze correlations in the HR attrition dataset by converting the dependent variable to numeric and using a correlation plot to show how attrition relates to daily rate and monthly income.
Apply the chi-square test to evaluate dependency between categorical variables and the dependent variable, attrition, treating p-values below 0.05 as significant.
Use information value (IV) screening to identify significant attrition predictors and rank variables by information value for logistic regression modeling.
Build a final dataset by selecting key variables including attrition, converting them to factors where needed, then split into training and test sets with an 80/20 ratio using a seed.
Train a binomial glm model on the training set, evaluate on the test set, and perform backward elimination guided by p-values, residual deviance, and AIC to select the final classifier.
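The model-selection step in the last summary above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the course's own code: the column names (OverTime, JobLevel, Noise), the coefficients, and the seed are assumptions. It also swaps in base R's AIC-driven step() for the manual, p-value-guided backward elimination the lecture describes.

```r
# Simulated training data; all names and effect sizes are hypothetical.
set.seed(2024)
n <- 600
train <- data.frame(
  OverTime = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  JobLevel = sample(1:5, n, replace = TRUE),
  Noise    = rnorm(n)  # deliberately unrelated to the outcome
)
lin <- -1.5 + 1.2 * (train$OverTime == "Yes") - 0.3 * train$JobLevel
train$Attrition <- factor(ifelse(runif(n) < plogis(lin), "Yes", "No"),
                          levels = c("No", "Yes"))

# Full binomial GLM (logistic regression) on every candidate predictor
full_model <- glm(Attrition ~ OverTime + JobLevel + Noise,
                  data = train, family = binomial)

# The three quantities the lecture uses to guide elimination
print(summary(full_model)$coefficients)  # p-values are in the "Pr(>|z|)" column
print(deviance(full_model))              # residual deviance
print(AIC(full_model))

# Backward elimination: drop terms one at a time while AIC improves
reduced_model <- step(full_model, direction = "backward", trace = 0)
print(formula(reduced_model))
print(AIC(reduced_model))
```

Because step() only removes a term when doing so lowers AIC, the reduced model's AIC is never higher than the full model's. A manual p-value-guided pass may keep or drop different terms.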
Course Introduction
Understanding employee attrition is crucial for organizations aiming to retain talent. This course guides you through a hands-on case study, teaching you how to explore, clean, and model data to predict employee turnover. With practical examples and intuitive explanations, you'll gain the skills to work on real-world datasets and make impactful predictions.
Section-wise Writeup
Section 1: Introduction
The course begins by introducing the dataset and its variables. You’ll learn how to load and navigate the dataset, setting the foundation for effective data analysis.
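As a rough sketch of loading and navigating a dataset like this in R: the course's actual file name is not given here, so the example writes a tiny made-up table to a temporary CSV and reads it back. The column names (Age_raw, EmployeeCount) and the values are illustrative assumptions, and base R is used in place of dplyr to keep the snippet dependency-free.

```r
# Toy stand-in for the HR file; columns and values are invented for illustration.
toy <- data.frame(
  Age_raw       = c(41, 49, 37, 33, 27, 32),
  Attrition     = c("Yes", "No", "Yes", "No", "No", "No"),
  Department    = c("Sales", "R&D", "R&D", "R&D", "Sales", "HR"),
  EmployeeCount = 1,  # constant column, carries no information
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Load and inspect
hr <- read.csv(csv_path, stringsAsFactors = FALSE)
print(dim(hr))  # rows and columns
str(hr)         # type of each variable

# Rename a variable, encode the target as a factor, drop an unneeded column
names(hr)[names(hr) == "Age_raw"] <- "Age"
hr$Attrition <- factor(hr$Attrition, levels = c("No", "Yes"))
hr$Department <- factor(hr$Department)
hr$EmployeeCount <- NULL
str(hr)
```

Fixing the factor levels as No then Yes makes Yes the event level that glm() later models.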
Section 2: Exploring and Cleaning Data
In this section, you’ll dive into exploratory data analysis (EDA). Topics include renaming variables for clarity, identifying and handling missing values and duplicates, and creating detailed visualizations to uncover patterns in the data. You’ll also explore the relationship between key variables, such as total working years and attrition rates, and use correlation and chi-square tests to assess associations.
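The checks described above might look something like the following in base R. The data is simulated, and the column names (OverTime, MonthlyIncome), the probabilities, and the seed are all assumptions made for the sake of a runnable example.

```r
# Simulated data; names and probabilities are hypothetical.
set.seed(42)
n <- 400
over_time <- factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.7, 0.3)))
monthly_income <- round(runif(n, 1000, 20000))
p_quit <- ifelse(over_time == "Yes", 0.35, 0.10)
attrition <- factor(ifelse(runif(n) < p_quit, "Yes", "No"), levels = c("No", "Yes"))
hr <- data.frame(Attrition = attrition, OverTime = over_time,
                 MonthlyIncome = monthly_income)

# Missing values per column and number of duplicated rows
print(colSums(is.na(hr)))
print(sum(duplicated(hr)))

# Event rate: the share of employees who left
event_rate <- mean(hr$Attrition == "Yes")
print(event_rate)

# Chi-square test of independence between a categorical predictor and attrition
tab <- table(hr$OverTime, hr$Attrition)
chi <- chisq.test(tab)
print(chi$p.value)  # below 0.05 suggests the two variables are associated

# Correlation needs a numeric target: recode No/Yes as 0/1
attr_num <- as.numeric(hr$Attrition) - 1
print(cor(attr_num, hr$MonthlyIncome))
```

The chi-square test suits pairs of categorical variables, while the correlation applies once attrition has been recoded as 0/1 against a numeric variable.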
Section 3: Identifying Significant Variables
This section focuses on feature selection. You’ll use Information Value (IV) techniques to identify significant variables and refine the dataset for modeling. With the final dataset prepared, you’ll split the data into training and testing sets, setting the stage for predictive modeling.
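One way to sketch IV screening and the train/test split in base R is shown below. The info_value() helper is hypothetical, written only for this example, and the 0.5 adjustment for empty cells is an assumption. The course may use a dedicated package instead. For each level of a predictor, the weight of evidence is $\ln(\%\text{non-events} / \%\text{events})$, and IV sums $(\%\text{non-events} - \%\text{events}) \times \text{WOE}$ over the levels.

```r
# Hypothetical helper: information value of a categorical predictor x for target y.
info_value <- function(x, y, event = "Yes") {
  tab <- table(x, y)
  events <- tab[, event]
  non_events <- rowSums(tab) - events
  # Adding 0.5 (an assumption) keeps log() finite when a cell is empty
  pct_event <- (events + 0.5) / sum(events + 0.5)
  pct_non   <- (non_events + 0.5) / sum(non_events + 0.5)
  woe <- log(pct_non / pct_event)
  sum((pct_non - pct_event) * woe)
}

# Simulated data; names and probabilities are made up.
set.seed(123)
n <- 500
hr <- data.frame(
  OverTime = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Gender   = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
p_quit <- ifelse(hr$OverTime == "Yes", 0.4, 0.1)
hr$Attrition <- factor(ifelse(runif(n) < p_quit, "Yes", "No"),
                       levels = c("No", "Yes"))

# Rank predictors by information value
predictors <- c("OverTime", "Gender")
iv <- sapply(predictors, function(v) info_value(hr[[v]], hr$Attrition))
iv_ranked <- sort(iv, decreasing = TRUE)
print(iv_ranked)

# 80/20 split; the seed set above makes it reproducible
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- hr[train_idx, ]
test  <- hr[-train_idx, ]
print(c(train = nrow(train), test = nrow(test)))
```

Higher IV indicates a predictor that separates leavers from stayers more strongly, so low-ranked variables are candidates for removal before modeling.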
Section 4: Predictive Modeling
Here, you’ll build a robust predictive model for attrition. Topics include training the model, making predictions on the test set, and evaluating its performance. By the end of this section, you’ll have a complete workflow for predicting employee attrition.
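A minimal sketch of the predict-and-evaluate part of that workflow follows. It runs on simulated data with assumed column names, and the 0.5 cutoff is an arbitrary choice rather than anything the course prescribes.

```r
# Simulated data; names and coefficients are hypothetical.
set.seed(7)
n <- 500
hr <- data.frame(
  OverTime = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  JobLevel = sample(1:5, n, replace = TRUE)
)
lin <- -1.0 + 1.5 * (hr$OverTime == "Yes") - 0.4 * hr$JobLevel
hr$Attrition <- factor(ifelse(runif(n) < plogis(lin), "Yes", "No"),
                       levels = c("No", "Yes"))

# 80/20 train/test split
idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- hr[idx, ]
test  <- hr[-idx, ]

# Fit on the training set, predict probabilities on the test set
model <- glm(Attrition ~ OverTime + JobLevel, data = train, family = binomial)
prob <- predict(model, newdata = test, type = "response")

# Turn probabilities into classes with an assumed 0.5 cutoff
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Confusion matrix and accuracy
conf_mat <- table(Predicted = pred, Actual = test$Attrition)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

Attrition data is usually imbalanced, so accuracy alone can look good even when few leavers are caught. Reading the Yes column of the confusion matrix shows how many actual leavers the model identified.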
Conclusion
This course equips you with the skills to handle complex datasets, perform detailed exploratory analysis, and build predictive models. You’ll gain a solid understanding of feature selection, statistical testing, and model evaluation, making you adept at solving real-world problems.