XGBoost Machine Learning for Data Science and Kaggle
What you'll learn
- How the XGBoost algorithm works to predict different types of model targets
- What roles decision trees play in gradient boosting and XGBoost modeling
- Why XGBoost has so far been one of the most powerful and stable machine learning methods in Kaggle contests
- How to explain and set appropriate XGBoost modeling parameters
- How to apply data exploration, cleaning and preparation for the XGBoost method
- How to effectively implement the different types of XGBoost models using Python packages
- How to perform feature engineering in XGBoost predictive modeling
- How to conduct statistical analysis and feature selection in XGBoost modeling
- How to explain and select the typical evaluation measures and model objectives for building XGBoost models
- How to perform cross validation and determine the best parameter values
- How to carry out parameter tuning in XGBoost model building
- How to successfully apply XGBoost to solving various machine learning problems
Requirements
- Basic math background
- Basic computer skills
Description
The future belongs to AI and machine learning, so mastering the application of machine learning is like holding a key to your future career. If you could learn only one tool or algorithm for machine learning and predictive modeling right now, which should it be? Without a doubt, XGBoost! If you are going to take part in a Kaggle contest, what should your preferred modeling tool be? Again, the answer is XGBoost! Countless experienced data scientists and newcomers have found this to be true. So register for this course!
XGBoost is famous in Kaggle contests for its excellent accuracy, speed and stability. For example, according to a survey, more than 70% of the top Kaggle winners said they had used XGBoost.
XGBoost is extremely useful and versatile in the data science world. This powerful algorithm is frequently used to predict various types of targets: continuous, binary and categorical. It has also proven very effective at solving multiclass and multilabel classification problems. In addition, the contests on the Kaggle platform cover a wide range of applications and industries, such as retail business, banking, insurance, pharmaceutical research, traffic control and credit risk management.
XGBoost is powerful, but it is not easy to use its full capabilities without expert guidance. For example, to implement the XGBoost algorithm successfully, you need to understand and adjust many parameter settings. To help you do so, I will teach you the underlying algorithm so that you can configure XGBoost to suit different data and application scenarios. In addition, I will provide intensive lectures on feature engineering, feature selection and parameter tuning for XGBoost. After this training, you should also be able to prepare suitable data and features to feed the XGBoost model.
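As a rough illustration of the parameter settings mentioned above, the sketch below builds a starting parameter dictionary for each target type. The helper `base_params` is hypothetical (it is not part of XGBoost), and the numeric values are only illustrative starting points, not recommendations. The keys themselves (`eta`, `max_depth`, `subsample`, `colsample_bytree`, `lambda`, `objective`, `eval_metric`, `num_class`) are standard XGBoost parameter names.

```python
def base_params(task, num_class=None):
    """Return a starting XGBoost parameter dict for a target type.

    Hypothetical helper for illustration; the values are assumptions
    that would normally be tuned with cross validation.
    """
    params = {
        "eta": 0.1,               # learning rate (shrinkage applied to each new tree)
        "max_depth": 6,           # maximum depth of each tree
        "subsample": 0.8,         # fraction of rows sampled for each tree
        "colsample_bytree": 0.8,  # fraction of columns sampled for each tree
        "lambda": 1.0,            # L2 regularisation on leaf weights
    }
    if task == "regression":      # continuous target
        params.update(objective="reg:squarederror", eval_metric="rmse")
    elif task == "binary":        # binary target
        params.update(objective="binary:logistic", eval_metric="logloss")
    elif task == "multiclass":    # categorical target with several classes
        if num_class is None or num_class < 2:
            raise ValueError("multiclass tasks need num_class >= 2")
        params.update(objective="multi:softprob", eval_metric="mlogloss",
                      num_class=num_class)
    else:
        raise ValueError(f"unknown task: {task!r}")
    return params


print(base_params("binary")["objective"])
```

A dictionary like this would typically be passed to `xgboost.train` together with an `xgboost.DMatrix` holding the training data.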
This course is very practical without neglecting theory. We start with decision trees and their related concepts and components, move on to constructing gradient boosting methods, and then arrive at XGBoost modeling. Math and statistics are applied lightly to explain the mechanisms behind each machine learning method. We use Python pandas data frames for data exploration and cleaning. One significant feature of this course is that we use many Python program examples to demonstrate every knowledge point and skill covered in the lectures.
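To make the path from decision trees to gradient boosting concrete, here is a minimal sketch in plain Python, assuming a single numeric feature and a squared-error loss. Each round fits a one-split tree (a "stump") to the current residuals and adds a shrunken copy of it to the model. This is ordinary gradient boosting, not XGBoost itself, which additionally uses second-order gradient information and regularisation. All function names and the toy data are made up for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimises squared error."""
    best = None
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared error with stumps as weak learners."""
    base = mean(ys)                 # initial prediction: the target mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


# Toy data, invented for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]

model = fit_boosting(xs, ys)
baseline = (mean(ys), [], 0.3)      # no trees: always predicts the mean
print("baseline MSE:", mse(baseline, xs, ys))
print("boosted MSE: ", mse(model, xs, ys))
```

The learning rate plays the same role as the `eta` parameter in XGBoost: smaller values make each tree contribute less, which usually calls for more boosting rounds.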
Who this course is for:
- Anyone who enjoys the Kaggle contests
- Anyone who wishes to learn how to apply machine learning and data science approaches to business
Instructor
I have successfully led the development of cutting-edge risk models using Big data at multiple major financial institutions and have worked in the advanced analytics field for the past 15 years. I am very enthusiastic about passing on knowledge and skills to job seekers and newcomers in data analytics and its application to business.
I hold a PhD in Statistics and operational research. I am also a passionate educator: for over 10 years I have been a principal instructor at a Toronto-based college, teaching advanced SAS data mining, Python & R for data science and machine learning, and Big data analytics foundations and projects. As a result, I have helped many of my students land their dream jobs in advanced analytics.
I am also a Big data expert in machine learning, predictive modelling and retail/marketing analytics in Canada. My work includes, but is not limited to, implementation of Big data analysis for a credit bureau, model vetting and validation for banking capital markets, and customer attrition and life stage/lifestyle segmentation for retail banks as well as large marketing firms. I have also worked closely with senior executives and Big data architects in the field of health science to provide strategic advice.