Improve Score on Kaggle's Titanic Competition
What you'll learn
- Students will learn about sklearn (scikit-learn), a widely used Python machine learning library
- Students will learn about the different classifiers in the sklearn library
- Students will learn how to extract information from a string of text
- Students will learn how to impute missing values and how to do this using sklearn
- Students will learn how to scale data and how to do this in sklearn
- Students will learn how to bin data, which converts a feature from a continuous range of values to a categorical column
- Students will learn about the various ways to employ feature selection
- Students will learn how to select a classifier
- Students will learn how to tune parameters using sklearn's GridSearchCV
- Students will learn about the various ensemble methods and how they can improve accuracy
- Students will take all they have learned and enter Kaggle's Titanic competition
Requirements
- Basic IT skills and basic programming in Python
This course is designed to teach the student advanced classification techniques that will enable him to enter Kaggle's Titanic competition and achieve an improved score by using standard machine learning methods.
After an introduction to the course, the student will receive an introduction to sklearn, a Python machine learning library. The student will also be introduced to OpenML, a website that hosts a large repository of datasets. The Titanic dataset from OpenML is used in the lessons throughout the course, up to the point where the student is invited to enter the Kaggle Titanic competition and employ all of the advanced classification techniques that he has learned.
The student will be introduced to various classifiers in the sklearn library and will use sklearn's DummyClassifier to establish a baseline prediction accuracy.
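As a rough illustration of what a baseline looks like, the sketch below fits sklearn's DummyClassifier on a tiny made-up target. The toy arrays and the "most_frequent" strategy are assumptions chosen for the example, not necessarily what the course uses.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy stand-in for the Titanic target: 1 = survived, 0 = did not survive.
# DummyClassifier ignores the feature values, so a column of zeros is enough.
X = np.zeros((10, 1))
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# "most_frequent" always predicts the majority class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

# score() returns accuracy, which here equals the share of the majority class.
baseline_accuracy = baseline.score(X, y)
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

Any real model should beat this number before its extra complexity is worth keeping.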
The student will then learn to extract the title from each passenger's name and build a dictionary that groups the titles logically. Once the titles have been extracted, a new column is created to hold the grouped titles supplied by the dictionary.
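A minimal sketch of the title extraction step, assuming names follow the "Surname, Title. Given names" pattern. The sample rows, the regular expression, the column names, and the grouping dictionary are illustrative assumptions rather than the course's own code.

```python
import pandas as pd

# A few sample names in the "Surname, Title. Given names" format.
df = pd.DataFrame({"name": [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
    "Sagesser, Mlle. Emma",
    "Minahan, Dr. William Edward",
]})

# Capture the text between the comma and the first period.
df["title"] = df["name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Hypothetical grouping dictionary: map each raw title to a broader group.
title_groups = {"Mr": "Mr", "Mrs": "Mrs", "Miss": "Miss", "Mlle": "Miss", "Dr": "Rare"}

# New column holding the grouped title; unknown titles fall back to "Rare".
df["title_group"] = df["title"].map(title_groups).fillna("Rare")
print(df[["title", "title_group"]])
```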
The student will be introduced to different ways to impute missing values. Some gaps are filled by simply coding a replacement value, while others are filled using the more advanced imputation techniques in the sklearn library.
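The two styles of imputation can be sketched side by side. The small DataFrame, the column names, the fill value "S", and the median strategy are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age": [22.0, np.nan, 30.0, 38.0],
    "embarked": ["S", "C", None, "S"],
})

# Manual approach: fill a categorical gap with a fixed replacement value.
df["embarked"] = df["embarked"].fillna("S")

# sklearn approach: replace missing ages with the median of the observed ages.
imputer = SimpleImputer(strategy="median")
df["age"] = imputer.fit_transform(df[["age"]]).ravel()

print(df)
```

sklearn also offers other imputers, such as KNNImputer, which may suit columns where a single fill value is too crude.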
The student will be introduced to different ways to scale the data. One way is to code the scaling formulas directly into the script; another is to use sklearn's more advanced scaling techniques.
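To make the comparison concrete, here is a small sketch that applies a hand-coded min-max formula and then sklearn's MinMaxScaler and StandardScaler to the same column. The fare values are made up, and the choice of scalers is an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up fare values, shaped as a single feature column.
fare = np.array([[0.0], [10.0], [20.0], [50.0]])

# Hand-coded min-max scaling: (x - min) / (max - min) maps values into [0, 1].
manual_minmax = (fare - fare.min()) / (fare.max() - fare.min())

# The same transformation using sklearn.
sk_minmax = MinMaxScaler().fit_transform(fare)

# Standardization: subtract the mean and divide by the standard deviation.
sk_standard = StandardScaler().fit_transform(fare)

print(manual_minmax.ravel())
print(sk_minmax.ravel())
print(sk_standard.ravel())
```

One practical advantage of the sklearn scalers is that a fitted scaler can be reused to transform the test set with the statistics learned from the training set.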
The student will learn to create bins so that a continuous range of values is converted to categories in an attempt to improve the prediction score. Once the selected columns are binned and categorical columns are created, those categorical columns will be converted to object columns.
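A short sketch of binning with pandas, followed by the conversion from a categorical column to an object column. The bin edges and labels are hypothetical and chosen only to show the mechanics.

```python
import pandas as pd

df = pd.DataFrame({"age": [2, 15, 30, 45, 70]})

# Hypothetical bin edges and labels. Intervals are right-closed by default,
# for example (0, 12] is "child" and (12, 18] is "teen".
bins = [0, 12, 18, 60, 120]
labels = ["child", "teen", "adult", "senior"]

# pd.cut returns a categorical column.
df["age_bin"] = pd.cut(df["age"], bins=bins, labels=labels)

# Convert the categorical column to an object column.
df["age_bin"] = df["age_bin"].astype(object)

print(df)
```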
The student will be introduced to feature selection and will employ manual feature selection on the Titanic dataset because it has relatively few features to choose from.
The student will be introduced to model selection and will select three models from the sklearn library, which will be used to make predictions on the dataset.
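One common way to compare candidate models is cross-validation. The sketch below is an illustration only: the three classifiers are assumptions (the course description does not name its three models), and the data is synthetic rather than the Titanic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hypothetical shortlist of three candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svc": SVC(),
}

# Score each candidate with 5-fold cross-validated accuracy.
mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {mean_scores[name]:.3f}")
```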
The student will be introduced to sklearn's GridSearchCV and will use this method to tune the parameters of the three models he has previously selected.
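A minimal GridSearchCV sketch, assuming a random forest as the model being tuned. The parameter grid, the synthetic data, and the 3-fold setting are assumptions kept small so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hypothetical grid: every combination (2 x 2 = 4) is tried with cross-validation.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

# best_params_ holds the winning combination; best_score_ is its mean CV accuracy.
print(search.best_params_)
print(f"{search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` is the model refit on all of the data with the best parameters, ready to be used for predictions.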
The student will be introduced to sklearn's various ensemble methods and will use one of those methods to combine the three parameter-tuned models and derive an optimum score.
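One way to combine three models is sklearn's VotingClassifier, sketched below. This is an assumption for illustration: the course description does not say which ensemble method or which three models it uses, and the data here is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical trio of models; in practice these would carry tuned parameters.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC()),
    ],
    voting="hard",  # each model casts one vote; the majority class wins
)
ensemble.fit(X_train, y_train)

ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"Ensemble accuracy: {ensemble_accuracy:.3f}")
```

Whether the ensemble beats the best single model depends on the data, so it is worth comparing its score against each model on its own.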
When the student has learned about the various advanced classification techniques for improving prediction accuracy, he will be invited to enter Kaggle's Titanic competition and employ all of the techniques he has learned in the course to demonstrate that the score can indeed be improved using these techniques.
Who this course is for:
- Developers who have some knowledge of Python and know how to enter a Kaggle competition
I have almost five decades of work experience, including the United States Air Force, the corporate sector, the nonprofit sector, and charities. I also have a BA in Computer Studies, an MSc in Finance, and a Diploma in Accounting through the AAT. My hobbies include data science, creating content on social media, and writing.