Exploratory Data Analysis in Python
What you'll learn
- Explore a dataset and calculate overall statistics
- Visualize the correlations between the features
- Visualize the predictive power of the features
- Create useful insights from a dataset
Requirements
- Python programming language
Description
When we get our hands on a dataset for the first time, we can’t wait to test several models and algorithms. This is a mistake: if we don’t understand the data before feeding it to a model, the results will be unreliable and the model is likely to fail. Moreover, if we don’t select the most informative features in advance, training becomes slower and the model may not learn anything useful.
So, the first thing we must do is take a look at our dataset and visualize the information it contains. In other words, we have to explore it.
That’s the purpose of Exploratory Data Analysis (EDA).
EDA is an important step in data science and machine learning. It helps us explore the information hidden inside a dataset before applying any model or algorithm. It makes heavy use of data visualization and, because it looks at the data before any model is chosen, it is not biased by modeling assumptions.
Moreover, it lets us figure out whether our features have predictive power, and therefore whether the machine learning project we are working on has a chance of success. Without EDA, we may feed the wrong data to a model and never get useful results.
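As a rough illustration of that first look at a dataset, the sketch below builds a small synthetic table and uses pandas to compute overall statistics, count missing values, and check the balance of the target. The data and the column names (`age`, `income`, `target`) are hypothetical and only serve the example; this is a minimal sketch, not the course's own notebook.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; the column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n).astype(float),
    "income": rng.normal(50_000, 12_000, size=n),
})

# Make the target depend on income so there is something to discover.
noisy_income = df["income"] + rng.normal(0, 5_000, size=n)
df["target"] = (noisy_income > 50_000).astype(int)

# Inject a few missing values, as real datasets often contain them.
df.loc[[3, 17, 42], "age"] = np.nan

# Overall statistics per column: count, mean, std, min, quartiles, max.
summary = df.describe()

# Number of missing values per column.
missing = df.isna().sum()

# Share of each class in the target column.
balance = df["target"].value_counts(normalize=True)

print(summary)
print(missing)
print(balance)
```

Even this short check already shows which columns have gaps and whether the target classes are balanced, two things worth knowing before any model is trained.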
With this course, the student will learn:
- How to visualize information that is hidden inside the dataset
- How to visualize the correlation and the importance of the columns of a dataset
- Some useful Python libraries
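The correlation and column-importance ideas listed above can be sketched with pandas alone. The example below uses synthetic data with hypothetical column names (`x1`, `x2`, `noise`, `y`). It ranks the features by the absolute value of their Pearson correlation with the target, which is only one crude proxy for importance and an assumption of this sketch; the course may rely on other measures and libraries.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; the column names are hypothetical.
rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise = rng.normal(size=n)  # unrelated to the target by construction

# The target depends strongly on x1, weakly on x2, and not at all on noise.
y = 3.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"x1": x1, "x2": x2, "noise": noise, "y": y})

# Pairwise Pearson correlations between all columns.
corr = df.corr()

# Rank features by the strength (absolute value) of their correlation with y.
ranking = corr["y"].drop("y").abs().sort_values(ascending=False)

print(corr.round(2))
print(ranking.round(2))
```

To turn the matrix into a picture, `corr` can be passed to a plotting function such as seaborn's `heatmap`. Keep in mind that Pearson correlation only captures linear relationships, so a low value does not prove that a feature is useless.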
All the lessons are practical and use the Python programming language and Jupyter notebooks. All the notebooks are downloadable.
Who this course is for:
- Python developers
- Data scientists
Instructor
My name is Gianluca Malato. I'm Italian and have a Master's Degree cum laude in Theoretical Physics of disordered systems from "La Sapienza" University of Rome.
I'm a Data Scientist who has been working for years in the banking and insurance sector. I have extensive experience in software programming and project management and I have been dealing with data analysis and machine learning in the corporate environment for several years.
I am also skilled in data analysis (e.g. relational databases and SQL), numerical algorithms (e.g. ODE integration, optimization algorithms) and simulation (e.g. Monte Carlo techniques).
I've written many articles about Machine Learning, R and Python, and I've been a Top Writer on Medium in the Artificial Intelligence category.