This is a complete course for training Data Scientists, with more than 250 exercises that range from the most basic to the most advanced concepts. It is built on active learning methodologies, in which the student is the protagonist of the process. To that end, we provide several solved exercises, content-summary notebooks and much more, with a focus on learning to program through practice and the simulation of real problems (such as data cleaning, handling missing values, splitting data into training and test sets, and grouping and joining datasets, among others).
Accordingly, the course includes solved exercises on the main Python libraries for Data Science: NumPy, Pandas, Matplotlib and Seaborn. It also revisits elementary concepts of Linear Algebra through the NumPy library.
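To give a flavor of the Linear Algebra review, here is a minimal sketch of the kind of NumPy exercise involved. The system of equations and the variable names are made up for illustration and are not taken from the course material.

```python
import numpy as np

# Hypothetical 2x2 linear system (illustrative values only):
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Solve A @ x = b for x
x = np.linalg.solve(A, b)

# Determinant of the coefficient matrix
det = np.linalg.det(A)

# Aggregation along an axis: mean of each column
col_means = A.mean(axis=0)

print("solution:", x)
print("determinant:", det)
print("column means:", col_means)
```

The same pattern (define the arrays, apply one NumPy function, inspect the result) carries over to other matrix operations such as transposition or matrix products with the `@` operator.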
In general, the course presents exercises that cover the main NumPy functions for Data Science, such as aggregation functions, matrix definition and matrix operations, among others. As for Pandas, we offer an overview that starts with the definition of Series and DataFrames and goes on to dataset inspection, boolean selection, filtering of rows and columns, removal of rows and columns, handling of missing data, grouping and joining functions, reading and writing files, and descriptive statistics functions, among other topics.
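As a rough illustration of several of these Pandas topics together, the sketch below inspects missing values, fills them, applies a boolean selection, groups, and joins. The two small tables (`sales` and `stores`) and their values are invented for this example. Filling with the column mean is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "units": [10.0, np.nan, 7.0, 5.0],
})
stores = pd.DataFrame({
    "store": ["A", "B"],
    "city": ["Lisbon", "Porto"],
})

# Inspection: count missing values per column
missing_per_column = sales.isna().sum()

# Missing data: fill the gap in "units" with the column mean
filled = sales.fillna({"units": sales["units"].mean()})

# Boolean selection: keep rows with more than 7 units
large = filled[filled["units"] > 7]

# Grouping: total units per store
totals = filled.groupby("store", as_index=False)["units"].sum()

# Joining: attach the city of each store to the totals
report = totals.merge(stores, on="store", how="left")

print(missing_per_column)
print(large)
print(report)
```

Calling `filled.describe()` on the result would give the descriptive statistics (count, mean, standard deviation, quartiles) for the numeric column.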
Finally, there are several data visualization problems that use the Matplotlib and Seaborn libraries on classic datasets. Notions of time series and finance are also introduced, along with examples of how to prepare a dataset for a Machine Learning project.
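One common preparation step is splitting a dataset into training and test sets. The sketch below shows one possible way to do it using only Pandas. The toy dataset, the 80/20 ratio, and the fixed seed are assumptions made for illustration, and the course itself may use a different approach or a dedicated library.

```python
import pandas as pd

# Toy dataset invented for this example
data = pd.DataFrame({
    "feature": range(10),
    "target": [0, 1] * 5,
})

# Draw 80% of the rows at random for training;
# random_state fixes the seed so the split is reproducible
train = data.sample(frac=0.8, random_state=42)

# The remaining rows (those not drawn above) form the test set
test = data.drop(train.index)

# Separate the input column from the column to be predicted
X_train, y_train = train[["feature"]], train["target"]
X_test, y_test = test[["feature"]], test["target"]

print(len(train), "training rows,", len(test), "test rows")
```

Keeping the test rows out of every preprocessing decision (for example, computing fill values from the training set only) helps avoid leaking information into the evaluation.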