
Explore unsupervised learning by grouping fruits into clusters based on geometries, textures, volume, and colors, without labels, to reveal hidden patterns and customer segments.
Walk through data preparation and preliminary analysis of a dataset of 1,460 homes with 79 predictors used to estimate sale price, guided by CRISP-DM, exploring correlations, missing values, outliers, and feature engineering.
Explore descriptive statistics for concise data summaries, using numerical and graphical methods, and inferential statistics that generalize from a random sample to a population via probability theory.
Identify qualitative and quantitative variable types, convert categorical data with encoding, and summarize with mean, median, and mode; distinguish sample from population using standard deviation and standard error.
Explore dispersion and relationships by examining quartiles, percentiles, and the interquartile range, and compare covariance with Pearson correlation to assess linear associations in data.
Compare frequentist and Bayesian probabilities and their treatment of prior information. See how priors are updated with new data and applied in machine learning.
Explore the axioms and rules of probability (nonnegativity, total probability, union, and intersection) and the notion of independence between events. Independence among features helps avoid multicollinearity and can improve interpretability and model performance in ML.
Explore how to test for normality using visual Q-Q plots and key statistical tests such as Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, and D'Agostino's K-squared, and interpret their p-values when checking regression errors.
Explore the Poisson distribution and Poisson processes, which model random, independent events occurring at a constant average rate lambda, and understand the probability mass function (PMF), inter-arrival times, and real-world examples.
Explain the p-value and its role in hypothesis testing, including null and alternative hypotheses, statistical significance, significance levels, and how p-values guide the decision to reject the null hypothesis.
Explore how degrees of freedom measure the independent information in a data sample, and use the F-statistic, with df1 and df2, to compare variances in ANOVA and regression.
Use the chi-squared test to assess the relationship between two categorical variables with contingency tables, observed versus expected frequencies, and the null hypothesis of no association.
Explore time series decomposition and autoregressive models to forecast and analyze seasonality, trend, and residuals. Compare additive and multiplicative decompositions and use Python with statsmodels to extract trend and seasonality.
Explore autoregressive models that forecast time series from past values, assess stationarity, and use the ACF and PACF to identify the AR order and the ARIMA adjustments needed for nonstationary data.
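As a small illustration of the autoregressive idea in the last summary above, here is a minimal sketch in plain Python. It is not course material: the function names `fit_ar1` and `simulate_ar1`, the simulated data, and the choice of an AR(1) model fitted by ordinary least squares are all assumptions made for illustration. In practice a library such as statsmodels would usually be used instead.

```python
import random


def fit_ar1(series):
    """Estimate c and phi in x[t] = c + phi * x[t-1] + noise by least squares.

    Hypothetical helper for illustration: regresses each value on the
    value that precedes it.
    """
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def simulate_ar1(c, phi, n, seed=0):
    """Generate n points from an AR(1) process with standard normal noise."""
    rng = random.Random(seed)
    x = [c / (1 - phi)]  # start at the long-run mean (requires |phi| < 1)
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0, 1))
    return x


# Assumed toy data: a stationary AR(1) series with c = 2.0 and phi = 0.6.
series = simulate_ar1(c=2.0, phi=0.6, n=5000, seed=42)
c_hat, phi_hat = fit_ar1(series)

# One-step-ahead forecast from the last observed value.
forecast = c_hat + phi_hat * series[-1]
print(f"c_hat={c_hat:.3f}, phi_hat={phi_hat:.3f}, next value forecast={forecast:.3f}")
```

The estimates should land close to the values used in the simulation. A fitted `phi_hat` near 1 would suggest a nonstationary series, which is where differencing (the "I" in ARIMA) comes in.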
Machine Learning is one of the hottest technologies of our time! If you are new to ML and want to become a Data Scientist, you need to understand the mathematics behind ML algorithms. There is no way around it. It is an intrinsic part of the role of a Data Scientist, and any recruiter or experienced professional will attest to that. The enthusiast who wants to learn more about the magic behind Machine Learning algorithms currently faces a daunting set of prerequisites: programming, large-scale data analysis, the mathematical structures associated with models, and knowledge of the application itself. A common complaint of mathematics students around the world is that the topics covered seem to have little relevance to practical problems. But that is not the case with Machine Learning.
This course is not designed to make you a mathematician. Instead, it provides a practical approach to working with data and focuses on the key mathematical concepts that you will encounter in machine learning studies. It is designed to fill in the gaps for students who missed these key concepts in their formal education, or who need to catch up after a long break from studying mathematics.
Upon completing the course, students will be equipped to understand and apply mathematical concepts to analyze and develop machine learning models, including Large Language Models.