
We learn why it is necessary to understand the relationship between the variables of a bivariate data set in a population, and how to visualize that relationship
Quantifying the relationship in terms of a Definition for Correlation
Calculating Pearson's sample correlation coefficient, i.e., the r value
Understanding the Properties of r value
Difference between Causation & Correlation
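The calculation of r can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name pearson_r and the hours/score data are hypothetical and not taken from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson's sample correlation coefficient r for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustrative data
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
print(round(r, 3))  # 0.775

# A property of r: it does not depend on the unit of measurement
r_rescaled = pearson_r([10 * h for h in hours], score)
```

Rescaling the x values leaves r unchanged, which is one of the properties of r listed above. A large r by itself says nothing about causation.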
Given that a linear correlation exists between the variables in a bivariate data set, we explore
whether it is possible to predict the dependent variable y for a selected value of the independent variable x
Arriving at a best fit line, the least-squares line to carry out such a prediction
Understand the relationship between the correlation coefficient r and the slope of the least-squares line
Understand the concept of regression
Apply the least-squares line to predict the dependent variable y and understand the limitations
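A minimal sketch of the least-squares line, its link to r, and the main limitation of prediction is given below. The data and the helper names (least_squares_line, sample_sd, predict) are hypothetical; refusing to extrapolate is one simple way to express the limitation, not the only one.

```python
import math

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# Hypothetical illustrative data
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(hours, score)

# Relationship between r and the slope: b = r * (s_y / s_x)
mean_h, mean_s = sum(hours) / 5, sum(score) / 5
r = (sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, score))
     / math.sqrt(sum((h - mean_h) ** 2 for h in hours)
                 * sum((s - mean_s) ** 2 for s in score)))
slope_from_r = r * sample_sd(score) / sample_sd(hours)

def predict(x_new):
    """Predict y, refusing to extrapolate beyond the observed x range."""
    if not min(hours) <= x_new <= max(hours):
        raise ValueError("x_new is outside the observed x range")
    return intercept + slope * x_new

print(round(slope, 3), round(intercept, 3))  # 0.6 2.2
```

The slope obtained directly and the slope obtained from r agree, and the line is only trusted inside the range of x values that was actually observed.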
this lecture focusses on
methods to verify if the linear fit is the right one and arrive at measures to improve and verify the quality of the fit
Introduces the residual plots as a means to check if the linear fit is the right choice and eliminate influential data in the sample
Introduces new measures namely R-square and standard deviation to qualify the effectiveness of the least squares line fit
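The quantities named above can be sketched in a few lines. The example assumes the usual definitions for a simple linear fit: R-square = 1 - SSResid/SSTo, and the standard deviation about the least-squares line s_e = sqrt(SSResid / (n - 2)). The data and the function name fit_quality are hypothetical.

```python
import math

def fit_quality(x, y):
    """Return (residuals, r_square, s_e) for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    intercept = mean_y - slope * mean_x
    # Residual = observed y minus the value predicted by the line
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_resid = sum(e ** 2 for e in residuals)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - ss_resid / ss_total
    s_e = math.sqrt(ss_resid / (n - 2))
    return residuals, r_square, s_e

# Hypothetical illustrative data
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

residuals, r_square, s_e = fit_quality(hours, score)
print(round(r_square, 3), round(s_e, 3))  # 0.6 0.894
```

Plotting the residuals against x gives the residual plot: a patternless band around zero supports the linear fit, while a curve or a funnel shape suggests that a straight line is not the right choice.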
A study of regression fits is incomplete without an understanding of non-linear regression fits
Polynomial regression and transformations are the two main classes of non-linear curve (regression) fits
As part of polynomial regression, you learn the quadratic and cubic regression fits
As part of transformation-based regression, you learn how various transformations are applied to the raw data so that a linear regression fit can finally be achieved on the transformed data set
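The transformation idea can be illustrated with an exponential relationship. The sketch below assumes data of the form y = a * exp(b * x); taking the natural log of y turns this into the straight line ln(y) = ln(a) + b * x, which an ordinary least-squares fit can handle. The data are generated inside the example and are purely hypothetical.

```python
import math

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    return mean_y - slope * mean_x, slope

# Hypothetical data generated exactly from y = 2 * exp(0.5 * x)
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]

# Transform the response, then fit a straight line to (x, ln y)
log_y = [math.log(yi) for yi in y]
intercept, slope = least_squares_line(x, log_y)

# Back-transform the intercept to recover the original parameters
a = math.exp(intercept)
b = slope
```

Because the hypothetical data contain no noise, the fit recovers a = 2 and b = 0.5 up to rounding error. With real data, the residual plot of the transformed fit should be checked in the same way as for any linear fit.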
Learn the shortcomings of the deterministic sample regression line in capturing a linear relationship, if one exists, between the two variables of a bivariate data set
and see why a probabilistic model, namely the simple linear regression model, is needed
Learn the role of the error part in the above model and the necessary assumptions about the behavior of the error
Learn how the coefficients (intercept, slope) of the population regression line are inferred from the coefficients of the sample regression line
Learn the Model utility test to judge if the Simple linear regression model is suitable enough to arrive at acceptable inferences about the population characteristics
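A minimal sketch of the model utility test for a simple linear regression follows. It assumes the common form of the test, H0: slope = 0, with test statistic t = b / s_b, where s_b = s_e / sqrt(S_xx) and n - 2 degrees of freedom. The data are hypothetical, and the critical value 3.182 (two-sided, 5% level, 3 degrees of freedom) is hard-coded from a standard t-table rather than computed.

```python
import math

def slope_t_statistic(x, y):
    """Return (slope, s_b, t) for the test of H0: population slope = 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))   # estimates the error standard deviation
    s_b = s_e / math.sqrt(s_xx)           # estimated standard deviation of the slope
    return slope, s_b, slope / s_b

# Hypothetical illustrative data (n = 5, so df = 3)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

slope, s_b, t = slope_t_statistic(hours, score)

T_CRITICAL = 3.182  # t-table value for df = 3, two-sided 5% level (assumed level)
model_is_useful = abs(t) > T_CRITICAL
print(round(t, 3), model_is_useful)  # 2.121 False
```

With only five hypothetical observations the slope is not significantly different from zero at the 5% level, even though r is fairly large. This shows why the sample line alone is not enough to draw conclusions about the population.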
In this second part
learn how to verify the four basic assumptions about the error distribution using normal probability plots in order to arrive at a reliable simple linear regression model
learn how to improve on the model utility by identifying the outliers and the data points that exert excessive influence on the regression line fit and eliminate them before arriving at the model using the residual plots
learn the influence of sampling variability on the precision of the point estimate and the point prediction derived from the model
learn how to judge the precision of the point estimate or prediction by calculating the confidence interval and the prediction interval, and qualify the estimate/prediction using those intervals
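The two intervals can be sketched as follows, assuming the standard formulas for simple linear regression: the confidence interval for the mean response at x* uses s_e * sqrt(1/n + (x* - x_bar)^2 / S_xx), and the prediction interval for a single new observation adds 1 under the square root. The data are hypothetical and the t critical value (3.182 for 3 degrees of freedom, 95%) is taken from a standard t-table.

```python
import math

def intervals(x, y, x_star, t_critical):
    """Return (y_hat, confidence_interval, prediction_interval) at x_star."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))

    y_hat = intercept + slope * x_star
    distance = (x_star - mean_x) ** 2 / s_xx
    se_mean = s_e * math.sqrt(1 / n + distance)       # for the mean response
    se_pred = s_e * math.sqrt(1 + 1 / n + distance)   # for one new observation

    ci = (y_hat - t_critical * se_mean, y_hat + t_critical * se_mean)
    pi = (y_hat - t_critical * se_pred, y_hat + t_critical * se_pred)
    return y_hat, ci, pi

# Hypothetical illustrative data (n = 5, so df = 3)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

y_hat, ci, pi = intervals(hours, score, x_star=3, t_critical=3.182)
```

The prediction interval is always wider than the confidence interval at the same x*, because it accounts for the variability of an individual observation in addition to the uncertainty in the estimated line.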
Learn about
- the need for a multiple linear regression model in situations where a simple linear regression model is not adequate
- the structure of the model, model estimates and their definitions
- model composition and types of the predictor variables that can constitute the model
- inclusion of interaction terms, polynomial terms and categorical variables in the model, their characteristics and roles within the model
- and the underlying assumptions that are essential for the model to comply with
learn about
- estimating the model and its coefficients, and arriving at point estimates and predicted values for the response variable using the estimated model
- deriving the confidence interval and the prediction interval for the point estimate or the predicted value respectively
- evaluating the model utility using basic statistical quantities like R² and adjusted R², and methods like normal probability plots to verify the basic assumptions that the model should comply with
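The estimation step and the two R-square measures can be sketched without any external library. The example solves the normal equations (X'X) b = X'y by Gaussian elimination, which is adequate for a small illustration but is not how statistical software usually does it. The data, the function names and the choice of two predictors are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_mlr(predictors, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    rows = [[1.0] + list(row) for row in predictors]  # leading 1 for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def r_square_measures(predictors, y, coeffs):
    """Return (R-square, adjusted R-square) for the fitted model."""
    n, k = len(y), len(coeffs) - 1
    fitted = [coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))
              for row in predictors]
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (ss_resid / (n - k - 1)) / (ss_total / (n - 1))
    return r2, adj_r2

# Hypothetical data: roughly y = 1 + 2*x1 + 3*x2 plus small disturbances
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [9.1, 7.9, 18.9, 18.1, 29.1, 27.9]

coeffs = fit_mlr(predictors, y)
r2, adj_r2 = r_square_measures(predictors, y, coeffs)

# Point estimate / predicted value at a new (hypothetical) predictor setting
y_hat = coeffs[0] + coeffs[1] * 3 + coeffs[2] * 3
```

Adjusted R² is always below R² when the fit is not perfect, because it penalizes the number of predictors k relative to the sample size n.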
Learn about
- testing the model's utility using statistical measures such as the F-statistic
- a new statistical quantity, the F-statistic, and its two forms: the overall F-statistic and the partial F-statistic
- Adjusted regression sum of Squares and Sequential regression sum of squares and their relationship with the overall regression sum of squares
- how to execute three different forms of Hypotheses tests that play an important role in evaluating the model’s utility
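The two forms of the F-statistic can be written as small functions of the sums of squares. This sketch assumes the standard definitions: the overall F compares the regression mean square with the residual mean square, and the partial F measures how much SSResid increases when some predictors are dropped from the full model. All numeric inputs below are made-up values chosen for easy arithmetic.

```python
def overall_f(ss_regr, ss_resid, n, k):
    """Overall F for a model with k predictors fitted to n observations."""
    ms_regr = ss_regr / k
    ms_resid = ss_resid / (n - (k + 1))
    return ms_regr / ms_resid

def partial_f(ss_resid_reduced, ss_resid_full, n, k_full, n_dropped):
    """Partial F for testing whether the n_dropped predictors can be removed."""
    extra_ss = ss_resid_reduced - ss_resid_full   # SS explained by the dropped terms
    ms_resid_full = ss_resid_full / (n - (k_full + 1))
    return (extra_ss / n_dropped) / ms_resid_full

# Made-up sums of squares: n = 23 observations, full model with k = 2 predictors
f_overall = overall_f(ss_regr=80.0, ss_resid=20.0, n=23, k=2)
f_partial = partial_f(ss_resid_reduced=30.0, ss_resid_full=20.0,
                      n=23, k_full=2, n_dropped=1)
print(f_overall, f_partial)  # 40.0 10.0
```

Each statistic is then compared with an F critical value (or converted to a p-value) using the matching numerator and denominator degrees of freedom.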
learn about
- the details involved in building multiple linear regression models
- correct and incorrect models, and the shortcomings of incorrect models
- two model-building methods, with a focus on one of them: the stepwise regression method
- the steps involved in arriving at a final model using the stepwise regression method, applied to a real-life example
- cautions and points to remember when deploying the stepwise regression method
learn about
- the second method namely the Best Subsets method in building the multiple linear regression models
- the application of objective criteria to analyze the performance of each model in the subset of models
- a deeper look at additional statistical outputs to choose the final model among the competing models
- analyzing the residuals to decide whether transformations are needed on certain variables and whether interaction terms should be included to further enhance the model's performance
learn about
- a few observations in the sample that can undesirably influence the model parameters
- how to judge the influence and the two kinds of such observations that can exercise significant influence on the model parameters
- focus in this lecture on one of the two kinds of data – High leverage data – that can qualify as an influential data observation
- how to measure and flag a high leverage observation in the sample that can potentially influence the model parameters
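For a model with a single predictor, the leverage of each observation has a simple closed form, h_i = 1/n + (x_i - x_bar)^2 / S_xx, which the sketch below uses. The flagging rule h_i > 3p/n (with p the number of model parameters) is a commonly used rule of thumb and is an assumption here; the lecture may use a different cut-off. The data are hypothetical.

```python
def leverages(x):
    """Leverage h_i of each observation in a simple linear regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / s_xx for xi in x]

# Hypothetical predictor values; the last one lies far from the others
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

h = leverages(x)
p = 2                          # intercept and slope
threshold = 3 * p / len(x)     # assumed rule of thumb: flag h_i > 3p/n
flagged = [i for i, hi in enumerate(h) if hi > threshold]
print(flagged)  # [9]
```

Leverage depends only on the predictor values, not on y. A flagged point is therefore only potentially influential; whether it actually distorts the fit also depends on its response value.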
learn about
- standardized or more precisely Studentized residuals as a measure to quantify and flag the outliers
- two kinds of Studentized residuals namely Internally studentized and Externally studentized residuals
- other statistical measures such as DFFITS and Cook’s distance that can be used to identify or flag influential observations directly, bypassing the steps to detect whether they are high leverage data points or outliers
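These measures can be sketched for a simple linear regression using their usual closed-form expressions, which avoid refitting the model with each point deleted. The formulas are the standard ones; the data, the function name and the cut-off "Cook's distance > 1" are assumptions made for the illustration.

```python
import math

def influence_measures(x, y):
    """Studentized residuals, Cook's distance and DFFITS for each observation."""
    n, p = len(x), 2  # p counts the intercept and the slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mean_x) ** 2 / s_xx            # leverage
        internal = e / math.sqrt(mse * (1 - h))           # internally studentized
        # Externally studentized residual, obtained from the internal one
        external = internal * math.sqrt((n - p - 1) / (n - p - internal ** 2))
        cooks_d = internal ** 2 / p * h / (1 - h)
        dffits = external * math.sqrt(h / (1 - h))
        rows.append({"internal": internal, "external": external,
                     "cooks_d": cooks_d, "dffits": dffits})
    return rows

# Hypothetical data: the last observation is far above the trend of the others
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.5, 1.6, 3.4, 3.7, 5.5, 5.6, 7.4, 15.0]

rows = influence_measures(x, y)
# Assumed rule of thumb: Cook's distance above 1 marks an influential point
flagged = [i for i, row in enumerate(rows) if row["cooks_d"] > 1]
print(flagged)  # [7]
```

For the flagged point the externally studentized residual is much larger in magnitude than the internally studentized one, because removing that point shrinks the estimated error variance.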
This is a big topic, so it is covered in two parts.
The first part of the video begins with a focus on
- a conceptual understanding of multicollinearity, and its impact on the statistical parameters of the model and on the model's behaviour
- how correlation among the predictor variables directly influences multicollinearity in the model
- how the statistical parameters of the model differ between cases where multicollinearity is absent and cases where it is present
- detailed visualizations and illustrations to get a very good hang on the above concepts and impacts
- on model objectives that decide whether multicollinearity can be allowed to exist in the model or not
In this second part of the video on multicollinearity : learn about
- the statistic Variance inflation factor (VIF) to measure and quantify multicollinearity
- How to derive a general expression to calculate VIF
- two commonly known types (causes) of Multicollinearity namely Data based multicollinearity and structural multicollinearity
- how to deal with each one of the types of multicollinearity in model building
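The general expression for the VIF of predictor j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-square obtained by regressing predictor j on the remaining predictors. With exactly two predictors, R_j^2 equals the squared correlation between them, which is the case sketched below. The data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's sample correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)  # undefined (division by zero) if |r| = 1

# Hypothetical correlated predictors (r squared = 0.6)
vif_correlated = vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Hypothetical uncorrelated predictors (r = 0)
vif_uncorrelated = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])

print(round(vif_correlated, 3), round(vif_uncorrelated, 3))  # 2.5 1.0
```

A VIF of 1 means the variance of the coefficient estimate is not inflated at all; the value grows without bound as the correlation among predictors approaches plus or minus 1.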
learn
- what a logistic regression model is, as opposed to a multiple linear regression model
- the composition of logistic regression models in terms of predictor and response variables
- the maximum likelihood estimation method to arrive at the β coefficients for the model
- model building using hypothesis tests
- various statistical measures for evaluating the logistic regression models
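Maximum likelihood estimation for a logistic regression with one predictor can be sketched with Newton-Raphson iterations. This is one common way to maximize the log-likelihood, chosen here for brevity; the lecture may present the method differently. The data, the function names and the fixed number of iterations are hypothetical choices, and the example assumes the two classes overlap so that finite estimates exist.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, x, y):
    """Log-likelihood of the model P(y = 1 | x) = sigmoid(b0 + b1 * x)."""
    total = 0.0
    for xi, yi in zip(x, y):
        prob = sigmoid(b0 + b1 * xi)
        total += yi * math.log(prob) + (1 - yi) * math.log(1 - prob)
    return total

def score(b0, b1, x, y):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    g0 = sum(yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y))
    g1 = sum(xi * (yi - sigmoid(b0 + b1 * xi)) for xi, yi in zip(x, y))
    return g0, g1

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson updates starting from b0 = b1 = 0."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = score(b0, b1, x, y)
        probs = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pr * (1 - pr) for pr in probs]
        # Entries of the 2 x 2 information matrix X'WX
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 ** 2
        # Step = inverse(information matrix) applied to the gradient
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1

# Hypothetical data: binary response with overlapping classes
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
gradient = score(b0, b1, x, y)
```

At the maximum likelihood estimates the gradient is numerically zero, and the log-likelihood is higher than that of the model with both coefficients set to zero. Comparing those two log-likelihoods is the basis of the likelihood-based tests used to evaluate such a model.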
Regression models are supervised machine learning techniques used to predict continuous numerical values. By analyzing relationships between independent variables (features) and a dependent variable (target), they identify trends to forecast future outcomes. Statistical machine learning combines traditional statistical inference with computational algorithms to learn patterns from data, quantify uncertainty, and make predictions.
This course provides in-depth and comprehensive coverage of multiple linear regression and logistic regression models, and focuses on the complete breadth and depth of the statistical measures that play a pivotal role in regression analysis.
The course builds understanding step by step. It begins with correlation, all variants of R-square, the least-squares line fit and the concept of regression, then introduces simple linear regression models, and finally leads the student to a comprehensive understanding of multiple linear and logistic regression models. The course also ensures that the student understands transformations of non-linear relationships between the predictor and the response variables, so that, where necessary, even non-linear relationships can be handled effectively within linear models.
Regression analysis helps organizations and researchers replace guesswork with data-driven insights. Common use cases include:
Forecasting: Predicting future sales, housing prices, or temperatures.
Risk Analysis: Estimating the likelihood of a financial event or assessing credit risks.
Causal Analysis: Determining how changes in one factor (e.g., marketing spend) affect a target outcome (e.g., revenue).