Statistical Methods and Data Analytics

Why do we care about linear regression?

Linear regression allows us to:

  • understand and quantify the relationships between an outcome and one or more predictors
  • estimate the unique effect of a predictor while accounting for the influence of other variables
  • predict outcomes for future observations
  • understand other regression models, which build on the linear regression model

The typical goal is to estimate population parameters describing these relationships, as well as their uncertainties, from a sample.

Simple linear regression model

In a simple linear regression model, we quantify the relationship between one outcome and one predictor.

##            x           y
## 1  0.9819694  0.49317417
## 2  0.4687150  0.70972185
## 3 -0.1079713 -0.83172816
## 4 -0.2128782  0.09893859
## 5  1.1580985  1.60159397
## 6  1.2923548 -0.36389939
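
The rows above are the first six observations of an example dataset. As a minimal sketch, data with this structure could be simulated in R as follows; the object name dat, the sample size, and the generating parameters are illustrative assumptions, not the values behind the output above:

# simulate a predictor and an outcome related linearly, plus random noise
# (seed, n, intercept, and slope are all assumptions for illustration)
set.seed(1)
n <- 50
x <- rnorm(n)                    # predictor drawn from a standard normal
y <- 0.5 + 0.8 * x + rnorm(n)    # outcome: linear in x with normal error
dat <- data.frame(x, y)          # collect into a data frame
head(dat)                        # print the first six rows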

x: predictor, independent variable (IV), regressor

y: outcome, dependent variable (DV), response

Linear regression assumes a linear relationship between \(x\) and \(y\).

The simple linear regression model:

\[y = \beta_0 + \beta_1 x + \epsilon\]

  • \(\beta_0\) is the intercept, the predicted \(y\) value when \(x=0\)
  • \(\beta_1\) is the slope, the predicted change in \(y\) for a one-unit increase in \(x\)
  • \(\epsilon\) is the error term, the deviation of the observed \(y\) value from the predicted \(y\)
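
For example, with hypothetical values \(\beta_0 = 1\) and \(\beta_1 = 2\), the model predicts

\[y = 1 + 2x\]

so at \(x = 3\) the predicted outcome is \(1 + 2 \times 3 = 7\), and each additional unit of \(x\) increases the prediction by 2.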

Ordinary Least Squares (OLS)

The Ordinary Least Squares (OLS) estimator is a method (a formula) for finding the best-fitting line relating \(x\) to \(y\).

The best-fitting line minimizes the sum of the squared errors, where an error is the difference between an observed and a predicted value of \(y\).
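
Formally, OLS chooses the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize the sum of squared errors:

\[\min_{\hat{\beta}_0,\, \hat{\beta}_1} \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2\]

For simple linear regression, this minimization has a closed-form solution:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]

In R, the lm() function computes the OLS estimates. A minimal sketch, assuming the simulated data frame dat from above:

# fit the simple linear regression of y on x by OLS
m <- lm(y ~ x, data = dat)
coef(m)    # OLS estimates of the intercept and slope

# the slope estimate reproduced from the closed-form solution
with(dat, sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2))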