Linear regression, also called OLS (ordinary least squares) regression, is used to model continuous outcome variables. In the OLS regression model, the outcome is modeled as a linear combination of the predictor variables.
Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.
Examples of linear regression
Example 1: A researcher is interested in how scores on a math and a science test are associated with scores on a writing test. The outcome variable is the score on the writing test.
Example 2: A research team is interested in motivating people to eat more vegetables by showing subjects videos of simple ways to prepare vegetables for dinner. The outcome variable is the number of ounces of vegetables consumed for dinner for one week.
Example 3: Researchers are interested in the effect of light on sleep quality. They randomly assign subjects to different light conditions and measure sleep quality for one month. The average sleep quality score is the outcome variable.
Description of the data
For our data analysis below, we are going to expand on Example 1 about the association between test scores. We have generated hypothetical data, which can be obtained from our website.
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear
(highschool and beyond (200 cases))
We can obtain descriptive statistics for each of the variables that we will use in our linear regression model. Although the variable female is binary (coded 0 and 1), we can still use it in the summarize command.
summarize science math female socst read

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     science |        200       51.85    9.900891         26         74
        math |        200      52.645    9.368448         33         75
      female |        200        .545    .4992205          0          1
       socst |        200      52.405    10.73579         26         71
        read |        200       52.23    10.25294         28         76
We can use the tabulate command to see the number of males and females.
tabulate female

     female |      Freq.     Percent        Cum.
------------+-----------------------------------
       male |         91       45.50       45.50
     female |        109       54.50      100.00
------------+-----------------------------------
      Total |        200      100.00
Analysis methods you might consider
Below is a list of some analysis methods you may have encountered.
- Linear regression, the focus of this page.
- ANCOVA: ANCOVA will give the same results as linear regression, except with a different parameterization. Linear regression will use dummy coding for categorical predictors, while ANCOVA will use effect coding.
- Robust regression: Robust regression is a type of linear regression used when the assumption of homogeneity of variance may be violated.
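If violation of the homogeneity of variance assumption is the concern, one commonly used option is to refit the same model with heteroskedasticity-robust (sandwich) standard errors; Stata's rreg command is a different kind of robust regression that instead downweights outlying observations. A minimal sketch (not part of the analysis below):

regress science math female socst read, vce(robust)
rreg science math female socst read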
Linear regression
Below we use the regress command to estimate a linear regression model. The predictor variable female can be entered into the model either with or without i. before it. Putting i. before female indicates that female is a factor variable (i.e., a categorical variable), and is necessary only if the margins command will be used after running the linear regression (a sketch of that syntax follows the interpretation notes below).
regress science math female socst read

      Source |       SS           df       MS      Number of obs   =       200
-------------+----------------------------------   F(4, 195)       =     46.69
       Model |  9543.72074         4  2385.93019   Prob > F        =    0.0000
    Residual |  9963.77926       195  51.0963039   R-squared       =    0.4892
-------------+----------------------------------   Adj R-squared   =    0.4788
       Total |      19507.5       199  98.0276382   Root MSE        =    7.1482

------------------------------------------------------------------------------
     science | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        math |   .3893102   .0741243     5.25   0.000      .243122    .5354983
      female |  -2.009765   1.022717    -1.97   0.051    -4.026772    .0072428
       socst |   .0498443    .062232     0.80   0.424    -.0728899    .1725784
        read |   .3352998   .0727788     4.61   0.000     .1917651    .4788345
       _cons |   12.32529   3.193557     3.86   0.000     6.026942    18.62364
------------------------------------------------------------------------------
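As a quick check on how the summary fit statistics follow from the ANOVA table in the output above, the display calculations below (a sketch using the numbers printed above) reproduce the R-squared, the overall F statistic, and the root MSE:

display 9543.72074/19507.5        // R-squared = Model SS / Total SS = 0.4892
display 2385.93019/51.0963039     // F(4, 195) = Model MS / Residual MS = 46.69
display sqrt(51.0963039)          // Root MSE = sqrt(Residual MS) = 7.1482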
- At the top of the output we see that all 200 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values).
- The F-test with 4 and 195 degrees of freedom and a p-value of 0.0000 tells us that our model as a whole fits significantly better than an empty or null model (i.e., a model with no predictors).
- We see the R-squared value of 0.4892, which indicates the proportion of variance in the outcome accounted for by the model.
- We also see the adjusted R-squared value of 0.4788, which adjusts the R-squared value for the number and quality of the predictors.
- In the table we see the coefficients, their standard errors, the t-statistics, associated p-values, and the 95% confidence intervals of the coefficients. Both math and read are statistically significant.
- For every one-unit increase in math, the expected value of science increases by 0.389, holding the other predictors constant.
- For a one-unit increase in read, the expected value of science increases by 0.335, holding the other predictors constant.
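As noted above, if the margins command will be used after the regression, female can instead be entered as a factor variable. A minimal sketch (the coefficient estimates are the same as above; the margins output is not shown here):

regress science math i.female socst read
margins female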
Things to consider
- The outcome variable in a linear regression is assumed to be continuous, and it should have a reasonable range of values. There is no assumption that the distribution of the outcome itself is normal; the normality assumption applies to the residuals (errors), not to the outcome.
- The assumptions of linear regression should be checked. Please see Stata Web Book: Linear Regression for information on the assumptions of linear regression and how to assess these assumptions in Stata; a few commonly used postestimation diagnostic commands are sketched after this list.
- Clustered data: Sometimes observations are clustered into groups (e.g., people within families, students within classrooms). In such cases, you may want to see our page on non-independence within clusters.
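As noted in the list above, the regression assumptions should be checked after fitting the model. Below is a minimal sketch of a few of Stata's standard postestimation commands (run immediately after the regress command above; interpretation is covered in the Stata Web Book linked above):

rvfplot        // residual-versus-fitted plot, for checking linearity and constant variance
estat hettest  // Breusch-Pagan test for heteroskedasticity
estat vif      // variance inflation factors, for checking collinearity among predictors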
References
- Regression with Graphics: A Second Course in Statistics by Lawrence C. Hamilton
- Regression Analysis: A Constructive Critique by Richard A. Berk
- Interpreting and Visualizing Regression Models Using Stata by Michael N. Mitchell
See also
- Stata Annotated Output: Linear regression
- Stata Web Book: Linear regression
- Seminar: Regression with Stata