As the name implies, multivariate regression is a technique that estimates a single regression model with more than one outcome variable. When there is more than one predictor variable in a multivariate regression model, the model is a multivariate multiple regression.
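In matrix terms, the only change from ordinary regression is that the outcomes form a matrix rather than a single column: one design matrix X is shared by all outcome columns, and the fit produces a coefficient matrix with one column per outcome. A minimal sketch with synthetic data (not the dataset analyzed below):

```python
# Hedged illustration with made-up data: a multivariate multiple regression
# fits several outcome columns Y against one design matrix X, giving a
# coefficient *matrix* B instead of a single coefficient vector.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 3, 2          # cases, predictor variables, outcome variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # n x (p+1), with intercept
Y = rng.normal(size=(n, q))                                  # n x q outcome matrix

# Least-squares fit of all outcomes at once
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert B.shape == (p + 1, q)   # one column of coefficients per outcome variable
```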
Please Note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process that researchers are expected to carry out. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics, or potential follow-up analyses.
Examples of multivariate regression
Example 1. A researcher has collected data on three psychological variables, three academic variables (standardized test scores), and the type of educational program the student is in for 600 high school students. She is interested in how the set of psychological variables is related to the academic variables and the type of program the student is in.
Example 2. A doctor has collected data on cholesterol, blood pressure, and weight. She also collected data on the eating habits of the subjects (e.g., how many ounces of red meat, fish, dairy products, and chocolate consumed per week). She wants to investigate the relationship between the three measures of health and eating habits.
Example 3. A researcher is interested in determining what factors influence the health of African Violet plants. She collects data on the average leaf diameter, the mass of the root ball, and the average diameter of the blooms, as well as how long the plant has been in its current container. For predictor variables, she measures several elements in the soil, as well as the amount of light and water each plant receives.
Description of the data
Let’s pursue Example 1 from above. We have a hypothetical dataset with 600 observations on seven variables. The psychological variables are locus of control (locus_of_control), self-concept (self_concept), and motivation (motivation). The academic variables are standardized test scores in reading (read), writing (write), and science (science), as well as a categorical variable (prog) giving the type of program the student is in (general, academic, or vocational).
Let’s look at the mvreg data. Note that there are no missing values in this data set.
descriptives variables = locus_of_control self_concept motivation read write science.

Descriptive Statistics
                     N     Minimum   Maximum   Mean      Std. Deviation
locus_of_control     600   -2.00     2.21      0.0965    0.67028
self_concept         600   -2.53     2.09      0.0049    0.70551
motivation           600   -2.75     2.58      0.0039    0.82240
read                 600   24.62     80.59     51.9018   10.10298
write                600   20.07     83.93     52.3848   9.72645
science              600   21.99     80.37     51.7633   9.70618
Valid N (listwise)   600

correlations variables = locus_of_control self_concept motivation.

Correlations
                                       locus_of_control  self_concept  motivation
locus_of_control  Pearson Correlation  1                 0.171         0.245
                  Sig. (2-tailed)                        <0.001        <0.001
                  N                    600               600           600
self_concept      Pearson Correlation  0.171             1             0.289
                  Sig. (2-tailed)      <0.001                          <0.001
                  N                    600               600           600
motivation        Pearson Correlation  0.245             0.289         1
                  Sig. (2-tailed)      <0.001            <0.001
                  N                    600               600           600

correlations variables = read write science.

Correlations
                               read      write     science
read      Pearson Correlation  1         0.629     0.691
          Sig. (2-tailed)                <0.001    <0.001
          N                    600       600       600
write     Pearson Correlation  0.629     1         0.569
          Sig. (2-tailed)      <0.001              <0.001
          N                    600       600       600
science   Pearson Correlation  0.691     0.569     1
          Sig. (2-tailed)      <0.001    <0.001
          N                    600       600       600
Analysis methods you might consider
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.
Multivariate multiple regression, the focus of this page.

Separate OLS regressions – You could analyze these data using a separate OLS regression analysis for each outcome variable. The individual coefficients, as well as their standard errors, will be the same as those produced by the multivariate regression. However, the OLS regressions will not produce multivariate results, nor will they allow for testing of coefficients across equations.

Canonical correlation analysis, which might be feasible if you don’t want to consider one set of variables as outcome variables and the other set as predictor variables.
Multivariate regression
To conduct a multivariate regression in SPSS, we can use either of two commands, glm or manova. Using the lmatrix subcommand in the glm command, you can test whether all of the equations, taken together, are statistically significant. The F-ratios and p-values for four multivariate criteria are given: Wilks’ lambda, the Lawley-Hotelling trace, Pillai’s trace, and Roy’s largest root. We can get the regression coefficients from either the glm or the manova command by including the print subcommand with the keyword parameters.
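As a hedged illustration of where these criteria come from (synthetic data, not the mvreg file; glm computes this internally), each statistic is a function of the hypothesis (H) and error (E) sums-of-squares-and-cross-products matrices from the multivariate fit:

```python
# Sketch with made-up data: computing the four multivariate criteria from the
# hypothesis (H) and error (E) SSCP matrices of a multivariate regression.
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 3, 3                       # cases, predictors, outcomes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=(n, q)) + 0.3 * X[:, 1:2]   # outcomes related to predictor 1

B, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B
E = resid.T @ resid                       # error SSCP matrix

# For the overall regression (all predictors, intercept excluded), the
# corrected total SSCP decomposes into hypothesis plus error:
Yc = Y - Y.mean(axis=0)
H = Yc.T @ Yc - E                         # hypothesis SSCP matrix

HEinv = H @ np.linalg.inv(E)
wilks = np.linalg.det(E) / np.linalg.det(E + H)        # Wilks' lambda
pillai = np.trace(H @ np.linalg.inv(H + E))            # Pillai's trace
hotelling = np.trace(HEinv)                            # Lawley-Hotelling trace
roy = np.linalg.eigvals(HEinv).real.max()              # Roy's largest root

assert 0 < wilks <= 1 and 0 <= pillai <= q
```

SPSS then converts each statistic to an (exact or approximate) F-ratio; the formulas for those conversions are omitted here.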
Below we run the glm command. Notice that we have multiple dependent variables listed before the SPSS keyword with. (The SPSS keyword with indicates that continuous predictor variables will follow.) We use the lmatrix subcommand to request the test of the overall model. Semicolons are required between the entries for each predictor.
glm locus_of_control self_concept motivation with read write science
  /print = parameters
  /lmatrix 'multivariate test of entire model' write 1; read 1; science 1.

Multivariate Tests(a)
Effect                           Value   F          Hypothesis df  Error df  Sig.
Intercept  Pillai's Trace        0.165   39.239(b)  3.000          594.000   <0.001
           Wilks' Lambda         0.835   39.239(b)  3.000          594.000   <0.001
           Hotelling's Trace     0.198   39.239(b)  3.000          594.000   <0.001
           Roy's Largest Root    0.198   39.239(b)  3.000          594.000   <0.001
read       Pillai's Trace        0.027   5.529(b)   3.000          594.000   <0.001
           Wilks' Lambda         0.973   5.529(b)   3.000          594.000   <0.001
           Hotelling's Trace     0.028   5.529(b)   3.000          594.000   <0.001
           Roy's Largest Root    0.028   5.529(b)   3.000          594.000   <0.001
write      Pillai's Trace        0.056   11.807(b)  3.000          594.000   <0.001
           Wilks' Lambda         0.944   11.807(b)  3.000          594.000   <0.001
           Hotelling's Trace     0.060   11.807(b)  3.000          594.000   <0.001
           Roy's Largest Root    0.060   11.807(b)  3.000          594.000   <0.001
science    Pillai's Trace        0.017   3.397(b)   3.000          594.000   0.018
           Wilks' Lambda         0.983   3.397(b)   3.000          594.000   0.018
           Hotelling's Trace     0.017   3.397(b)   3.000          594.000   0.018
           Roy's Largest Root    0.017   3.397(b)   3.000          594.000   0.018
a Design: Intercept + read + write + science
b Exact statistic
Tests of Between-Subjects Effects
Source           Dependent Variable  Type III Sum of Squares  df   Mean Square  F        Sig.
Corrected Model  locus_of_control    45.230(a)                3    15.077       40.135   <0.001
                 self_concept        1.892(b)                 3    0.631        1.269    0.284
                 motivation          30.586(c)                3    10.195       16.224   <0.001
Intercept        locus_of_control    37.608                   1    37.608       100.116  <0.001
                 self_concept        0.807                    1    0.807        1.624    0.203
                 motivation          19.477                   1    19.477       30.993   <0.001
read             locus_of_control    4.802                    1    4.802        12.784   <0.001
                 self_concept        0.257                    1    0.257        0.517    0.472
                 motivation          3.893                    1    3.893        6.195    0.013
write            locus_of_control    5.391                    1    5.391        14.352   <0.001
                 self_concept        0.351                    1    0.351        0.705    0.401
                 motivation          11.928                   1    11.928       18.982   <0.001
science          locus_of_control    0.827                    1    0.827        2.202    0.138
                 self_concept        0.623                    1    0.623        1.254    0.263
                 motivation          2.670                    1    2.670        4.249    0.040
Error            locus_of_control    223.886                  596  0.376
                 self_concept        296.259                  596  0.497
                 motivation          374.542                  596  0.628
Total            locus_of_control    274.707                  600
                 self_concept        298.165                  600
                 motivation          405.138                  600
Corrected Total  locus_of_control    269.116                  599
                 self_concept        298.151                  599
                 motivation          405.129                  599
a R Squared = 0.168 (Adjusted R Squared = 0.164)
b R Squared = 0.006 (Adjusted R Squared = 0.001)
c R Squared = 0.075 (Adjusted R Squared = 0.071)
Parameter Estimates
                                                                 95% Confidence Interval
Dependent Variable  Parameter  B       Std. Error  t        Sig.     Lower Bound  Upper Bound
locus_of_control    Intercept  -1.555  0.155       -10.006  <0.001   -1.861       -1.250
                    read       0.013   0.004       3.575    <0.001   0.006        0.021
                    write      0.013   0.003       3.788    <0.001   0.006        0.020
                    science    0.005   0.004       1.484    0.138    -0.002       0.013
self_concept        Intercept  -0.228  0.179       -1.274   0.203    -0.579       0.123
                    read       0.003   0.004       0.719    0.472    -0.005       0.012
                    write      -0.003  0.004       -0.840   0.401    -0.011       0.004
                    science    0.005   0.004       1.120    0.263    -0.004       0.013
motivation          Intercept  -1.119  0.201       -5.567   <0.001   -1.514       -0.724
                    read       0.012   0.005       2.489    0.013    0.003        0.021
                    write      0.019   0.004       4.357    <0.001   0.011        0.028
                    science    -0.010  0.005       -2.061   0.040    -0.019       0.000
Contrast Results (K Matrix)(a)
Contrast                                           locus_of_control  self_concept  motivation
L1  Contrast Estimate                              0.013             -0.003        0.019
    Hypothesized Value                             0                 0             0
    Difference (Estimate - Hypothesized)           0.013             -0.003        0.019
    Std. Error                                     0.003             0.004         0.004
    Sig.                                           <0.001            0.401         <0.001
    95% CI for Difference  Lower Bound             0.006             -0.011        0.011
                           Upper Bound             0.020             0.004         0.028
L2  Contrast Estimate                              0.013             0.003         0.012
    Hypothesized Value                             0                 0             0
    Difference (Estimate - Hypothesized)           0.013             0.003         0.012
    Std. Error                                     0.004             0.004         0.005
    Sig.                                           <0.001            0.472         0.013
    95% CI for Difference  Lower Bound             0.006             -0.005        0.003
                           Upper Bound             0.021             0.012         0.021
L3  Contrast Estimate                              0.005             0.005         -0.010
    Hypothesized Value                             0                 0             0
    Difference (Estimate - Hypothesized)           0.005             0.005         -0.010
    Std. Error                                     0.004             0.004         0.005
    Sig.                                           0.138             0.263         0.040
    95% CI for Difference  Lower Bound             -0.002            -0.004        -0.019
                           Upper Bound             0.013             0.013         0.000
a Based on the user-specified contrast coefficients (L') matrix: multivariate test of entire model
Multivariate Test Results
                     Value   F          Hypothesis df  Error df   Sig.
Pillai's trace       0.220   15.745     9.000          1788.000   <0.001
Wilks' lambda        0.784   16.856     9.000          1445.791   <0.001
Hotelling's trace    0.269   17.707     9.000          1778.000   <0.001
Roy's largest root   0.244   48.565(a)  3.000          596.000    <0.001
a The statistic is an upper bound on F that yields a lower bound on the significance level.
Univariate Test Results
Source    Dependent Variable  Sum of Squares  df   Mean Square  F       Sig.
Contrast  locus_of_control    45.230          3    15.077       40.135  <0.001
          self_concept        1.892           3    0.631        1.269   0.284
          motivation          30.586          3    10.195       16.224  <0.001
Error     locus_of_control    223.886         596  0.376
          self_concept        296.259         596  0.497
          motivation          374.542         596  0.628
The first table above gives the multivariate tests for each effect; read, write, and science are each statistically significant regardless of which multivariate criterion is used (all of the p-values are below 0.05). Below these are the between-subjects tests: the models for locus_of_control and motivation are statistically significant, while the model for self_concept is not. In the third table, we see the parameter estimates for each of the predictor variables for each of the dependent variables. The fifth table gives the multivariate tests of the entire model, requested with the lmatrix subcommand, which are all statistically significant regardless of which test is used. The last table gives the corresponding univariate tests, two of which are statistically significant.
The output from the manova command contains most of the same information given by the glm command, but it is organized a little differently. After some notes and case counts, the output gives the multivariate tests of significance for the within-cells regression, the univariate F-tests (including R-squared and adjusted R-squared for each of the three models), and the regression coefficients.
manova locus_of_control self_concept motivation with read write science /print parameters.
The default error term in MANOVA has been changed from WITHIN CELLS to
WITHIN+RESIDUAL. Note that these are the same for all full factorial designs.

* * * A n a l y s i s   o f   V a r i a n c e * * *

600 cases accepted.
0 cases rejected because of out-of-range factor values.
0 cases rejected because of missing data.
1 non-empty cell.
1 design will be processed.

EFFECT .. WITHIN CELLS Regression
Multivariate Tests of Significance (S = 3, M = -1/2, N = 296)

Test Name    Value     Approx. F  Hypoth. DF  Error DF  Sig. of F
Pillais      0.22030   15.74505   9.00        1788.00   0.000
Hotellings   0.26889   17.70671   9.00        1778.00   0.000
Wilks        0.78439   16.85640   9.00        1445.79   0.000
Roys         0.19644

EFFECT .. WITHIN CELLS Regression (Cont.)
Univariate F-tests with (3,596) D. F.

Variable   Sq. Mul. R  Adj. R-sq.  Hypoth. MS  Error MS  F         Sig. of F
locus_of   0.16807     0.16388     15.07664    0.37565   40.13509  0.000
self_con   0.00635     0.00135     0.63077     0.49708   1.26896   0.284
motivati   0.07550     0.07084     10.19548    0.62843   16.22382  0.000

Regression analysis for WITHIN CELLS error term
--- Individual univariate .9500 confidence intervals

Dependent variable .. locus_of_control
COVARIATE  B              Beta           Std. Err.  t-Value   Sig. of t  Lower -95%  CL- Upper
read       0.0133466809   0.2011716274   0.00373    3.57542   0.000      0.00602     0.02068
write      0.0129191741   0.1874705938   0.00341    3.78845   0.000      0.00622     0.01962
science    0.0054541421   0.0789802632   0.00368    1.48403   0.138      -0.00176    0.01267

Dependent variable .. self_concept
COVARIATE  B              Beta           Std. Err.  t-Value   Sig. of t  Lower -95%  CL- Upper
read       0.0030876697   0.0442156218   0.00429    0.71906   0.472      -0.00535    0.01152
write      -0.0032943533  -0.0454171665  0.00392    -0.83980  0.401      -0.01100    0.00441
science    0.0047345432   0.0651360875   0.00423    1.11988   0.263      -0.00357    0.01304

Dependent variable .. motivation
COVARIATE  B              Beta           Std. Err.  t-Value   Sig. of t  Lower -95%  CL- Upper
read       0.0120168331   0.1476238599   0.00483    2.48889   0.013      0.00253     0.02150
write      0.0192165750   0.2272728015   0.00441    4.35678   0.000      0.01055     0.02788
science    -0.0097985928  -0.1156455467  0.00475    -2.06130  0.040      -0.01913    -0.00046

EFFECT .. CONSTANT
Multivariate Tests of Significance (S = 1, M = 1/2, N = 296)

Test Name    Value     Exact F    Hypoth. DF  Error DF  Sig. of F
Pillais      0.16540   39.23945   3.00        594.00    0.000
Hotellings   0.19818   39.23945   3.00        594.00    0.000
Wilks        0.83460   39.23945   3.00        594.00    0.000
Roys         0.16540
Note.. F statistics are exact.

EFFECT .. CONSTANT (Cont.)
Univariate F-tests with (1,596) D. F.

Variable   Hypoth. SS  Error SS   Hypoth. MS  Error MS  F          Sig. of F
locus_of   37.60832    223.88586  37.60832    0.37565   100.11601  0.000
self_con   0.80711     296.25868  0.80711     0.49708   1.62370    0.203
motivati   19.47692    374.54227  19.47692    0.62843   30.99314   0.000

Estimates for locus_of_control adjusted for 3 covariates
--- Individual univariate .9500 confidence intervals
CONSTANT   Parameter  Coeff.         Std. Err.  t-Value    Sig. t   Lower -95%  CL- Upper
           1          -1.5552772264  0.15544    -10.00580  0.00000  -1.86055    -1.25001

Estimates for self_concept adjusted for 3 covariates
--- Individual univariate .9500 confidence intervals
CONSTANT   Parameter  Coeff.         Std. Err.  t-Value   Sig. t   Lower -95%  CL- Upper
           1          -0.2278406241  0.17880    -1.27424  0.20307  -0.57900    0.12332

Estimates for motivation adjusted for 3 covariates
--- Individual univariate .9500 confidence intervals
CONSTANT   Parameter  Coeff.         Std. Err.  t-Value   Sig. t   Lower -95%  CL- Upper
           1          -1.1192470175  0.20104    -5.56715  0.00000  -1.51409    -0.72440

Abbreviated  Extended
Name         Name
locus_of     locus_of_control
motivati     motivation
self_con     self_concept
If you ran a separate OLS regression for each outcome variable, you would get exactly the same coefficients, standard errors, t- and p-values, and confidence intervals as shown above. So why conduct a multivariate regression? As we mentioned earlier, one of the advantages of using multivariate regression is that you can conduct tests of the coefficients across the different outcome variables.
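The coefficient equivalence described above is easy to verify numerically. Here is a hedged sketch with synthetic data (stand-ins for the actual outcome and predictor variables, not the mvreg file):

```python
# Sketch: fitting each outcome separately by OLS reproduces the multivariate
# coefficient matrix column by column. Data are made up for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # intercept + 3 predictors
Y = rng.normal(size=(n, 3))                                  # 3 outcome variables

# One joint multivariate fit...
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)
# ...versus one univariate OLS fit per outcome column.
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])]
)
assert np.allclose(B_joint, B_separate)   # identical coefficients
```

What the separate fits cannot give you is the cross-equation error covariance that the multivariate tests above are built on.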
Things to consider
The residuals from multivariate regression models are assumed to be multivariate normal. This is analogous to the assumption of normally distributed errors in univariate linear regression (i.e., OLS regression). Multivariate regression analysis is not recommended for small samples. The outcome variables should be at least moderately correlated for the multivariate regression analysis to make sense.
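The last point above can be checked directly before fitting the model. A hedged sketch with synthetic stand-ins for the three psychological outcomes (the 0.2 cutoff below is an illustrative threshold, not a formal rule):

```python
# Sketch: check that candidate outcome variables are at least moderately
# correlated before running a multivariate regression. Synthetic data.
import numpy as np

rng = np.random.default_rng(3)
n = 600
latent = rng.normal(size=n)                 # shared component driving all outcomes
outcomes = np.column_stack([latent + rng.normal(size=n) for _ in range(3)])

R = np.corrcoef(outcomes, rowvar=False)     # 3 x 3 correlation matrix
off_diag = R[np.triu_indices(3, k=1)]       # the three pairwise correlations

# Illustrative screening rule: flag outcome sets with weak intercorrelation
assert np.all(np.abs(off_diag) > 0.2)
```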