This page shows an example of multivariate analysis of variance (MANOVA) in SAS with footnotes explaining the output. The data used in this example are from the following experiment.
A researcher randomly assigns 33 subjects to one of three groups. The first group receives technical dietary information interactively from an online website, the second group receives the same information from a nurse practitioner, and the third group receives the information from a videotape made by the same nurse practitioner. Each subject then rates the presentation on three dimensions: difficulty, usefulness, and importance. The researcher wants to know whether these ratings differ across the three modes of presentation; in particular, whether the interactive website is superior, because it is the most cost-effective way of delivering the information. In the dataset, the ratings are stored in the variables useful, difficulty and importance, and the variable group indicates the group to which a subject was assigned.
We are interested in how the variability in the three ratings can be explained by a subject's group. Group is a categorical variable with three possible values: 1, 2 or 3. Because we have multiple dependent variables that cannot be combined, we will use MANOVA. Our null hypothesis in this analysis is that a subject's group has no effect on any of the three ratings, and we can test this hypothesis with the dataset manova.sas7bdat.
We can start by examining the three outcome variables.
data manova; set "C:\tempmanova"; run;
proc means data = manova; var useful difficulty importance; run;
The MEANS Procedure

Variable        N            Mean         Std Dev         Minimum         Maximum
USEFUL         33      16.3303030       3.2924615      11.8999996      24.2999992
DIFFICULTY     33       5.7151515       2.0175978       2.4000001      10.2500000
IMPORTANCE     33       6.4757576       3.9851309       0.2000000      18.7999992
proc freq data = manova; table group; run;
The FREQ Procedure

                                        Cumulative      Cumulative
GROUP    Frequency      Percent          Frequency         Percent
    1           11        33.33                 11           33.33
    2           11        33.33                 22           66.67
    3           11        33.33                 33          100.00
proc sort data = manova;
  by group;
run;

proc means data = manova;
  by group;
  var useful difficulty importance;
run;
----------------------------------- GROUP=1 -----------------------------------

The MEANS Procedure

Variable        N            Mean         Std Dev         Minimum         Maximum
USEFUL         11      18.1181817       3.9037974      13.0000000      24.2999992
DIFFICULTY     11       6.1909091       1.8997129       3.7500000      10.2500000
IMPORTANCE     11       8.6818181       4.8630890       3.3000000      18.7999992

----------------------------------- GROUP=2 -----------------------------------

Variable        N            Mean         Std Dev         Minimum         Maximum
USEFUL         11      15.5272729       2.0756162      12.8000002      19.7000008
DIFFICULTY     11       5.5818183       2.4342631       2.4000001       9.8500004
IMPORTANCE     11       5.1090909       2.5311873       0.2000000       8.5000000

----------------------------------- GROUP=3 -----------------------------------

Variable        N            Mean         Std Dev         Minimum         Maximum
USEFUL         11      15.3454545       3.1382682      11.8999996      19.7999992
DIFFICULTY     11       5.3727273       1.7590287       2.6500001       8.7500000
IMPORTANCE     11       5.6363637       3.5469065       0.7000000      10.3000002
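If you prefer not to sort the data first, a class statement in proc means produces the same per-group descriptive statistics. A minimal alternative sketch using the same dataset:

proc means data = manova;
  class group;
  var useful difficulty importance;
run;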
Next, we can run the MANOVA. In SAS, MANOVA is requested with the manova statement within proc glm, the general linear model procedure. We use the class statement to indicate our categorical predictor variable, group, then specify our model by listing the outcome variables to the left of the equal sign and the predictor to the right. We are only interested in Type III sums of squares, which we request with the SS3 option. In the manova statement, we indicate that our hypothesized effect, specified in SAS with h=, is group.
proc glm data = manova;
  class group;
  model useful difficulty importance = group / SS3;
  manova h = group;
run;
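If you would also like SAS to display the hypothesis (H) and error (E) SSCP matrices that the multivariate tests are built from, the manova statement accepts the printe and printh options. A sketch of this variation (the output shown on this page was generated without these options):

proc glm data = manova;
  class group;
  model useful difficulty importance = group / SS3;
  manova h = group / printe printh;
run;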
The GLM Procedure

Class Level Information

Class      Levels    Values
GROUP           3    1 2 3

Number of Observations Read    33
Number of Observations Used    33

Dependent Variable: USEFUL

                                     Sum of
Source               DF         Squares      Mean Square    F Value    Pr > F
Model                 2      52.9242378       26.4621189       2.70    0.0835
Error                30     293.9654425        9.7988481
Corrected Total      32     346.8896803

R-Square     Coeff Var      Root MSE    USEFUL Mean
0.152568      19.16873      3.130311       16.33030

Source               DF     Type III SS      Mean Square    F Value    Pr > F
GROUP                 2     52.92423783      26.46211891       2.70    0.0835

Dependent Variable: DIFFICULTY

                                     Sum of
Source               DF         Squares      Mean Square    F Value    Pr > F
Model                 2       3.9751512        1.9875756       0.47    0.6282
Error                30     126.2872767        4.2095759
Corrected Total      32     130.2624279

R-Square     Coeff Var      Root MSE    DIFFICULTY Mean
0.030516      35.89975      2.051725           5.715152

Source               DF     Type III SS      Mean Square    F Value    Pr > F
GROUP                 2      3.97515121       1.98757560       0.47    0.6282

Dependent Variable: IMPORTANCE

                                     Sum of
Source               DF         Squares      Mean Square    F Value    Pr > F
Model                 2      81.8296936       40.9148468       2.88    0.0718
Error                30     426.3708962       14.2123632
Corrected Total      32     508.2005898

R-Square     Coeff Var      Root MSE    IMPORTANCE Mean
0.161018      58.21603      3.769929           6.475758

Source               DF     Type III SS      Mean Square    F Value    Pr > F
GROUP                 2     81.82969356      40.91484678       2.88    0.0718

Multivariate Analysis of Variance

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for GROUP
E = Error SSCP Matrix

Characteristic                Characteristic Vector  V'EV=1
          Root    Percent          USEFUL     DIFFICULTY     IMPORTANCE
    0.89198790      99.42      0.06410227    -0.00186162     0.05375069
    0.00524207       0.58      0.01442655     0.06888878    -0.02620577
    0.00000000       0.00     -0.03149580     0.05943387     0.01270798

MANOVA Test Criteria and F Approximations for
the Hypothesis of No Overall GROUP Effect
H = Type III SSCP Matrix for GROUP
E = Error SSCP Matrix

S=2    M=0    N=13

Statistic                          Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda                 0.52578838       3.54         6        56    0.0049
Pillai's Trace                0.47667013       3.02         6        58    0.0122
Hotelling-Lawley Trace        0.89722998       4.12         6     35.61    0.0031
Roy's Greatest Root           0.89198790       8.62         3        29    0.0003

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
Class Level Information
The GLM Procedure

Class Level Information

Class(a)    Levels(b)    Values(c)
GROUP               3    1 2 3

Number of Observations Read    33
Number of Observations Used    33
a. Class – This is the categorical predictor variable in the MANOVA.
b. Levels – This is the number of possible values of the specified predictor. Our predictor in this example has three levels (group = 1, group = 2 and group = 3).
c. Values – These are the values of the predictor.
Univariate Output(d)
Dependent Variable(e): USEFUL

                                         Sum of
Source(f)          DF(g)            Squares(h)    Mean Square(i)    F Value(j)    Pr > F(k)
Model                  2         52.9242378          26.4621189          2.70       0.0835
Error                 30        293.9654425           9.7988481
Corrected Total       32        346.8896803

R-Square(l)    Coeff Var(m)    Root MSE(n)    USEFUL Mean(o)
   0.152568        19.16873       3.130311          16.33030

Source             DF    Type III SS(p)     Mean Square    F Value    Pr > F
GROUP               2       52.92423783     26.46211891       2.70    0.0835
d. Univariate Output – Within MANOVA, SAS provides both univariate and multivariate output. The univariate results are presented separately for each dependent variable. Here, we see the univariate output for useful (the univariate output for difficulty and importance has been omitted to improve readability). Within each dependent variable's output there are two sets of results: the first matches a one-way ANOVA of the single dependent variable on the MANOVA predictor, and the second presents the Type III sums of squares.
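For example, the univariate results for useful shown above could be reproduced with a one-way ANOVA in proc glm; a minimal sketch using the same dataset:

proc glm data = manova;
  class group;
  model useful = group / SS3;
run;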
e. Dependent Variable – This is one of the dependent variables from the MANOVA.
f. Source – This is the source of the variability in the specified dependent variable.
g. DF – This is the degrees of freedom associated with each source. Because our predictor, group, has 3 levels, the degrees of freedom associated with the model are 3 - 1 = 2.
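The error and corrected total degrees of freedom follow from the sample size:

\[
df_{\text{error}} = 33 - 3 = 30, \qquad df_{\text{corrected total}} = 33 - 1 = 32 .
\]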
h. Sum of Squares – These are the model, error, and corrected total sums of squares. The model sum of squares is the sum of the squared differences between the predicted values and the overall mean of the outcome variable. The error sum of squares is the sum of the squared differences between the observed values and the predicted values. The corrected total sum of squares is the sum of the model and error sums of squares.
i. Mean Square – This is the sum of squares divided by the degrees of freedom (see g and h).
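As a check against the USEFUL table above, the sums of squares and mean squares fit together as:

\[
\begin{aligned}
SS_{\text{model}} + SS_{\text{error}} &= 52.9242378 + 293.9654425 = 346.8896803 = SS_{\text{corrected total}},\\
MS_{\text{model}} &= 52.9242378 / 2 = 26.4621189,\\
MS_{\text{error}} &= 293.9654425 / 30 = 9.7988481,\\
F &= MS_{\text{model}} / MS_{\text{error}} = 26.4621189 / 9.7988481 \approx 2.70 .
\end{aligned}
\]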
j. F Value – This is the F statistic associated with the given source.
k. Pr > F – This is the p-value associated with the F statistic of a given source. The null hypothesis that the predictor has no effect on the outcome variable is evaluated with regard to this p-value. For a given alpha level, if the p-value is less than alpha, the null hypothesis is rejected. If not, then we fail to reject the null hypothesis.
l. R-Square – This is the proportion of variability in the dependent variable (useful) that can be explained by the model. It is the ratio of the model sum of squares to the total sum of squares.
m. Coeff Var – This is the coefficient of variation, expressed as a percent. It is calculated as 100 times the ratio of the root MSE to the mean of the outcome variable (see n and o), and it describes the amount of variation in the outcome variable relative to its mean.
n. Root MSE – This is the square root of the error mean square (see i). It estimates the standard deviation of the outcome variable about the values predicted by the model.
o. USEFUL mean – This is the mean value of the dependent variable.
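These summary statistics can be reproduced from the USEFUL table using the quantities already discussed:

\[
\begin{aligned}
R^2 &= \frac{SS_{\text{model}}}{SS_{\text{corrected total}}} = \frac{52.9242378}{346.8896803} \approx 0.152568,\\
\text{Root MSE} &= \sqrt{MS_{\text{error}}} = \sqrt{9.7988481} \approx 3.130311,\\
\text{Coeff Var} &= 100 \times \frac{\text{Root MSE}}{\text{USEFUL mean}} = 100 \times \frac{3.130311}{16.33030} \approx 19.16873 .
\end{aligned}
\]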
p. Type III SS – This is a type of sum-of-squares calculation. Here, we are looking at the Type III sum of squares for the predictor, group. Type III sums of squares are calculated for each predictor as if it were the last predictor entered into the model. Because our model consists of just one predictor, the Type III sum of squares for group is the same as the model sum of squares, and it matches the sum of squares from the ANOVA table above.
MANOVA Output
Multivariate Analysis of Variance

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for GROUP
E = Error SSCP Matrix

Characteristic                Characteristic Vector(r)  V'EV=1
       Root(q)    Percent          USEFUL     DIFFICULTY     IMPORTANCE
    0.89198790      99.42      0.06410227    -0.00186162     0.05375069
    0.00524207       0.58      0.01442655     0.06888878    -0.02620577
    0.00000000       0.00     -0.03149580     0.05943387     0.01270798

MANOVA Test Criteria and F Approximations for
the Hypothesis of No Overall GROUP Effect
H = Type III SSCP Matrix for GROUP
E = Error SSCP Matrix

S=2    M=0    N=13(s)

Statistic(t)                           Value    F Value(y)    Num DF(z)    Den DF(aa)    Pr > F(ab)
Wilks' Lambda(u)                  0.52578838          3.54            6            56        0.0049
Pillai's Trace(v)                 0.47667013          3.02            6            58        0.0122
Hotelling-Lawley Trace(w)         0.89722998          4.12            6         35.61        0.0031
Roy's Greatest Root(x)            0.89198790          8.62            3            29        0.0003

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
q. Characteristic Root – These are the eigenvalues of E Inverse * H, the product of the inverse of the error SSCP matrix and the hypothesis (model) SSCP matrix. Because this product is a 3x3 matrix, there are three eigenvalues, one for each eigenvector. The percents listed next to the characteristic roots indicate how much of the variability in the outcomes a given root and vector account for. In this example, the first root and vector account for 99.42% of the variability in the outcomes and the second for 0.58%.
r. Characteristic Vector – These are the corresponding eigenvectors of E Inverse * H, scaled so that V'EV = 1. The three numbers that compose a vector are read across a row (one under USEFUL, one under DIFFICULTY, and one under IMPORTANCE).
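In matrix form, each root and vector pair in the output satisfies

\[
E^{-1} H \, v_i = \lambda_i \, v_i , \qquad \text{with } v_i' E \, v_i = 1 ,
\]

where H is the Type III SSCP matrix for group and E is the error SSCP matrix.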
s. S=2 M=0 N=13 – These are intermediate results that are used in computing the multivariate test statistics and their associated degrees of freedom. If P is the number of dependent variables, Q is the hypothesis degrees of freedom, and NE is the residual or error degrees of freedom, then S = min(P, Q), M = .5(abs(P-Q)-1) and N = .5(NE-P-1).
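In this example P = 3, Q = 2, and NE = 30, so

\[
S = \min(3, 2) = 2, \qquad M = 0.5\,(\lvert 3 - 2\rvert - 1) = 0, \qquad N = 0.5\,(30 - 3 - 1) = 13 ,
\]

which matches the S=2 M=0 N=13 line in the output.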
t. Statistic – MANOVA calculates four multivariate test statistics. All four are based on the characteristic roots (see superscript q). The null hypothesis for each of these tests is the same: the independent variable (group) has no effect on any of the dependent variables (useful, difficulty and importance).
u. Wilks’ Lambda – This can be interpreted as the proportion of the variance in the outcomes that is not explained by an effect. To calculate Wilks’ Lambda, for each characteristic root, calculate 1/(1 + the characteristic root), then find the product of these ratios. So in this example, you would first calculate 1/(1+0.89198790) = 0.5285446, 1/(1+0.00524207) = 0.9947853, and 1/(1+0)=1. Then multiply 0.5285446 * 0.9947853 * 1 = 0.52578838.
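Written in terms of the characteristic roots, the calculation is

\[
\Lambda = \prod_i \frac{1}{1 + \lambda_i} = \frac{1}{1.89198790} \times \frac{1}{1.00524207} \times \frac{1}{1} \approx 0.52578838 .
\]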
v. Pillai’s Trace – This is another one of the four multivariate test statistics used in MANOVA. To calculate Pillai’s trace, divide each characteristic root by 1 + the characteristic root, then sum these ratios. So in this example, you would first calculate 0.89198790/(1+0.89198790) = 0.471455394, 0.00524207/(1+0.00524207) = 0.005214734, and 0/(1+0)=0. When these are added we arrive at Pillai’s trace: (0.471455394 + 0.005214734 + 0) = 0.47667013.
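In symbols, using the roots from the output above:

\[
V = \sum_i \frac{\lambda_i}{1 + \lambda_i} \approx 0.47667013 .
\]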
w. Hotelling-Lawley Trace – This is another of the four multivariate test statistics. It is the sum of the eigenvalues of E Inverse * H and can be viewed as a direct multivariate generalization of the F statistic in ANOVA. We can calculate the Hotelling-Lawley Trace by summing the characteristic roots listed in the output: 0.89198790 + 0.00524207 + 0 = 0.89723.
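That is,

\[
T = \sum_i \lambda_i \approx 0.89723 .
\]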
x. Roy’s Greatest Root – This is the largest of the eigenvalues of E Inverse * H. We can see that the value of Roy’s Greatest Root is the largest of the characteristic roots (see superscript q). Because it is based only on the largest root, it can behave differently from the other three test statistics; in instances where the other three are not significant and Roy’s is significant, the effect should generally be considered non-significant. For further information on the calculations underlying MANOVA results, consult the SAS online documentation.
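In symbols, using the roots from the output above:

\[
\theta = \max_i \lambda_i = 0.89198790 .
\]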
y. F Value – This is the F statistic for the given predictor and test statistic.
z. Num DF – This is the numerator degrees of freedom for the F test of the given statistic.
aa. Den DF – This is the denominator degrees of freedom for the F test of the given statistic. Note that in MANOVA the denominator degrees of freedom can be a non-integer (here, the Den DF associated with the Hotelling-Lawley Trace is a non-integer) because some of the multivariate F approximations use formulas that do not necessarily produce whole numbers.
ab. Pr > F – This is the p-value associated with the F statistic of a given effect and test statistic. The null hypothesis that a given predictor has no effect on any of the outcomes is evaluated with regard to this p-value. For a given alpha level, if the p-value is less than alpha, the null hypothesis is rejected. If not, then we fail to reject the null hypothesis. In this example, we reject the null hypothesis that group has no effect on the useful, difficulty and importance ratings at alpha level .05 because all of the p-values are less than .05.