A Practical Introduction to Factor Analysis: Confirmatory Factor Analysis

This page has been updated!

Please refer to Confirmatory Factor Analysis (CFA) in R with lavaan for a much more thorough introduction to CFA.

II. Confirmatory Factor Analysis

Introduction
One factor CFA
1. Model fit
2. Identification
Two factor CFA
1. Uncorrelated Factors
2. Correlated Factors


Introduction

Confirmatory factor analysis (CFA) borrows many of the same concepts from exploratory factor analysis (EFA), except that instead of letting the data tell us the factor structure, we specify the factor structure in advance and test whether it is consistent with the data. In this portion of the seminar, we continue with the SAQ example. However, based on the exploratory factor analysis and discussions with the Principal Investigator, we decided to remove Item 2 from the analysis; we call the resulting survey the SAQ-7. Recall that our hypothesized model assumes that SPSS Anxiety explains the common variance among all items (in this case seven) in the SAQ-7. To use the same data in Mplus, you have to convert it to a CSV file; the file used here is saq8.csv.

Here are the eight items of the original SAQ; Item 2, which was deleted to form the SAQ-7, is marked:

1. I dream that Pearson is attacking me with correlation coefficients
2. I don’t understand statistics (deleted)
3. I have little experience of computers
4. All computers hate me
5. I have never been good at mathematics
6. My friends are better at statistics than me
7. Computers are useful only for playing games
8. I did badly at mathematics at school

Recall from our exploratory analysis that Items 1, 2, 3, 4, 5, and 8 load onto one factor and Items 6 and 7 load onto a second factor. As an exercise, let’s first assume that SPSS Anxiety is the only factor that explains the common variance in all seven items.

1. One Factor Confirmatory Factor Analysis

The most fundamental model in CFA is the one factor model, which will assume that the covariance (or correlation) among items is due to a single common factor. Much like exploratory common factor analysis, we will assume that total variance can be partitioned into common and unique variance.
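The partition can be written as a model-implied covariance matrix. The sketch below is a minimal illustration in plain Python, assuming one factor with variance `psi`, loadings `lam`, and uncorrelated unique variances `theta`; all numbers are hypothetical and are not SAQ-7 estimates. Each item's variance splits into a common part (lam² · psi) and a unique part (theta), and two items covary only through the factor (lam_j · lam_k · psi).

```python
def implied_covariance(lam, psi, theta):
    """One-factor model: Sigma = psi * lam * lam' + diag(theta)."""
    p = len(lam)
    # Off-diagonal entries: items covary only through the common factor.
    sigma = [[psi * lam[j] * lam[k] for k in range(p)] for j in range(p)]
    for j in range(p):
        sigma[j][j] += theta[j]  # unique variance adds to the diagonal only
    return sigma


def common_share(lam_j, psi, theta_j):
    """Share of an item's total variance that is common variance."""
    common = lam_j ** 2 * psi
    return common / (common + theta_j)


# Hypothetical values chosen so that every item has total variance 1.
lam = [0.8, 0.6, 0.5]
theta = [0.36, 0.64, 0.75]
sigma = implied_covariance(lam, psi=1.0, theta=theta)
for row in sigma:
    print([round(x, 2) for x in row])
print(round(common_share(0.8, 1.0, 0.36), 2))  # common share of item 1
```

With a total variance of 1, the first item's common share is simply its squared loading, 0.64.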

In Mplus the code is relatively simple. The BY statement indicates that the items listed to its right load onto the factor named to its left; USEVARIABLES leaves q02 (Item 2) out of the analysis.

TITLE: One Factor CFA SAQ-7 (Marker Method)
DATA: FILE IS saq8.csv;
VARIABLE: NAMES ARE q01-q08;
USEVARIABLES q01 q03-q08; 
ANALYSIS: ESTIMATOR = ML; 
MODEL: f1 BY q01 q03-q08;
OUTPUT: STDYX;
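It can help to check where this model's degrees of freedom come from. A minimal sketch, counting only the covariance structure (Mplus also estimates seven means through seven intercepts, which adds equally to moments and parameters and leaves the df unchanged): seven items give 28 unique variances and covariances, and the marker-method model estimates 6 loadings, 1 factor variance and 7 residual variances.

```python
def unique_moments(p):
    """Number of distinct variances and covariances among p items."""
    return p * (p + 1) // 2


def one_factor_free_params(p):
    loadings = p - 1        # first loading fixed to 1 (marker method)
    factor_variance = 1     # freely estimated
    residual_variances = p  # one unique variance per item
    return loadings + factor_variance + residual_variances


p = 7  # q01, q03-q08
df = unique_moments(p) - one_factor_free_params(p)
print(unique_moments(p), one_factor_free_params(p), df)  # 28 14 14
```

The result, 14, is the Degrees of Freedom value Mplus reports for this model.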

Graphically, this is what it looks like:

Model Fit Statistics

The three main model fit indices in CFA are:

Model chi-square: the chi-square statistic obtained from the maximum likelihood fit function (as in the maximum likelihood EFA). It tests the null hypothesis that the model reproduces the population covariance matrix exactly.
CFI, the comparative fit index: values range between 0 and 1, and values greater than 0.90 (conservatively 0.95) indicate good fit.
RMSEA, the root mean square error of approximation: values of 0.01, 0.05 and 0.08 indicate excellent, good and mediocre fit, respectively (some authors extend mediocre fit up to 0.10).
- In Mplus, you also obtain a p-value for a test of close fit, whose null hypothesis is that the RMSEA is less than or equal to 0.05. If you reject this null hypothesis, your model is not a close-fitting model.

Mplus reports another fit statistic alongside the CFI, the Tucker-Lewis Index (TLI). Unlike the CFI, the TLI is not bounded, so it can fall slightly outside the range 0 to 1, but it is read on the same scale, with values greater than 0.90 (conservatively 0.95) indicating good fit. If the CFI and TLI are less than one, the CFI is always greater than the TLI.
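The claim that the CFI exceeds the TLI follows from their usual definitions, CFI = 1 − (χ²_M − df_M)/(χ²_B − df_B) and TLI = (χ²_B/df_B − χ²_M/df_M)/(χ²_B/df_B − 1), where M is the fitted model and B the baseline model. Some algebra gives 1 − TLI = (df_B/df_M)(1 − CFI); because the baseline model always has more degrees of freedom than the fitted model, the TLI falls further below 1 than the CFI does. The sketch below assumes that the Mplus baseline is the independence model with df_B = 7·6/2 = 21 for seven items, and that the CFI has not been truncated at 0 or 1. Under those assumptions it recovers the TLI from the CFI and model df reported for each model in this seminar.

```python
def tli_from_cfi(cfi, df_model, df_baseline):
    """1 - TLI = (df_B / df_M) * (1 - CFI), assuming CFI was not truncated."""
    return 1 - (df_baseline / df_model) * (1 - cfi)


P = 7
DF_BASELINE = P * (P - 1) // 2  # independence model: 21 df (assumption)

# (CFI, model df) as reported by Mplus for the models in this seminar
reported = {
    "one factor": (0.906, 14),
    "two factors, uncorrelated": (0.786, 15),
    "two factors, correlated": (0.986, 13),
}
for name, (cfi, df_m) in reported.items():
    tli = tli_from_cfi(cfi, df_m, DF_BASELINE)
    # TLI < CFI whenever CFI < 1, because DF_BASELINE > df_m
    print(f"{name}: CFI = {cfi:.3f}, TLI = {tli:.3f}")
```

The printed TLI values, 0.859, 0.700 and 0.977, match the ones Mplus reports, which is consistent with the assumed baseline df.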

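In our one factor solution, the chi-square test is significant (P-Value 0.0000), so we reject the null hypothesis that the model fits exactly. This is common with large samples (here N = 2571). The RMSEA is 0.100, at the upper edge of mediocre fit, and the test of close fit is rejected (Probability RMSEA <= .05 is 0.000). The CFI of 0.906 clears the lenient 0.90 cutoff but not the conservative 0.95, and the TLI of 0.859 falls short even of 0.90.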

Chi-Square Test of Model Fit

          Value                            376.321
          Degrees of Freedom                    14
          P-Value                           0.0000

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.100
          90 Percent C.I.                    0.092  0.109
          Probability RMSEA <= .05           0.000

CFI/TLI

          CFI                                0.906
          TLI                                0.859
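The RMSEA point estimate can be recomputed from the chi-square, its degrees of freedom and the sample size. A minimal sketch, assuming the common formula sqrt(max(χ² − df, 0) / (df · N)); some programs use N − 1 in place of N, which makes no practical difference at N = 2571. The 90 percent confidence interval and the close-fit probability come from the noncentral chi-square distribution and are not reproduced here.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * n))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * n))


N = 2571
# (chi-square, df) as reported by Mplus for the models in this seminar
models = {
    "one factor": (376.321, 14),
    "two factors, uncorrelated": (841.205, 15),
    "two factors, correlated": (66.768, 13),
}
for name, (chi2, df) in models.items():
    print(f"{name}: RMSEA = {rmsea(chi2, df, N):.3f}")
```

For the one-factor model this gives 0.100, matching the output above; the other two rows reproduce the RMSEA values reported for the two-factor models.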

Identification of a CFA model (with at least three items)

To identify each factor in a CFA model with at least three indicators, we must fix the scale of the latent factor. There are two options:

Set the variance of each factor to 1 (variance standardization method)
Set the first loading of each factor to 1 (marker method)

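If nothing else is specified, Mplus uses the second option, the marker method, by default. To use the variance standardization method instead, free the first loading with an asterisk (q01*) and fix the factor variance to 1 with f1 @1: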

TITLE: One Factor CFA Identifying Variance = 1
DATA: FILE IS saq8.csv;
VARIABLE: NAMES ARE q01-q08;
USEVARIABLES q01 q03-q08; 
ANALYSIS: ESTIMATOR = ML; 
MODEL: f1 BY q01* q03-q08;
f1 @1; 
OUTPUT: STDYX;

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
 F1       BY
    Q01                0.489      0.017     28.804      0.000
    Q03               -0.594      0.022    -26.953      0.000
    Q04                0.637      0.019     33.875      0.000
    Q05                0.556      0.020     28.218      0.000
    Q06                0.557      0.024     23.274      0.000
    Q07                0.714      0.022     31.809      0.000
    Q08                0.429      0.018     23.529      0.000

 Variances
    F1                 1.000      0.000    999.000    999.000
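The two identification options are reparameterizations of the same model, so they give the same fit and the same standardized loadings. A minimal sketch of the conversion, using only the rounded estimates printed above: rescale the factor so that q01's loading becomes 1, which divides every loading by 0.489 and multiplies the factor variance by 0.489². These marker-method values are derived here, not taken from Mplus output, and should agree with a marker-method run only up to rounding.

```python
# Variance-standardization estimates printed above (factor variance = 1)
loadings_var1 = {
    "q01": 0.489, "q03": -0.594, "q04": 0.637, "q05": 0.556,
    "q06": 0.557, "q07": 0.714, "q08": 0.429,
}


def to_marker(loadings, marker="q01", factor_variance=1.0):
    """Rescale the factor so that the marker item's loading equals 1."""
    scale = loadings[marker]
    marker_loadings = {item: lam / scale for item, lam in loadings.items()}
    # If f* = scale * f, then Var(f*) = scale**2 * Var(f).
    marker_variance = factor_variance * scale ** 2
    return marker_loadings, marker_variance


lam, psi = to_marker(loadings_var1)
print({item: round(v, 3) for item, v in lam.items()})
print(round(psi, 3))  # 0.239
```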

Below we show the STDYX solution. Note that the loadings differ from the variance standardization solution above, but the factor variance is 1 in both.

STDYX Standardization
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 F1       BY
    Q01                0.590      0.016     36.337      0.000
    Q03               -0.553      0.017    -33.115      0.000
    Q04                0.672      0.014     46.439      0.000
    Q05                0.576      0.016     35.319      0.000
    Q06                0.497      0.018     27.130      0.000
    Q07                0.648      0.015     41.955      0.000
    Q08                0.491      0.018     27.557      0.000

Variances
    F1                 1.000      0.000    999.000    999.000

The STDYX solution standardizes the loading by the standard deviations of both the predictor (the factor, X) and the outcome (the item, Y). In the variance standardization method above, only the predictor (the factor, X) is standardized. To match the STDYX and variance standardization solutions, we first need the standard deviation of our outcome q01. Mplus provides only the variance, so we take its square root to get the standard deviation.

     Variable/         Mean/
    Sample Size      Variance

     Q01                  2.374
         2571.000         0.685

The variance is $0.685$; taking its square root gives the standard deviation, $\sqrt{0.685} = 0.828$. To convert the variance standardization loading to the STDYX loading, we divide it by the standard deviation of q01: $0.489 / 0.828 \approx 0.59$, which matches the STDYX loading of 0.590 up to rounding.
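The same arithmetic as a small function: a minimal sketch that multiplies a loading by the factor's standard deviation and divides by the item's standard deviation. It assumes the item's sample variance (0.685) stands in for the model-estimated item variance that STDYX is based on, so the result agrees with the printed 0.590 only to about two decimals. Because the factor's standard deviation enters the formula, the marker-method parameterization (loading 1, factor variance 0.489², derived by rescaling rather than read from Mplus output) gives the same standardized loading.

```python
import math


def stdyx_loading(loading, factor_variance, item_variance):
    """Standardize a loading by the SDs of both the factor (X) and item (Y)."""
    return loading * math.sqrt(factor_variance) / math.sqrt(item_variance)


# q01 under variance standardization: loading 0.489, factor variance 1,
# using the sample variance of q01 (0.685) as the item variance.
print(round(stdyx_loading(0.489, 1.0, 0.685), 2))  # 0.59

# Same item under the marker method: loading fixed at 1 and a factor
# variance of 0.489**2 (derived by rescaling, not read from Mplus output).
print(round(stdyx_loading(1.0, 0.489 ** 2, 0.685), 2))  # 0.59
```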

2. Two Factor Confirmatory Factor Analysis

Although the results from the one-factor CFA suggest that a one factor solution may capture much of the variance in these items, the model fit suggests that this model can be improved. From the exploratory factor analysis, we found that Items 6 and 7 “hang” together. Let’s take a look at Items 6 and 7 more carefully.

Item 6: My friends are better at statistics than me
Item 7: Computers are useful only for playing games

From talking to the Principal Investigator, it appears that these items reflect some sort of attribution bias, so we call this second factor Attribution Bias.

Uncorrelated factors

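We now proceed with a two-factor CFA in which we assume uncorrelated (orthogonal) factors. A factor with only two items presents a special problem for identification: on its own, two items supply only three unique variances and covariances, yet even with the marker method the factor has four free parameters (one loading, the factor variance, and two residual variances). To identify a two-item factor there are two options: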

Freely estimate the two loadings but constrain them to be equal, while fixing the factor variance at 1
Freely estimate the factor variance, using the marker method for the first item, and covary (correlate) the two-item factor with another factor
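The identification problem can be checked by counting parameters. A minimal sketch, counting only the covariance structure: a two-item factor on its own has three unique moments but four free parameters, while each option adds constraints or borrows information from the other factor so that the full seven-item model is identified. The resulting counts match the Degrees of Freedom Mplus reports for the uncorrelated model (15) and the correlated model (13).

```python
def unique_moments(p):
    """Distinct variances and covariances among p observed items."""
    return p * (p + 1) // 2


# A two-item factor on its own, marker method:
# 1 free loading + 1 factor variance + 2 residual variances = 4 parameters,
# but only 2 variances + 1 covariance = 3 moments.
alone_df = unique_moments(2) - (1 + 1 + 2)
print(alone_df)  # -1, so the two-item factor alone is under-identified

P = 7  # q01, q03-q08
# Option 1 (uncorrelated model): f1 has 5 free loadings and variance fixed
# at 1; f2 has one shared loading (equality constraint) and variance fixed
# at 1; the factor covariance is fixed at 0; plus 7 residual variances.
option1_params = 5 + 1 + 7
# Option 2 (correlated model): marker method on both factors, so f1 has
# 4 free loadings and f2 has 1; 2 factor variances; 1 factor covariance;
# plus 7 residual variances.
option2_params = 4 + 1 + 2 + 1 + 7

option1_df = unique_moments(P) - option1_params
option2_df = unique_moments(P) - option2_params
print(option1_df, option2_df)  # 15 13
```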

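Since we are fitting an uncorrelated two-factor solution here, we are limited to the first option. In the code, q01* and q06* free the first loading of each factor, the (1) after q06* and q07* gives both F2 loadings the same label so that they are constrained to be equal, and f1@1 and f2@1 fix both factor variances to 1. One more snag is that Mplus correlates factors by default in a CFA, so we turn off the correlation by specifying f1 with f2 @ 0. We continue to request the standardized loadings.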

TITLE: Two Factor CFA SAQ-7 Factors Uncorrelated (Option 1)
DATA: FILE IS saq8.csv;
VARIABLE: NAMES ARE q01-q08;
USEVARIABLES q01 q03-q08; 
ANALYSIS: ESTIMATOR = ML; 
MODEL: 
f1 BY q01* q03 q04 q05 q08;
f2 BY q06*(1) 
 q07*(1);
f1 with f2 @ 0; 
f1@1;
f2@1; 
OUTPUT: STDYX;

Here’s what the model looks like graphically:

Since we picked Option 1, the two loadings on F2 are constrained to be equal:

 
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
F2       BY
    Q06                0.797      0.017     46.329      0.000
    Q07                0.797      0.017     46.329      0.000

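We know the factors are uncorrelated because under MODEL RESULTS, F1 WITH F2 is 0.000 with a standard error of 0.000 and 999.000 in the test columns, which is how Mplus reports a fixed parameter. The covariance was fixed at zero rather than estimated, exactly as we specified.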

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
 F1       WITH
    F2                 0.000      0.000    999.000    999.000

Looking at the MODEL FIT INFORMATION we see:

Chi-Square Test of Model Fit

          Value                            841.205
          Degrees of Freedom                    15
          P-Value                           0.0000

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.146
          90 Percent C.I.                    0.138  0.155
          Probability RMSEA <= .05           0.000

CFI/TLI

          CFI                                0.786
          TLI                                0.700

Compared with the one-factor solution, the uncorrelated two-factor solution has a higher chi-square (841.205 vs. 376.321; lower is better), a higher RMSEA (0.146 vs. 0.100) and lower CFI/TLI (0.786/0.700 vs. 0.906/0.859), so overall it is a poorer fitting model. We talk to the Principal Investigator and decide to go with a correlated (oblique) two factor model.

Correlated factors

We now fit a correlated two-factor CFA. We still have the two-item factor to deal with; recall that for identification we can either constrain its loadings to be equal and set its variance to 1, or covary it with another factor and use the marker method. Taking advantage of the correlated factors, we use the second option. Notice that the code relies entirely on Mplus defaults: Mplus automatically uses the marker method for both factors and automatically correlates the factors in a CFA.

TITLE: Two Factor CFA SAQ-7 Factors Correlated (Option 2)
DATA: FILE IS saq8.csv;
VARIABLE: NAMES ARE q01-q08;
USEVARIABLES q01 q03-q08; 
ANALYSIS: ESTIMATOR = ML; 
MODEL: 
f1 BY q01 q03 q04 q05 q08;
f2 BY q06 q07;
OUTPUT: STDYX;

We use the marker method (fixing the loading of the first item on each factor to 1) and freely estimate the factor variances. Notice that, unlike Option 1, the two loadings on F2 are not equal.

 
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
F2       BY
    Q06                1.000      0.000    999.000    999.000
    Q07                1.419      0.071     20.051      0.000

The marker method (Option 2) allows us to freely estimate the variances,

 
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
Variances
    F1                 0.263      0.017     15.187      0.000
    F2                 0.447      0.033     13.496      0.000

and because we chose Option 2, the two-item factor (Attribution Bias, F2) is allowed to covary (correlate) with the five-item factor (SPSS Anxiety, F1), so the estimated covariance between the two factors, 0.232, is not zero.
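Because the factor variances are freely estimated under the marker method, the covariance of 0.232 is on an arbitrary scale. A minimal sketch converting it to a correlation with the estimates above, r = cov / sqrt(var1 · var2); the STDYX section of the same output should report this factor correlation directly, up to rounding.

```python
import math


def factor_correlation(cov, var1, var2):
    """Correlation between two factors from their covariance and variances."""
    return cov / math.sqrt(var1 * var2)


# Estimates from the correlated two-factor model above:
# Var(F1) = 0.263, Var(F2) = 0.447, Cov(F2, F1) = 0.232
r = factor_correlation(0.232, 0.263, 0.447)
print(round(r, 2))  # 0.68
```

A correlation of about 0.68 is sizable, which is consistent with the poor fit of the uncorrelated model that forced it to zero.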

 
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value
F2       WITH
    F1                 0.232      0.015     15.311      0.000

Let’s take a look at the MODEL FIT INFORMATION

Chi-Square Test of Model Fit

          Value                             66.768
          Degrees of Freedom                    13
          P-Value                           0.0000

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.040
          90 Percent C.I.                    0.031  0.050
          Probability RMSEA <= .05           0.952

CFI/TLI

          CFI                                0.986
          TLI                                0.977

Notice that compared to the uncorrelated two-factor solution, the chi-square and RMSEA are both much lower. The chi-square test is still significant (P-Value 0.0000), which is common at this sample size, but the test of close fit is not (Probability RMSEA <= .05 is 0.952), so we do not reject the null hypothesis that the RMSEA is less than or equal to 0.05. Additionally, the CFI and TLI are both higher and pass the 0.95 threshold. This model also fits better than the one-factor solution on every index. After talking with the Principal Investigator, we choose the two correlated factor CFA model as our final model, as shown below.