Trying to run factor analysis with missing data can be problematic. One issue is that traditional multiple imputation methods, such as mi estimate, don’t work with Stata’s factor command. Truxillo (2005) , Graham (2009), and Weaver and Maxwell (2014) have suggested an approach using maximum likelihood with the expectation-maximization (EM) algorithm to estimate of the covariance matrix. Stata’s mi command computes an EM covariance matrix as part of the imputation process. We will demonstrate how to use this EM covariance matrix to obtain a factor solution.
To begin, we will load a Stata dataset fa_missing, get some descriptive statistics and compute the complete case covariance matrix.
use https://stats.idre.ucla.edu/stat/data/fa_missing, clear summarize Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- item13 | 1419 4.450317 .7374944 1 5 item14 | 1428 4.518207 .7086049 1 5 item15 | 1424 4.434691 .7478835 1 5 item16 | 1420 4.270423 .8387034 1 5 item17 | 1423 4.158819 .8969815 1 5 -------------+-------------------------------------------------------- item18 | 1424 3.924157 1.032095 1 5 item19 | 1420 4.072535 .9665034 1 5 item20 | 1396 3.770774 .9137137 1 5 item21 | 1422 3.769339 .9863042 1 5 item22 | 1414 3.592645 1.122807 1 5 -------------+-------------------------------------------------------- item23 | 1423 3.800422 .9639492 1 5 item24 | 1417 3.653493 .9308223 1 5 item25 | 1398 2.285408 .9892487 1 5 item26 | 1414 2.077086 1.058313 1 5 item27 | 1420 1.496479 .7294192 1 5 -------------+-------------------------------------------------------- item28 | 1419 2.273432 .9677116 1 5 count /* count total number of observations */ 1428 corr, cov /* complete case covariance matrix */ (obs=1331) | item13 item14 item15 item16 item17 item18 item19 item20 item21 item22 -------------+------------------------------------------------------------------------------------------ item13 | .536077 item14 | .338379 .48707 item15 | .321189 .321527 .536102 item16 | .345401 .289728 .305775 .681157 item17 | .376869 .34434 .382685 .435569 .799719 item18 | .310446 .313639 .348313 .350631 .52091 1.07943 item19 | .202336 .211924 .250935 .262903 .387361 .6311 .935065 item20 | .202417 .199169 .234828 .235136 .33403 .490213 .39099 .82419 item21 | .340947 .302693 .364249 .369725 .525901 .565762 .472584 .377907 .966934 item22 | .271303 .257165 .30236 .328247 .450612 .625471 .521898 .389491 .553577 1.24822 item23 | .39691 .374938 .405833 .35907 .526654 .567222 .406462 .352182 .558713 .52722 item24 | .305477 .28148 .290924 .325099 .432752 .457068 .332405 .30078 .449396 .456496 item25 | .008449 -.011696 -.038745 -.030012 -.039108 -.042074 -.064693 -.026436 -.025697 -.047378 item26 | .014954 -.024045 -.002687 -.019264 -.021647 -.018859 .018107 -.026555 .002384 -.019735 item27 | -.036163 -.045486 -.046055 -.065249 -.055178 -.070832 -.053228 -.036927 -.062904 -.099815 item28 | -.000554 -.013315 -.033624 -.048267 -.028426 -.051824 -.016597 -.044399 -.031681 -.07906 | item23 item24 item25 item26 item27 item28 -------------+------------------------------------------------------ item23 | .913566 item24 | .618358 .848286 item25 | -.031721 -.043576 .976103 item26 | .014638 -.025494 .10275 1.10263 item27 | -.059988 -.063666 .123452 .170048 .51931 item28 | -.004233 -.049099 .23827 .210952 .353081 .941695
From the output above, you can see that there are a total of 1,428 observations with 1,365 complete cases. All of the variables have missing cases except for item14. item20 has the most missing data with only 1,396 nonmissing cases.
We will use the mlong format for mi set but this approach will work with any of the mi data formats. When you register variables to be imputed (mi register imputed) you should also include the variables without missing values, such as item14, so that they will be included in the EM covariance matrix. Next, run the mi impute mvn command with the emonly option. Notice that there are no variables to the right of the equal sign. In fact, there is no equal sign at all.
After running mi impute, the EM covariance matrix can be found in the saved results in r(Sigma_em) which we will then save to the matrix cov_em for use in factormat.
mi set mlong mi register imputed item13-item28 (97 m=0 obs. now marked as incomplete) mi impute mvn item13-item28, emonly note: variable item14 contains no soft missing (.) values; imputing nothing Iteration 0: Observed log likelihood = -9021.7844 Iteration 1: Observed log likelihood = -4116.7934 Iteration 2: Observed log likelihood = -4113.8728 Iteration 3: Observed log likelihood = -4113.8685 Iteration 4: Observed log likelihood = -4113.8685 Iteration 5: Observed log likelihood = -4113.8685 Expectation-maximization estimation Number obs = 1428 Number missing = 167 Number patterns = 34 Prior: uniform Obs per pattern: min = 1 avg = 42 max = 1331 Observed log likelihood = -4113.8685 at iteration 5 ------------------------------------------------------------------------------------------------------ | item13 item14 item15 item16 item17 item18 item19 item20 -------------+---------------------------------------------------------------------------------------- Coef | _cons | 4.451285 4.518207 4.435308 4.268804 4.156375 3.922213 4.070152 3.767296 -------------+---------------------------------------------------------------------------------------- Sigma | item13 | .5430297 .348556 .3350714 .3450292 .3822566 .3104201 .2059764 .2103791 item14 | .348556 .5017693 .3455828 .3003632 .355927 .3212322 .2256632 .2100503 item15 | .3350714 .3455828 .5584064 .3186633 .400073 .3582909 .2667201 .2515707 item16 | .3450292 .3003632 .3186633 .705225 .4391928 .3453138 .2643872 .2404926 item17 | .3822566 .355927 .400073 .4391928 .8085157 .5141493 .3895897 .3465985 item18 | .3104201 .3212322 .3582909 .3453138 .5141493 1.068469 .6312669 .4935741 item19 | .2059764 .2256632 .2667201 .2643872 .3895897 .6312669 .9371905 .4000026 item20 | .2103791 .2100503 .2515707 .2404926 .3465985 .4935741 .4000026 .8360225 item21 | .3460404 .3144694 .3764157 .3618431 .5214286 .5644802 .4791261 .3844396 item22 | .2817349 .2724658 .3191765 .3361343 .4594059 .6214813 .5312196 .3984811 item23 | .4075676 .3925719 .4292744 .3747293 .5356915 .567577 .4185394 .3681236 item24 | .3165909 .3003985 .3143 .334972 .4410231 .4564384 .3410914 .3120366 item25 | -.004998 -.0312026 -.0558893 -.0478347 -.049516 -.0489516 -.0731514 -.0307672 item26 | .0131501 -.0212114 .0033069 -.0091686 -.0192239 -.0085362 .0180015 -.0241016 item27 | -.0416338 -.054581 -.0551466 -.0723354 -.0615237 -.0827423 -.0596916 -.0426282 item28 | -.0053221 -.0246628 -.0425885 -.0575029 -.0363375 -.0585845 -.0250774 -.0471011 ------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------------ | item21 item22 item23 item24 item25 item26 item27 item28 -------------+---------------------------------------------------------------------------------------- Coef | _cons | 3.770048 3.593149 3.79814 3.655047 2.285293 2.077139 1.49686 2.273121 -------------+---------------------------------------------------------------------------------------- Sigma | item13 | .3460404 .2817349 .4075676 .3165909 -.004998 .0131501 -.0416338 -.0053221 item14 | .3144694 .2724658 .3925719 .3003985 -.0312026 -.0212114 -.054581 -.0246628 item15 | .3764157 .3191765 .4292744 .3143 -.0558893 .0033069 -.0551466 -.0425885 item16 | .3618431 .3361343 .3747293 .334972 -.0478347 -.0091686 -.0723354 -.0575029 item17 | .5214286 .4594059 .5356915 .4410231 -.049516 -.0192239 -.0615237 -.0363375 item18 | .5644802 .6214813 .567577 .4564384 -.0489516 -.0085362 -.0827423 -.0585845 item19 | .4791261 .5312196 .4185394 .3410914 -.0731514 .0180015 -.0596916 -.0250774 item20 | .3844396 .3984811 .3681236 .3120366 -.0307672 -.0241016 -.0426282 -.0471011 item21 | .970728 .5586832 .5705437 .4616899 -.0378778 .0058816 -.0728921 -.0373168 item22 | .5586832 1.261583 .5445269 .4737303 -.0510983 -.0252185 -.0972336 -.0784298 item23 | .5705437 .5445269 .9342235 .6368121 -.0503701 .0146983 -.066706 -.009803 item24 | .4616899 .4737303 .6368121 .8657582 -.0596746 -.0287266 -.0696632 -.0554101 item25 | -.0378778 -.0510983 -.0503701 -.0596746 .9778334 .0906248 .1282772 .2393721 item26 | .0058816 -.0252185 .0146983 -.0287266 .0906248 1.118956 .1671461 .196613 item27 | -.0728921 -.0972336 -.066706 -.0696632 .1282772 .1671461 .5314816 .3568273 item28 | -.0373168 -.0784298 -.009803 -.0554101 .2393721 .196613 .3568273 .9353024 ------------------------------------------------------------------------------------------------------ Note: no imputation performed. matrix cov_em = r(Sigma_em) matrix list cov_em symmetric cov_em[16,16] item13 item14 item15 item16 item17 item18 item19 item20 item13 .54302971 item14 .348556 .50176934 item15 .33507137 .34558277 .55840641 item16 .34502917 .3003632 .31866333 .705225 item17 .38225661 .35592696 .40007302 .43919281 .80851571 item18 .31042005 .32123222 .35829089 .34531382 .51414933 1.0684692 item19 .20597637 .22566319 .26672014 .26438724 .38958966 .63126692 .93719047 item20 .21037912 .21005025 .25157069 .24049261 .34659848 .49357406 .40000261 .83602251 item21 .34604038 .31446937 .37641575 .3618431 .52142858 .56448016 .47912613 .38443958 item22 .28173489 .27246583 .31917653 .33613426 .45940591 .62148129 .53121959 .39848113 item23 .40756762 .39257192 .42927439 .37472927 .53569152 .56757702 .41853944 .36812361 item24 .3165909 .30039846 .3143 .33497204 .4410231 .45643844 .34109143 .31203658 item25 -.00499796 -.03120263 -.05588928 -.04783472 -.04951603 -.04895159 -.07315143 -.03076718 item26 .01315008 -.02121138 .00330686 -.0091686 -.0192239 -.00853618 .01800149 -.02410163 item27 -.04163384 -.054581 -.05514657 -.07233544 -.06152372 -.08274233 -.05969162 -.04262815 item28 -.00532215 -.02466275 -.0425885 -.05750295 -.03633748 -.05858451 -.02507744 -.04710109 item21 item22 item23 item24 item25 item26 item27 item28 item21 .97072797 item22 .55868325 1.2615831 item23 .57054369 .54452686 .93422349 item24 .46168988 .4737303 .63681207 .86575819 item25 -.03787779 -.05109826 -.05037005 -.05967465 .97783345 item26 .00588162 -.02521849 .01469832 -.02872665 .09062481 1.1189565 item27 -.07289205 -.09723358 -.06670603 -.0696632 .1282772 .16714608 .53148159 item28 -.03731678 -.07842976 -.00980302 -.05541012 .23937206 .19661302 .35682735 .93530244
We will use the factormat command with the EM estimate of the covariance matrix to obtain our factor solution. The factormat is for use with a correlation or covariance matrix. The command requires that the sample size, n, be entered along with the name of the covariance matrix. In her paper, Truxillo discusses three methods for specifying nominal sample size, 1) column-wise minimum, 2) column-wise average and 3) pairwise minimum. Column-wise minimum is just the number of complete cases for the variables with the most missing values which is the value we will use for this example. If you will recall from above that value is 1,396.
factormat cov_em, n(1396) fact(4) ml (obs=1396) Iteration 0: log likelihood = -236.78484 Iteration 1: log likelihood = -85.766521 (...omitted...) Iteration 90: log likelihood = -85.345691 Iteration 91: log likelihood = -85.345691 Factor analysis/correlation Number of obs = 1396 Method: maximum likelihood Retained factors = 4 Rotation: (unrotated) Number of params = 58 Schwarz's BIC = 590.691 Log likelihood = -85.34569 (Akaike's) AIC = 286.691 -------------------------------------------------------------------------- Factor | Eigenvalue Difference Proportion Cumulative -------------+------------------------------------------------------------ Factor1 | 5.83062 4.66732 0.7030 0.7030 Factor2 | 1.16329 0.34933 0.1403 0.8432 Factor3 | 0.81396 0.32778 0.0981 0.9414 Factor4 | 0.48619 . 0.0586 1.0000 -------------------------------------------------------------------------- LR test: independent vs. saturated: chi2(120) = 9652.01 Prob>chi2 = 0.0000 LR test: 4 factors vs. saturated: chi2(62) = 169.61 Prob>chi2 = 0.0000 Factor loadings (pattern matrix) and unique variances --------------------------------------------------------------------- Variable | Factor1 Factor2 Factor3 Factor4 | Uniqueness -------------+----------------------------------------+-------------- item13 | 0.7069 0.0779 -0.3640 0.1850 | 0.3274 item14 | 0.7087 0.0258 -0.3268 0.1776 | 0.3588 item15 | 0.7260 -0.0043 -0.2333 0.1634 | 0.3919 item16 | 0.6158 -0.0577 -0.2016 0.2151 | 0.5306 item17 | 0.7618 -0.0252 -0.0548 0.1732 | 0.3860 item18 | 0.7099 -0.1273 0.3400 0.1912 | 0.3276 item19 | 0.5856 -0.1310 0.4220 0.2273 | 0.4102 item20 | 0.5331 -0.0986 0.2382 0.1632 | 0.6227 item21 | 0.7143 -0.0444 0.1319 0.0987 | 0.4606 item22 | 0.6024 -0.1166 0.2541 0.0639 | 0.5549 item23 | 0.8857 0.1157 0.0221 -0.2968 | 0.1134 item24 | 0.7246 0.0169 0.0385 -0.2133 | 0.4276 item25 | -0.0717 0.2901 0.0348 0.0744 | 0.9039 item26 | -0.0061 0.2680 0.0632 0.0585 | 0.9207 item27 | -0.1399 0.6198 0.1288 0.1428 | 0.5593 item28 | -0.0586 0.7349 0.1431 0.1606 | 0.4102 --------------------------------------------------------------------- rotate, varimax normalize blanks(.3) Factor analysis/correlation Number of obs = 1396 Method: maximum likelihood Retained factors = 4 Rotation: orthogonal varimax (Kaiser on) Number of params = 58 Schwarz's BIC = 590.691 Log likelihood = -85.34569 (Akaike's) AIC = 286.691 -------------------------------------------------------------------------- Factor | Variance Difference Proportion Cumulative -------------+------------------------------------------------------------ Factor1 | 3.25325 0.32888 0.3922 0.3922 Factor2 | 2.92437 1.70143 0.3526 0.7448 Factor3 | 1.22294 0.32944 0.1474 0.8923 Factor4 | 0.89350 . 0.1077 1.0000 -------------------------------------------------------------------------- LR test: independent vs. saturated: chi2(120) = 9652.01 Prob>chi2 = 0.0000 LR test: 4 factors vs. saturated: chi2(62) = 169.61 Prob>chi2 = 0.0000 Rotated factor loadings (pattern matrix) and unique variances --------------------------------------------------------------------- Variable | Factor1 Factor2 Factor3 Factor4 | Uniqueness -------------+----------------------------------------+-------------- item13 | 0.7843 | 0.3274 item14 | 0.7534 | 0.3588 item15 | 0.6961 0.3120 | 0.3919 item16 | 0.6114 | 0.5306 item17 | 0.6040 0.4695 | 0.3860 item18 | 0.3043 0.7487 | 0.3276 item19 | 0.7454 | 0.4102 item20 | 0.5561 | 0.6227 item21 | 0.4237 0.5581 | 0.4606 item22 | 0.5829 | 0.5549 item23 | 0.5049 0.4350 0.6651 | 0.1134 item24 | 0.4010 0.3935 0.5019 | 0.4276 item25 | 0.3044 | 0.9039 item26 | | 0.9207 item27 | 0.6562 | 0.5593 item28 | 0.7676 | 0.4102 --------------------------------------------------------------------- (blanks represent abs(loading)<.3) Factor rotation matrix -------------------------------------------------- | Factor1 Factor2 Factor3 Factor4 -------------+------------------------------------ Factor1 | 0.6778 0.5954 -0.0605 0.4270 Factor2 | 0.1006 -0.1816 0.9512 0.2282 Factor3 | -0.6581 0.7251 0.1934 0.0609 Factor4 | 0.3120 0.2945 0.2326 -0.8728 --------------------------------------------------
Almost identical results to these were obtain using SAS proc mi with proc factor and using Mplus with the missing data option.
Reference
Truxillo, C. (2005). Maximum likelihood parameter estimation with incomplete data. Proceedings of the Thirtieth Annual SAS(r) Users Group International Conference. <http://www2.sas.com/proceedings/sugi30/111-30.pdf >
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annu. Rev. Psychol., 60, 549–576. https://www.personal.psu.edu/jxb14/M554/articles/Graham2009.pdf
Weaver, B., & Maxwell, H. (2014). Exploratory factor analysis and reliability analysis with missing data: A simple method for SPSS users. The Quantitative Methods for Psychology, 10 (2), 143-152. https://www.tqmp.org/RegularArticles/vol10-2/p143/p143.pdf