Standard methods of performing factor analysis (i.e., those based on a matrix of Pearson’s correlations) assume that the variables are continuous and follow a multivariate normal distribution. If the model includes variables that are dichotomous or ordinal, a factor analysis can be performed using a polychoric correlation matrix. In Stata, we can generate a matrix of polychoric correlations using the user-written command polychoric. You can find and install the polychoric command by typing search polychoric in the Stata command window and following the directions on the screen. For more information on locating and installing user-written commands, see our FAQ: How do I use search to search for programs and additional help? Note that variables used with polychoric may be binary (0/1), ordinal, or continuous, but cannot be nominal (unordered categories). Also note that not all of the correlations in the matrix produced by the polychoric command are polychoric correlations. When both variables have 10 or fewer observed values, a polychoric correlation is calculated; when only one of the variables takes on 10 or fewer values (i.e., one variable is continuous and the other categorical), a polyserial correlation is calculated; and when both variables take on more than 10 values, a Pearson’s correlation is calculated. Once we have a polychoric correlation matrix, we can use the factormat command to perform an exploratory factor analysis using the matrix as input, rather than raw variables.
The dataset for this example includes data on 1428 college students and their instructors. The example analysis includes the dichotomous variables faculty sex (facsex) and faculty nationality (US citizen or foreign citizen, facnat); the ordered categorical variables faculty rank (facrank), student rank (studrank), and grade (A, B, C, etc., grade); and the continuous variables faculty salary (salary), years teaching at the University of Texas (yrsut), and number of students in the class (nstud). These variables were selected to represent a range of variable types (i.e., dichotomous, ordered categorical, and continuous) and do not necessarily form substantively meaningful factors.
Below we open the dataset and generate the polychoric correlation matrix for the eight variables in our analysis. You may notice that the polychoric command runs somewhat more slowly than Stata’s correlate and pwcorr commands; this is normal.
    use https://stats.idre.ucla.edu/stat/stata/output/m255, clear

    polychoric facsex facnat facrank studrank grade salary yrsut nstud

    Polychoric correlation matrix

                    facsex      facnat     facrank    studrank       grade      salary
      facsex             1
      facnat    -.08153951           1
     facrank    -.33496545  -.54985327           1
    studrank     .14701719  -.04503906   -.0006882           1
       grade    -.05250522  -.07768724   .03336171   .21606134           1
      salary    -.24422069  -.31687704   .75225252   .04830565   -.0073763           1
       yrsut    -.09789967  -.49838303   .68902129   .00459421   .01994406   .53046614
       nstud    -.46151997    .2795961  -.17809723  -.33304524  -.13713578  -.08439606

                     yrsut       nstud
       yrsut             1
       nstud    -.31031949           1
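To anticipate which type of correlation polychoric computes for each pair of variables, it can help to count the distinct values of each variable. The following is a minimal sketch using Stata's built-in codebook command with the compact option, which reports the number of unique values per variable; it assumes the dataset opened above is still in memory. By the rule described earlier, variables with 10 or fewer observed values are treated as categorical.

```stata
* List observations, unique values, mean, min, and max for each analysis variable.
* Variables showing 10 or fewer unique values are the ones treated as categorical.
codebook facsex facnat facrank studrank grade salary yrsut nstud, compact
```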
The polychoric command does not display the number of cases (after listwise deletion) used to generate the matrix, but it does store that n in r(sum_w), so we can use the display command to view it. We also save this value in the global macro N. Then we use the matrix command to store the polychoric correlation matrix (saved in r(R) by the polychoric command) as r, so that we can use it with the factormat command. The factormat command is followed by the name of the matrix we wish to use for the analysis (i.e., r). The n(…) option gives the sample size and is required. We have used the factors(…) option to indicate that we wish to retain three factors.
    display r(sum_w)
    1338

    global N = r(sum_w)
    matrix r = r(R)
    factormat r, n($N) factors(3)
    (obs=1338)

    Factor analysis/correlation                        Number of obs    =     1338
        Method: principal factors                      Retained factors =        3
        Rotation: (unrotated)                          Number of params =       21

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      2.42705      1.26666            0.6995       0.6995
            Factor2  |      1.16039      0.84359            0.3344       1.0340
            Factor3  |      0.31680      0.18808            0.0913       1.1253
            Factor4  |      0.12871      0.16060            0.0371       1.1624
            Factor5  |     -0.03189      0.08326           -0.0092       1.1532
            Factor6  |     -0.11515      0.05212           -0.0332       1.1200
            Factor7  |     -0.16727      0.08181           -0.0482       1.0718
            Factor8  |     -0.24908            .           -0.0718       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(28) = 3824.64 Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

        -----------------------------------------------------------
            Variable |  Factor1   Factor2   Factor3 |   Uniqueness
        -------------+------------------------------+--------------
              facsex |  -0.1902   -0.6651   -0.2171 |      0.4744
              facnat |  -0.5913    0.2174    0.1465 |      0.5816
             facrank |   0.9183    0.1642    0.0173 |      0.1295
            studrank |   0.0645   -0.3558    0.3430 |      0.7516
               grade |   0.0636   -0.1380    0.3316 |      0.8670
              salary |   0.7365    0.1822    0.0751 |      0.4187
               yrsut |   0.7520   -0.0762   -0.1107 |      0.4165
               nstud |  -0.2861    0.6777   -0.0493 |      0.4565
        -----------------------------------------------------------
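Two quick checks may help when reading this output. The sketch below first counts the complete cases by hand, which should agree with the n stored in r(sum_w) if listwise deletion works as described. It then reproduces two table entries from the displayed numbers: the Proportion for Factor1 (its eigenvalue divided by the sum of all eight eigenvalues) and the Uniqueness for facrank (one minus the sum of its squared loadings on the three retained factors). Because some eigenvalues are negative under the principal factors method, the cumulative proportion can exceed 1 before returning to 1 at the last factor. The numbers are copied from the output above; these checks are an illustration, not part of the original analysis.

```stata
* Count observations with no missing values on any of the eight variables.
* count is r-class, so run it only after r(sum_w) and r(R) have been saved.
count if !missing(facsex, facnat, facrank, studrank, grade, salary, yrsut, nstud)

* Proportion for Factor1: eigenvalue over the sum of all eight eigenvalues (about 0.6995).
display 2.42705 / (2.42705 + 1.16039 + 0.31680 + 0.12871 - 0.03189 - 0.11515 - 0.16727 - 0.24908)

* Uniqueness for facrank: one minus the sum of squared loadings (about 0.1295).
display 1 - (0.9183^2 + 0.1642^2 + 0.0173^2)
```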
The above factor analysis output can be interpreted in a manner similar to a standard factor analysis model, including the use of rotation methods to increase interpretability.
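As a sketch of the rotation step mentioned above, Stata's rotate postestimation command can be run directly after factormat. The choice of varimax and promax here is only illustrative; this example does not indicate which rotation, if any, suits these data.

```stata
* Orthogonal varimax rotation of the three retained factors.
rotate, varimax

* Oblique promax rotation, which allows the factors to correlate.
rotate, promax

* After an oblique rotation, show the correlation matrix of the common factors.
estat common
```

An orthogonal rotation keeps the factors uncorrelated, while an oblique rotation is often preferred when the underlying constructs are expected to be related.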