Standard methods of performing factor analysis ( i.e., those based on a matrix of
Pearson’s correlations) assume that the variables are continuous and follow a multivariate normal
distribution. If the model includes variables that are dichotomous or ordinal a factor analysis can be performed using a polychoric
correlation matrix. In Stata we can generate a matrix of polychoric
correlations using the user-written command **polychoric**. You can find and
install the **polychoric** command by typing **search polychoric** in the Stata
command window and following the directions the screen. For more information on
locating and installing user-written commands see our FAQ:
How do I use search to search for programs and
additional help?. Note that variables used with **polychoric** may be
binary (0/1), ordinal, or continuous, but cannot be nominal (unordered
categories). Also note that the correlations in the matrix produced by the
**polychoric** command are not all polychoric correlations. When both
variables have 10 or fewer observed values, a polychoric correlation is
calculated, when only one of the variables takes on 10 or fewer values ( i.e., one
variable is continuous and the other categorical) a polyserial correlation is
calculated, and if both variables take on more than 10 values a Pearson’s
correlation is calculated. Once we have a polychoric correlation matrix, we can use
the **factormat** command to perform an exploratory factor analysis using the
matrix as input, rather than raw variables.

The dataset for this example includes data on 1428 college students
and their instructors. The example analysis includes dichotomous variables, including
faculty sex (**facsex**) and faculty nationality (US citizen or foreign citizen,
**facnat**); ordered categorical variables, including faculty rank (**facrank**), student
rank (**studrank**) and grade (A, B, C, etc., **grade**);
and the continuous variables faculty salary (**salary**), years teaching at
the University of Texas (**yrsut**), and number of students in the class (**nstud**) in this analysis. These
variables were selected to represent a range of types of variables ( i.e., dichotomous, ordered categorical, and continuous),
and do not necessarily form substantively meaningful factors.

Below we open the dataset and generate the polychoric correlation matrix for the
eight variables in our analysis. You may notice that the **polychoric** command
runs somewhat more slowly than Stata’s **correlate** and **pwcorr**
commands, this is normal.

use https://stats.idre.ucla.edu/stat/stata/output/m255, clearpolychoric facsex facnat facrank studrank grade salary yrsut nstudPolychoric correlation matrix facsex facnat facrank studrank grade salary facsex 1 facnat -.08153951 1 facrank -.33496545 -.54985327 1 studrank .14701719 -.04503906 -.0006882 1 grade -.05250522 -.07768724 .03336171 .21606134 1 salary -.24422069 -.31687704 .75225252 .04830565 -.0073763 1 yrsut -.09789967 -.49838303 .68902129 .00459421 .01994406 .53046614 nstud -.46151997 .2795961 -.17809723 -.33304524 -.13713578 -.08439606 yrsut nstud yrsut 1 nstud -.31031949 1

The **polychoric** command does not does not display the number of cases (with listwise deletion)
used to generate the matrix,
but it does store the n in **r(sum_w)** so we can use the **display** command to view it. Then we use the
**matrix** command to store the polychoric correlation matrix (saved in **r(R)**
by the polychoric command) as **r**, so that we can use it with the
**factormat** command. The **factormat** command is followed by the name
of the matrix we wish to use for the analysis ( i.e., **r**). The **n(**…**)**
"option" gives the sample size, and is required. We have used the **factors(**…**)**
option to indicate that we wish to retain three factors.

display r(sum_w)1338global N = r(sum_w)matrix r = r(R) factormat r, n($N) factors(3)(obs=1338) Factor analysis/correlation Number of obs = 1338 Method: principal factors Retained factors = 3 Rotation: (unrotated) Number of params = 21 -------------------------------------------------------------------------- Factor | Eigenvalue Difference Proportion Cumulative -------------+------------------------------------------------------------ Factor1 | 2.42705 1.26666 0.6995 0.6995 Factor2 | 1.16039 0.84359 0.3344 1.0340 Factor3 | 0.31680 0.18808 0.0913 1.1253 Factor4 | 0.12871 0.16060 0.0371 1.1624 Factor5 | -0.03189 0.08326 -0.0092 1.1532 Factor6 | -0.11515 0.05212 -0.0332 1.1200 Factor7 | -0.16727 0.08181 -0.0482 1.0718 Factor8 | -0.24908 . -0.0718 1.0000 -------------------------------------------------------------------------- LR test: independent vs. saturated: chi2(28) = 3824.64 Prob>chi2 = 0.0000 Factor loadings (pattern matrix) and unique variances ----------------------------------------------------------- Variable | Factor1 Factor2 Factor3 | Uniqueness -------------+------------------------------+-------------- facsex | -0.1902 -0.6651 -0.2171 | 0.4744 facnat | -0.5913 0.2174 0.1465 | 0.5816 facrank | 0.9183 0.1642 0.0173 | 0.1295 studrank | 0.0645 -0.3558 0.3430 | 0.7516 grade | 0.0636 -0.1380 0.3316 | 0.8670 salary | 0.7365 0.1822 0.0751 | 0.4187 yrsut | 0.7520 -0.0762 -0.1107 | 0.4165 nstud | -0.2861 0.6777 -0.0493 | 0.4565 -----------------------------------------------------------

The above factor analysis output can be interpreted in a manner similar to a standard factor analysis model, including the use of rotation methods to increase interpretability.

## See also

Stata Annotated Output: Factor Analysis