Stata does not have a command for estimating multilevel principal components analysis (PCA). This page will demonstrate one way of accomplishing this. The strategy we will take is to partition the data into between group and within group components. We will then run separate PCAs on each of these components.
Let’s begin by loading the hsbdemo dataset into Stata.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
Next we will place the grouping variable (cid) and our list of variables into two global macros. We will also create a sequence number within each group, which we will use when computing the between covariance matrix.
global id = "cid"
global vlist = "read write math science socst"
bysort $id: gen seq=_n
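As a quick check on this setup, the sketch below confirms that seq==1 selects exactly one observation per group. It also shows egen's tag() function as an alternative way to flag one observation per group. The names onepergrp, n_seq and n_tag are hypothetical and are not part of the original example.

```stata
* Sketch: confirm that seq==1 picks one observation per group.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
bysort cid: gen seq = _n

* Alternative flag (assumed name onepergrp): tag() marks one obs per cid.
egen byte onepergrp = tag(cid)

quietly count if seq == 1
local n_seq = r(N)
quietly count if onepergrp == 1
local n_tag = r(N)

* Both counts are the number of distinct groups.
display "groups via seq: `n_seq'   groups via tag(): `n_tag'"
assert `n_seq' == `n_tag'
```

Either flag can serve as the "one row per group" selector. The two approaches may flag different rows within a group, which does not matter here because the group means are constant within each group.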
Here is how we will implement the multilevel PCA. We will create within group and between group covariance matrices and then use the pcamat command on each of them. To create the matrices we need between group variables (the group means) and within group variables (raw score - group mean + grand mean).
In the following loop the egen command computes the group means, which are used as the between group variables. The summarize and local commands are used to get the grand mean of each variable. Finally, the generate command computes the within group variables.
foreach x of varlist $vlist {
    egen be_`x' = mean(`x'), by($id)
    quietly summarize `x'
    local g`x' = r(mean)
    generate wi_`x' = `x' - be_`x' + `g`x''
}
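To make the between/within split concrete, here is a minimal sketch that rebuilds the two components for a single variable (read) and checks two properties implied by the formulas above: the within variable averages to the grand mean in every group, and the raw score can be recovered from the two components. The variable chk, the double storage type, and the tolerance 1e-8 are choices made for this illustration only.

```stata
* Sketch: verify the between/within decomposition for one variable.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

egen double be_read = mean(read), by(cid)     // group mean (between part)
quietly summarize read
local gread = r(mean)                          // grand mean
generate double wi_read = read - be_read + `gread'

* Property 1: within each group, wi_read averages to the grand mean.
egen double chk = mean(wi_read), by(cid)
assert reldif(chk, `gread') < 1e-8

* Property 2: raw score = group mean + within score - grand mean.
assert reldif(read, be_read + wi_read - `gread') < 1e-8

display "decomposition checks passed"
```

Adding the grand mean back keeps the within variables on the original scale. It has no effect on their covariances.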
Now that we have the between and within variables we are ready to create the between and within covariance matrices. Note that in creating the between covariance matrix we use only one observation from each group (if seq==1), so each group counts once regardless of its size. We save the two covariance matrices as bcov and wcov, respectively.
* between covariance matrix
corr be_* if seq==1, cov
(obs=20)

             |  be_read be_write  be_math be_sci~e be_socst
-------------+---------------------------------------------
     be_read |   83.673
    be_write |  67.6696  67.0196
     be_math |   74.149  62.2788  69.0276
  be_science |  74.5096  65.3098  67.1724  73.8221
    be_socst |  64.0656  57.3454  58.2163  60.0241  56.5561

matrix bcov = r(C)

* within covariance matrix
corr wi_*, cov
(obs=200)

             |  wi_read wi_write  wi_math wi_sci~e wi_socst
-------------+---------------------------------------------
     wi_read |  24.6867
    wi_write | -7.54531  24.6931
     wi_math | -7.67303 -5.36381  21.2156
  wi_science |  -8.2731 -10.1502 -6.45554   26.122
    wi_socst |  6.64865  5.84606 -1.37596 -9.00491  60.5325

matrix wcov = r(C)
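The choice of one observation per group gives an unweighted between matrix. As a point of comparison, the sketch below computes a size-weighted between matrix from all observations (called bcov_w here, a hypothetical name) and checks that it and the within matrix add up to the total covariance matrix. This identity holds only for the weighted version, not for the bcov used on this page. The names tcov, sumcov and the use of double precision are assumptions of the sketch.

```stata
* Sketch: total covariance = weighted between covariance + within covariance.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
local vlist read write math science socst

foreach x of local vlist {
    egen double be_`x' = mean(`x'), by(cid)
    quietly summarize `x'
    generate double wi_`x' = `x' - be_`x' + r(mean)
}

quietly correlate `vlist', covariance
matrix tcov = r(C)                 // total covariance

quietly correlate be_*, covariance // all 200 obs: groups weighted by size
matrix bcov_w = r(C)

quietly correlate wi_*, covariance
matrix wcov = r(C)

matrix sumcov = bcov_w + wcov
display "max relative difference: " mreldif(tcov, sumcov)
assert mreldif(tcov, sumcov) < 1e-8
```

The identity follows because the within deviations sum to zero inside every group, so their cross-products with the group means vanish.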
Now that we have the between and within covariance matrices we can estimate the between and within principal components. The command pcamat performs principal component analysis on a correlation or covariance matrix.
* between pca
pcamat bcov, n(20)

Principal components/correlation                 Number of obs    =         20
                                                 Number of comp.  =          5
                                                 Trace            =          5
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      4.73458        4.617             0.9469       0.9469
           Comp2 |      .117584       .04783             0.0235       0.9704
           Comp3 |     .0697536     .0138508             0.0140       0.9844
           Comp4 |     .0559029     .0337252             0.0112       0.9956
           Comp5 |     .0221776            .             0.0044       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    ------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5 | Unexplained
    -------------+--------------------------------------------------+-------------
         be_read |   0.4496   -0.5099   -0.0275    0.1241    0.7223 |           0
        be_write |   0.4419    0.6999    0.2843    0.4596    0.1509 |           0
         be_math |   0.4501   -0.4160   -0.0472    0.4440   -0.6519 |           0
      be_science |   0.4484   -0.0349    0.5642   -0.6721   -0.1668 |           0
        be_socst |   0.4461    0.2754   -0.7732   -0.3529   -0.0520 |           0
    ------------------------------------------------------------------------------

* within pca
pcamat wcov, n(200)

Principal components/correlation                 Number of obs    =        200
                                                 Number of comp.  =          5
                                                 Trace            =          5
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.56986      .204184             0.3140       0.3140
           Comp2 |      1.36568      .176014             0.2731       0.5871
           Comp3 |      1.18966      .373771             0.2379       0.8250
           Comp4 |      .815889      .756974             0.1632       0.9882
           Comp5 |     .0589154            .             0.0118       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    ------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5 | Unexplained
    -------------+--------------------------------------------------+-------------
         wi_read |   0.3435    0.6999    0.2594   -0.2481    0.5131 |           0
        wi_write |   0.4484   -0.4431   -0.5631   -0.1710    0.5062 |           0
         wi_math |  -0.1041   -0.5171    0.7097    0.0881    0.4587 |           0
      wi_science |  -0.6556    0.2019   -0.3324    0.3872    0.5187 |           0
        wi_socst |   0.4902    0.0752    0.0395    0.8669   -0.0316 |           0
    ------------------------------------------------------------------------------
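The header "Principal components/correlation" in the output above indicates that pcamat standardized each covariance matrix to a correlation matrix before extracting components, which is its default. If an analysis of the unstandardized covariance matrix is wanted instead, pcamat accepts a covariance option. The sketch below rebuilds bcov and applies that option. Which metric is more appropriate depends on the application; this is shown only as an alternative, not as the method used on this page.

```stata
* Sketch: between PCA on the covariance metric rather than the default
* correlation metric.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
local vlist read write math science socst

bysort cid: gen seq = _n
foreach x of local vlist {
    egen be_`x' = mean(`x'), by(cid)
}

quietly correlate be_* if seq == 1, covariance
matrix bcov = r(C)

* Default (as in the text): components of the implied correlation matrix.
pcamat bcov, n(20)

* Alternative: components of the covariance matrix itself.
pcamat bcov, n(20) covariance
```

With the covariance option the eigenvalues sum to the total variance of the five variables rather than to 5, so the "eigenvalue greater than one" rule of thumb no longer applies directly.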
The between PCA has one component with an eigenvalue greater than one, while the within PCA has three. Looking only at the first component, the between and within PCAs are rather different. In the between PCA all of the elements of the first eigenvector are positive and nearly equal (approximately 0.45). In the within PCA two of the elements are negative, with the value for science being about -0.66.
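For a side-by-side view of the first components, the eigenvectors can be pulled from the stored results that pca and pcamat leave behind: e(L) holds the eigenvectors and e(Ev) the eigenvalues. The sketch below is one way to do this; the matrix names bL, wL and first are hypothetical. Keep in mind that the sign of an eigenvector is arbitrary, so only the pattern of relative signs and magnitudes is meaningful.

```stata
* Sketch: collect the first between and within eigenvectors in one matrix.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
local vlist read write math science socst

bysort cid: gen seq = _n
foreach x of local vlist {
    egen be_`x' = mean(`x'), by(cid)
    quietly summarize `x'
    generate wi_`x' = `x' - be_`x' + r(mean)
}

quietly correlate be_* if seq == 1, covariance
matrix bcov = r(C)
quietly correlate wi_*, covariance
matrix wcov = r(C)

quietly pcamat bcov, n(20)
matrix bL = e(L)                    // between eigenvectors
quietly pcamat wcov, n(200)
matrix wL = e(L)                    // within eigenvectors

* First column of each eigenvector matrix, joined side by side.
matrix first = bL[1..., 1], wL[1..., 1]
matrix rownames first = `vlist'
matrix colnames first = between within
matrix list first, format(%7.4f)
```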
Just for comparison, let’s run pca on the overall data, that is, on the raw variables in our variable list.
* overall pca
pca $vlist

Principal components/correlation                 Number of obs    =        200
                                                 Number of comp.  =          5
                                                 Trace            =          5
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      3.38082      2.82344             0.6762       0.6762
           Comp2 |      .557378      .150585             0.1115       0.7876
           Comp3 |      .406793      .050625             0.0814       0.8690
           Comp4 |      .356168     .0573264             0.0712       0.9402
           Comp5 |      .298841            .             0.0598       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    ------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5 | Unexplained
    -------------+--------------------------------------------------+-------------
            read |   0.4664   -0.0273   -0.5313   -0.0206   -0.7064 |           0
           write |   0.4484    0.2075    0.8064    0.0558   -0.3201 |           0
            math |   0.4588   -0.2609   -0.0006   -0.7800    0.3361 |           0
         science |   0.4356   -0.6109   -0.0070    0.5895    0.2992 |           0
           socst |   0.4257    0.7176   -0.2596    0.2013    0.4427 |           0
    ------------------------------------------------------------------------------
In this example the overall PCA is fairly similar to the between group PCA.