Stata does not have a command for estimating multilevel principal components analysis (PCA). This page will demonstrate one way of accomplishing this. The strategy we will take is to partition the data into between group and within group components. We will then run separate PCAs on each of these components.
Let’s begin by loading the hsbdemo dataset into Stata.
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
Next we will place the grouping variable (cid) and our list of variables into two global macros. We will also create a sequence number within each group, which we will use later to pick one observation per group when computing the between covariance matrix.
global id = "cid"
global vlist = "read write math science socst"
bysort $id: gen seq=_n
Here is how we will implement the multilevel PCA. We will create within group and between group covariance matrices and then use the pcamat command on each of these matrices. To create the matrices we need between group variables (the group means) and within group variables (raw scores - group means + grand mean).
In the following loop the egen command computes the group means, which are used as the between group variables. The summarize and local commands store the grand mean of each variable. Finally, the generate command computes the within group variables.
foreach x of varlist $vlist {
egen be_`x' = mean(`x'), by($id)
quietly summarize `x'
local g`x' = r(mean)
generate wi_`x' = `x' - be_`x' + `g`x''
}
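As a quick check on the loop above, the within group variables should have the same mean (the grand mean) in every group, because the group mean has been subtracted out and the grand mean added back. The following is a minimal sketch of such a check; the temporary variable chk is introduced here for illustration only and is not part of the original procedure.

```stata
* sketch: verify that each wi_ variable has (nearly) identical group means
foreach x of varlist $vlist {
    tempvar chk
    * group mean of the within variable
    egen `chk' = mean(wi_`x'), by($id)
    quietly summarize `chk'
    * the spread across groups should be zero up to float rounding
    display "wi_`x': max - min of group means = " r(max)-r(min)
    drop `chk'
}
```

Small nonzero values are expected here because generate stores the new variables as floats by default.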
Now that we have the between and within variables we are ready to create the between and within covariance matrices. Note that the between covariance matrix uses only one observation from each group (if seq==1), since the group means are constant within a group. We save the two covariance matrices as bcov and wcov, respectively.
* between covariance matrix
corr be_* if seq==1, cov
(obs=20)
             |  be_read be_write  be_math be_sci~e be_socst
-------------+---------------------------------------------
     be_read |   83.673
    be_write |  67.6696  67.0196
     be_math |   74.149  62.2788  69.0276
  be_science |  74.5096  65.3098  67.1724  73.8221
    be_socst |  64.0656  57.3454  58.2163  60.0241  56.5561
matrix bcov = r(C)
* within covariance matrix
corr wi_*, cov
(obs=200)
             |  wi_read wi_write  wi_math wi_sci~e wi_socst
-------------+---------------------------------------------
     wi_read |  24.6867
    wi_write | -7.54531  24.6931
     wi_math | -7.67303 -5.36381  21.2156
  wi_science |  -8.2731 -10.1502 -6.45554   26.122
    wi_socst |  6.64865  5.84606 -1.37596 -9.00491  60.5325
matrix wcov = r(C)
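Now that we have the between and within covariance matrices we can estimate the between and within principal components. The command pcamat performs principal component analysis on a correlation or covariance matrix. By default it analyzes correlations, so the covariance matrices we supply are standardized first; this is why the output headers below read "Principal components/correlation" and the trace equals the number of variables (5).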
* between pca
pcamat bcov, n(20)
Principal components/correlation Number of obs = 20
Number of comp. = 5
Trace = 5
Rotation: (unrotated = principal) Rho = 1.0000
--------------------------------------------------------------------------
Component | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Comp1 | 4.73458 4.617 0.9469 0.9469
Comp2 | .117584 .04783 0.0235 0.9704
Comp3 | .0697536 .0138508 0.0140 0.9844
Comp4 | .0559029 .0337252 0.0112 0.9956
Comp5 | .0221776 . 0.0044 1.0000
--------------------------------------------------------------------------
Principal components (eigenvectors)
------------------------------------------------------------------------------
Variable | Comp1 Comp2 Comp3 Comp4 Comp5 | Unexplained
-------------+--------------------------------------------------+-------------
be_read | 0.4496 -0.5099 -0.0275 0.1241 0.7223 | 0
be_write | 0.4419 0.6999 0.2843 0.4596 0.1509 | 0
be_math | 0.4501 -0.4160 -0.0472 0.4440 -0.6519 | 0
be_science | 0.4484 -0.0349 0.5642 -0.6721 -0.1668 | 0
be_socst | 0.4461 0.2754 -0.7732 -0.3529 -0.0520 | 0
------------------------------------------------------------------------------
* within pca
pcamat wcov, n(200)
Principal components/correlation Number of obs = 200
Number of comp. = 5
Trace = 5
Rotation: (unrotated = principal) Rho = 1.0000
--------------------------------------------------------------------------
Component | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Comp1 | 1.56986 .204184 0.3140 0.3140
Comp2 | 1.36568 .176014 0.2731 0.5871
Comp3 | 1.18966 .373771 0.2379 0.8250
Comp4 | .815889 .756974 0.1632 0.9882
Comp5 | .0589154 . 0.0118 1.0000
--------------------------------------------------------------------------
Principal components (eigenvectors)
------------------------------------------------------------------------------
Variable | Comp1 Comp2 Comp3 Comp4 Comp5 | Unexplained
-------------+--------------------------------------------------+-------------
wi_read | 0.3435 0.6999 0.2594 -0.2481 0.5131 | 0
wi_write | 0.4484 -0.4431 -0.5631 -0.1710 0.5062 | 0
wi_math | -0.1041 -0.5171 0.7097 0.0881 0.4587 | 0
wi_science | -0.6556 0.2019 -0.3324 0.3872 0.5187 | 0
wi_socst | 0.4902 0.0752 0.0395 0.8669 -0.0316 | 0
------------------------------------------------------------------------------
The between PCA has one component with an eigenvalue greater than one, while the within PCA has three. Looking only at the first component, the between and within PCAs are rather different. In the between PCA all of the loadings (the elements of the first eigenvector) are positive and nearly equal (approximately 0.45). In the within PCA two of the loadings are negative (math and science), with the value for science being about -0.66.
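The output headers above read "Principal components/correlation", meaning that the components were extracted from the correlations implied by bcov and wcov. If the original score metric matters, one option is to analyze the covariance matrices directly. The following is a sketch using the covariance option of pcamat; it is an alternative to, not a replacement for, the analysis shown above, and its eigenvalues will be on a different scale.

```stata
* sketch: PCA of the covariance matrices rather than the implied correlations
* between pca, covariance metric
pcamat bcov, n(20) covariance
* within pca, covariance metric
pcamat wcov, n(200) covariance
```

With the covariance option the trace equals the sum of the variances on the diagonal of the matrix, so the eigenvalue-greater-than-one rule of thumb no longer applies in the same way.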
Just for comparison, let’s run pca on the overall data, that is, on the original variables in our variable list, ignoring the grouping.
* overall pca
pca $vlist
Principal components/correlation Number of obs = 200
Number of comp. = 5
Trace = 5
Rotation: (unrotated = principal) Rho = 1.0000
--------------------------------------------------------------------------
Component | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Comp1 | 3.38082 2.82344 0.6762 0.6762
Comp2 | .557378 .150585 0.1115 0.7876
Comp3 | .406793 .050625 0.0814 0.8690
Comp4 | .356168 .0573264 0.0712 0.9402
Comp5 | .298841 . 0.0598 1.0000
--------------------------------------------------------------------------
Principal components (eigenvectors)
------------------------------------------------------------------------------
Variable | Comp1 Comp2 Comp3 Comp4 Comp5 | Unexplained
-------------+--------------------------------------------------+-------------
read | 0.4664 -0.0273 -0.5313 -0.0206 -0.7064 | 0
write | 0.4484 0.2075 0.8064 0.0558 -0.3201 | 0
math | 0.4588 -0.2609 -0.0006 -0.7800 0.3361 | 0
science | 0.4356 -0.6109 -0.0070 0.5895 0.2992 | 0
socst | 0.4257 0.7176 -0.2596 0.2013 0.4427 | 0
------------------------------------------------------------------------------
In this example the overall PCA is fairly similar to the between group PCA.
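To make that comparison easier to see, the first-component loadings can be placed side by side. The sketch below assumes the matrices bcov and the global vlist defined earlier are still in memory, and relies on pca and pcamat leaving the eigenvectors in e(L); the matrix names Lb, Lo, and comp are introduced here for illustration.

```stata
* sketch: compare first-component loadings, between vs. overall
quietly pcamat bcov, n(20)
matrix Lb = e(L)
quietly pca $vlist
matrix Lo = e(L)
* first column of each eigenvector matrix, joined side by side
matrix comp = Lb[1..., 1], Lo[1..., 1]
matrix colnames comp = between overall
matrix list comp, format(%6.4f)
```

The row labels come from the between matrix (be_read, be_write, and so on), but each row refers to the same underlying variable in both columns.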
