There are times when you want to do correspondence anlysis and the data have been collapsed into a summary with counts for each of the categories. For example, here is a table with the number of degrees given in 12 disciplines over eight different years.
discipline 1960 1965 1970 1971 1972 1973 1974 1975
Agri 414 576 803 900 855 853 830 904
Anth 69 82 217 240 260 324 381 385
Bio 1245 1963 3360 3633 3580 3636 3473 3498
Chem 1078 1444 2234 2204 2011 1849 1792 1762
Earth 253 375 511 550 580 577 570 556
Econ 341 538 826 791 863 907 833 867
Eng 794 2073 3432 3495 3475 3338 3144 2959
Math 291 685 1222 1236 1281 1222 1196 1149
Oth 314 502 1079 1392 1500 1609 1531 1550
Phy 530 1046 1655 1740 1635 1590 134 1293
Psych 772 954 1888 2116 2262 2444 2587 2749
Soc 162 239 504 583 638 599 645 680
We will begin by reading and describing the data.
use https://stats.idre.ucla.edu/stat/data/casummary, clear
describe
Contains data
obs: 12
vars: 9
size: 480 (99.9% of memory free)
-------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------------------------------------
ndisc long %8.0g ndisc
v60 float %9.0g 60 v
v65 float %9.0g 65 v
v70 float %9.0g 70 v
v71 float %9.0g 71 v
v72 float %9.0g 72 v
v73 float %9.0g 73 v
v74 float %9.0g 74 v
v75 float %9.0g 75 v
-------------------------------------------------------------------------------------------------------------
The problem is that we can’t run the ca when the data are in a wide format of summary data. The solution is to reshape the data into long form before running ca with frequency weights. Below you will see the reshape command and a partial listing of the reshaped data.
reshape long v, i(ndisc) j(y)
(note: j = 60 65 70 71 72 73 74 75)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 12 -> 96
Number of variables 9 -> 3
j variable (8 values) -> y
xij variables:
v60 v65 ... v75 -> v
-----------------------------------------------------------------------------
clist in 1/15
ndisc y v
1. Agri 60 414
2. Agri 65 576
3. Agri 70 803
4. Agri 71 900
5. Agri 72 855
6. Agri 73 853
7. Agri 74 830
8. Agri 75 904
9. Anth 60 69
10. Anth 65 82
11. Anth 70 217
12. Anth 71 240
13. Anth 72 260
14. Anth 73 324
15. Anth 74 381
Now we are ready to run the correspondence analysis and plot the results.
ca ndisc y [fw=v], norm(principal)
Correspondence analysis Number of obs = 126707
Pearson chi2(77) = 2963.21
Prob > chi2 = 0.0000
Total inertia = 0.0234
12 active rows Number of dim. = 2
8 active columns Expl. inertia (%) = 87.38
| singular principal cumul
Dimension | value inertia chi2 percent percent
------------+------------------------------------------------------------
dim 1 | .1266166 .0160318 2031.34 68.55 68.55
dim 2 | .0663563 .0044032 557.91 18.83 87.38
dim 3 | .0496024 .0024604 311.75 10.52 97.90
dim 4 | .0149596 .0002238 28.36 0.96 98.86
dim 5 | .0128167 .0001643 20.81 0.70 99.56
dim 6 | .0079637 .0000634 8.04 0.27 99.83
dim 7 | .0062852 .0000395 5.01 0.17 100.00
------------+------------------------------------------------------------
total | .0233863 2963.21 100
Statistics for row and column categories in principal normalization
| overall | dimension_1 | dimension_2
Categories | mass quality %inert | coord sqcorr contrib | coord sqcorr contrib
-------------+---------------------------+---------------------------+---------------------------
ndisc | | |
Agri | 0.048 0.725 0.021 | 0.020 0.041 0.001 | 0.084 0.684 0.077
Anth | 0.015 0.925 0.055 | -0.273 0.893 0.072 | -0.052 0.032 0.009
Bio | 0.192 0.845 0.005 | -0.018 0.544 0.004 | 0.013 0.301 0.008
Chem | 0.113 0.983 0.129 | 0.100 0.378 0.071 | 0.127 0.605 0.415
Earth | 0.031 0.725 0.011 | 0.000 0.000 0.000 | 0.078 0.725 0.043
Econ | 0.047 0.462 0.008 | -0.003 0.003 0.000 | 0.043 0.460 0.020
Eng | 0.179 0.107 0.060 | 0.015 0.029 0.003 | -0.025 0.078 0.025
Math | 0.065 0.256 0.016 | -0.020 0.073 0.002 | -0.032 0.183 0.015
Oth | 0.075 0.949 0.102 | -0.147 0.684 0.101 | -0.092 0.265 0.143
Phy | 0.076 0.972 0.444 | 0.346 0.876 0.568 | -0.115 0.096 0.227
Psych | 0.124 0.838 0.122 | -0.139 0.835 0.149 | -0.009 0.004 0.002
Soc | 0.032 0.896 0.026 | -0.122 0.785 0.030 | -0.046 0.112 0.015
-------------+---------------------------+---------------------------+---------------------------
y | | |
60 | 0.049 0.763 0.155 | 0.114 0.178 0.040 | 0.207 0.585 0.480
65 | 0.083 0.900 0.148 | 0.182 0.790 0.170 | 0.068 0.110 0.086
70 | 0.140 0.821 0.080 | 0.105 0.818 0.096 | 0.006 0.002 0.001
71 | 0.149 0.867 0.040 | 0.069 0.769 0.045 | -0.025 0.098 0.021
72 | 0.149 0.883 0.020 | 0.025 0.201 0.006 | -0.046 0.682 0.073
73 | 0.150 0.806 0.033 | -0.011 0.026 0.001 | -0.063 0.780 0.135
74 | 0.135 0.968 0.436 | -0.261 0.904 0.575 | 0.069 0.064 0.148
75 | 0.145 0.635 0.088 | -0.086 0.518 0.067 | -0.041 0.117 0.055
-------------------------------------------------------------------------------------------------
cabiplot, origin
cabiplot, origin nocol
cabiplot, origin norow

