There are times when you want to run a correspondence analysis but the data have already been collapsed into a summary table of counts for each combination of categories. For example, here is a dataset with the number of degrees awarded in 12 disciplines in eight different years.
discipline   1960   1965   1970   1971   1972   1973   1974   1975
Agri          414    576    803    900    855    853    830    904
Anth           69     82    217    240    260    324    381    385
Bio          1245   1963   3360   3633   3580   3636   3473   3498
Chem         1078   1444   2234   2204   2011   1849   1792   1762
Earth         253    375    511    550    580    577    570    556
Econ          341    538    826    791    863    907    833    867
Eng           794   2073   3432   3495   3475   3338   3144   2959
Math          291    685   1222   1236   1281   1222   1196   1149
Oth           314    502   1079   1392   1500   1609   1531   1550
Phy           530   1046   1655   1740   1635   1590    134   1293
Psych         772    954   1888   2116   2262   2444   2587   2749
Soc           162    239    504    583    638    599    645    680
We will begin by reading in the data.
data ca_summary;
  input disc $ v60 v65 v70 v71 v72 v73 v74 v75;
datalines;
eng     794 2073 3432 3495 3475 3338 3144 2959
math    291  685 1222 1236 1281 1222 1196 1149
phy     530 1046 1655 1740 1635 1590  134 1293
chem   1078 1444 2234 2204 2011 1849 1792 1762
earth   253  375  511  550  580  577  570  556
bio    1245 1963 3360 3633 3580 3636 3473 3498
agri    414  576  803  900  855  853  830  904
psych   772  954 1888 2116 2262 2444 2587 2749
socio   162  239  504  583  638  599  645  680
econ    341  538  826  791  863  907  833  867
anthro   69   82  217  240  260  324  381  385
others  314  502 1079 1392 1500 1609 1531 1550
;
run;
Now we are ready to run the correspondence analysis and plot the results.
proc corresp data=ca_summary out=coord short;
  var v60 v65 v70 v71 v72 v73 v74 v75;
  id disc;
run;

The CORRESP Procedure

                Inertia and Chi-Square Decomposition

 Singular   Principal      Chi-               Cumulative
    Value     Inertia    Square    Percent       Percent     14   28   42   56   70
                                                          ----+----+----+----+----+---
  0.12662     0.01603   2031.34      68.55         68.55  ************************
  0.06636     0.00440    557.91      18.83         87.38  *******
  0.04960     0.00246    311.75      10.52         97.90  ****
  0.01496     0.00022     28.36       0.96         98.86
  0.01282     0.00016     20.81       0.70         99.56
  0.00796     0.00006      8.04       0.27         99.83
  0.00629     0.00004      5.01       0.17        100.00

    Total     0.02339   2963.21     100.00

Degrees of Freedom = 77

      Row Coordinates

               Dim1       Dim2

eng          0.0151    -0.0248
math        -0.0203    -0.0322
phy          0.3461    -0.1147
chem         0.1003     0.1269
earth        0.0002     0.0777
bio         -0.0182     0.0135
agri         0.0204     0.0835
psych       -0.1386    -0.0091
socio       -0.1218    -0.0459
econ        -0.0034     0.0432
anthro      -0.2726    -0.0515
others      -0.1475    -0.0918

     Column Coordinates

               Dim1       Dim2

v60          0.1142     0.2069
v65          0.1816     0.0676
v70          0.1048     0.0057
v71          0.0694    -0.0248
v72          0.0252    -0.0464
v73         -0.0114    -0.0631
v74         -0.2613     0.0695
v75         -0.0859    -0.0409

proc sgplot data = coord noautolegend;
  xaxis min = -.4 max = .4 values=(-.3 to .3 by .1) valueshint;
  yaxis min = -.3 max = .3;
  scatter x = dim1 y = dim2 / group = _type_ markerchar = disc
          markercharattrs=(size=10 weight=bold);
run;
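Summary counts are sometimes stored in a long layout instead, with one row per discipline-year cell and a separate count variable. The following is a minimal sketch of how the same analysis might be run in that case, using the tables and weight statements of proc corresp in place of var and id. The dataset names ca_long and coord_long and the variables year and count are assumptions introduced here for illustration; they are not part of the example above.

```sas
/* Reshape the wide table into one row per discipline-year cell.       */
/* ca_long, year and count are hypothetical names used for this sketch. */
data ca_long;
  set ca_summary;
  array v{8} v60 v65 v70 v71 v72 v73 v74 v75;
  length year $ 3;
  do i = 1 to 8;
    year  = vname(v{i});   /* column label, e.g. "v60" */
    count = v{i};          /* cell frequency            */
    output;
  end;
  keep disc year count;
run;

/* tables names the row and column variables; weight supplies the counts. */
proc corresp data=ca_long out=coord_long short;
  tables disc, year;
  weight count;
run;
```

Because the cell counts are unchanged, the inertia decomposition and coordinates should match those shown above, although the rows may be listed in a different order.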