When the two variables under consideration are dichotomous or ordinal categorical variables, we may need to compute a tetrachoric or polychoric correlation. Both rest on the assumption that each observed categorical variable is a discretized version of an underlying, normally distributed continuous variable. When both variables are binary, the correlation is called the tetrachoric correlation; in the more general case, where at least one variable has more than two ordered categories, it is called the polychoric correlation. In SAS, proc freq is used to obtain tetrachoric and polychoric correlations.
In the following examples, we will use data set hsb2.sas7bdat.
Example 1: Computing tetrachoric correlation between two dichotomous variables
We specify the plcorr option in the tables statement to request the polychoric correlation; when both variables are binary, SAS labels the result as a tetrachoric correlation. The two variables of interest are female and honors, an indicator for write >= 60 that is created in the data step below.
    data hsb2;
      set ats.hsb2;
      honors = (write >= 60);
    run;

    proc freq data = hsb2;
      tables honors*female / plcorr;
    run;

    Table of honors by female

    honors     female

    Frequency|
    Percent  |
    Row Pct  |
    Col Pct  |       0|       1|  Total
    ---------+--------+--------+
           0 |     73 |     74 |    147
             |  36.50 |  37.00 |  73.50
             |  49.66 |  50.34 |
             |  80.22 |  67.89 |
    ---------+--------+--------+
           1 |     18 |     35 |     53
             |   9.00 |  17.50 |  26.50
             |  33.96 |  66.04 |
             |  19.78 |  32.11 |
    ---------+--------+--------+
    Total          91      109      200
                45.50    54.50   100.00

    Statistic                              Value       ASE
    ------------------------------------------------------
    Gamma                                 0.3146    0.1503
    Kendall's Tau-b                       0.1391    0.0684
    Stuart's Tau-c                        0.1223    0.0607

    Somers' D C|R                         0.1570    0.0770
    Somers' D R|C                         0.1233    0.0612

    Pearson Correlation                   0.1391    0.0684
    Spearman Correlation                  0.1391    0.0684
    Tetrachoric Correlation               0.2362    0.1156

    Lambda Asymmetric C|R                 0.0000    0.0000
    Lambda Asymmetric R|C                 0.0000    0.0000
    Lambda Symmetric                      0.0000    0.0000

    Uncertainty Coefficient C|R           0.0143    0.0142
    Uncertainty Coefficient R|C           0.0170    0.0169
    Uncertainty Coefficient Symmetric     0.0155    0.0154

    Estimates of the Relative Risk (Row1/Row2)

    Type of Study                   Value       95% Confidence Limits
    -----------------------------------------------------------------
    Case-Control (Odds Ratio)      1.9182        0.9974        3.6890
    Cohort (Col1 Risk)             1.4622        0.9712        2.2015
    Cohort (Col2 Risk)             0.7623        0.5930        0.9799

    Sample Size = 200
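To connect the reported tetrachoric correlation to the underlying-normal assumption, here is a sketch of the usual latent-variable formulation applied to the 2x2 table of honors by female. The symbols h, k, p and rho are introduced here for illustration and do not appear in the SAS output; the thresholds are rounded, and this is a description of the model rather than of the exact algorithm proc freq uses internally.

```latex
\begin{aligned}
h &= \Phi^{-1}(p_{0+}) = \Phi^{-1}\!\left(\tfrac{147}{200}\right) \approx 0.63
  && \text{(threshold for honors)} \\
k &= \Phi^{-1}(p_{+0}) = \Phi^{-1}\!\left(\tfrac{91}{200}\right) \approx -0.11
  && \text{(threshold for female)} \\
p_{00} &= \tfrac{73}{200} = 0.365
  = \Phi_2(h, k; \rho)
  = \int_{-\infty}^{h}\!\int_{-\infty}^{k} \phi_2(x, y; \rho)\, dy\, dx
\end{aligned}
```

Here \Phi is the standard normal distribution function and \Phi_2 is the standard bivariate normal distribution function with correlation \rho. The tetrachoric correlation is the value of \rho that reproduces the observed cell proportion. If the two latent variables were independent (\rho = 0), the cell proportion would be 0.735 x 0.455, or about 0.334; the observed 0.365 is larger, which is consistent with the positive estimate of 0.2362 in the output.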
Example 2: Computing polychoric correlation among two or more ordinal categorical variables
We will use SAS ODS (Output Delivery System) to write the polychoric correlations to a data set. With ODS, SAS can create output data sets from the tables that a procedure produces. Tetrachoric and polychoric correlations are stored in the ODS table called measures, because SAS reports them together with all the other measures of association. We can restrict the output data set to the tetrachoric and polychoric correlations by using the where= data set option when the data set is created.
    proc freq data = hsb2;
      tables (female ses honors)*(female ses honors) / plcorr;
      ods output measures = mycorr
        (where = (statistic = "Tetrachoric Correlation" or
                  statistic = "Polychoric Correlation")
         keep = statistic table value);
    run;

    proc print data = mycorr;
    run;

    Obs    Table                    Statistic                   Value
      1    Table female * female    Tetrachoric Correlation    1.0000
      2    Table ses * female       Polychoric Correlation    -0.1741
      3    Table honors * female    Tetrachoric Correlation    0.2362
      4    Table female * ses       Polychoric Correlation    -0.1741
      5    Table ses * ses          Polychoric Correlation     1.0000
      6    Table honors * ses       Polychoric Correlation     0.2769
      7    Table female * honors    Tetrachoric Correlation    0.2362
      8    Table ses * honors       Polychoric Correlation     0.2769
      9    Table honors * honors    Tetrachoric Correlation    1.0000
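If it is unclear which ODS table holds a given statistic, one way to find out is to turn on ODS tracing around the procedure call. The following is a minimal sketch, assuming the hsb2 data set with the honors variable from Example 1 already exists; the names of the ODS tables produced by the step are then written to the SAS log.

```sas
/* List the ODS tables that proc freq creates; the names appear in the log. */
ods trace on;

proc freq data = hsb2;
  tables honors*female / plcorr;
run;

/* Turn tracing off again so later steps do not clutter the log. */
ods trace off;
```

The table named Measures in the log is the one referenced by the ods output statement in this example.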
Example 3: Obtaining a polychoric correlation matrix for a group of variables
The example above shows how to obtain polychoric correlations for multiple variables. However, the output is not in matrix format, which can be a problem if the correlation matrix is needed for further analysis. In this example, we use data steps to convert the output into a data set laid out as a correlation matrix. In the data step below, we create three variables: group, x and y. Since there are three variables, the correlation matrix has three rows and three columns. The group variable identifies the row of the matrix that each correlation belongs to: every block of three consecutive observations shares one value of group, which is why the observation number is divided by 3, the number of variables. Each correlation involves two variables; the name of the first is stored in x and the name of the second in y.
    proc freq data = hsb2;
      tables (female ses honors)*(female ses honors) / plcorr;
      ods output measures = mycorr
        (where = (statistic = "Tetrachoric Correlation" or
                  statistic = "Polychoric Correlation")
         keep = statistic table value);
    run;

    data mycorrt;
      set mycorr;
      group = floor((_n_ - 1)/3);
      x = scan(table, 2, " *");
      y = scan(table, 3, " *");
      keep group value table x y;
    run;

    proc print data = mycorrt;
    run;

    Obs    Table                      Value    group    x         y
      1    Table female * female     1.0000      0      female    female
      2    Table ses * female       -0.1741      0      ses       female
      3    Table honors * female     0.2362      0      honors    female
      4    Table female * ses       -0.1741      1      female    ses
      5    Table ses * ses           1.0000      1      ses       ses
      6    Table honors * ses        0.2769      1      honors    ses
      7    Table female * honors     0.2362      2      female    honors
      8    Table ses * honors        0.2769      2      ses       honors
      9    Table honors * honors     1.0000      2      honors    honors
Now we are ready to transpose the data set into matrix format.
    proc transpose data = mycorrt out = mymatrix (drop = _name_ group);
      id x;
      by group;
      var value;
    run;

    proc print data = mymatrix;
    run;

    Obs     female        ses     honors
      1     1.0000    -0.1741     0.2362
      2    -0.1741     1.0000     0.2769
      3     0.2362     0.2769     1.0000
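If the matrix is to be passed to another procedure, it usually needs the _type_ and _name_ variables that SAS expects in a TYPE=CORR data set. The step below is a minimal sketch of one way to add them, not part of the original example: the data set name mycorrmat is hypothetical, and the code assumes the rows of mymatrix are in the same order as the columns (female, ses, honors), as in the listing above.

```sas
/* Hypothetical follow-up step: mark the matrix as a TYPE=CORR data set. */
data mycorrmat (type = corr);
  length _type_ $ 8 _name_ $ 32;
  set mymatrix;
  /* Array order must match the row order of mymatrix (an assumption). */
  array v {*} female ses honors;
  _type_ = "CORR";
  /* Label row i with the name of the i-th variable. */
  _name_ = vname(v{_n_});
run;

proc print data = mycorrmat;
run;
```

This data set contains only the correlation rows. It has no row for the sample size, so a procedure that reads it may need the number of observations supplied separately.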