When the two variables under consideration are dichotomous or ordinal
categorical variables, we may want to compute the tetrachoric or polychoric
correlation. Both measures assume that each observed variable is a categorized
version of an underlying continuous variable and that the two underlying
variables follow a bivariate normal distribution. When both
variables are binary, the correlation is called the tetrachoric correlation; in
the more general case it is called the polychoric correlation. In SAS, **proc
freq** is used to obtain tetrachoric and polychoric correlations.
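
To make the assumption concrete, the model can be sketched as follows. The notation ($X^*$, $Y^*$, $\tau$, $\rho$) is ours, chosen for illustration; it does not appear in the SAS output.

$$
\begin{aligned}
(X^*, Y^*) &\sim N_2\!\left(\begin{pmatrix}0\\0\end{pmatrix},\ \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right),\\
X &= j \quad \text{if } \tau^{x}_{j-1} < X^* \le \tau^{x}_{j},\\
Y &= k \quad \text{if } \tau^{y}_{k-1} < Y^* \le \tau^{y}_{k},
\end{aligned}
$$

where the first threshold of each variable is $-\infty$ and the last is $+\infty$. The polychoric correlation is the parameter $\rho$ of the underlying normal variables, not the correlation of the observed categories. When each variable has only two categories there is a single finite threshold per variable, and $\rho$ is the tetrachoric correlation. Under this model the probability of each cell of the two-way table is

$$
P(X = j,\ Y = k) = \int_{\tau^{x}_{j-1}}^{\tau^{x}_{j}} \int_{\tau^{y}_{k-1}}^{\tau^{y}_{k}} \phi_2(u, v; \rho)\, dv\, du,
$$

with $\phi_2$ the standard bivariate normal density, and $\rho$ is estimated from the observed cell counts, typically by maximum likelihood.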

In the following examples, we will use data set hsb2.sas7bdat.

**Example 1**: Computing tetrachoric correlation between two dichotomous
variables

We specify the **plcorr** option in the **tables** statement to request
the polychoric correlation; for a 2 x 2 table, SAS labels it the tetrachoric
correlation. The two variables of interest are **female** and
**honors**, an indicator for write>=60 that is created in the data step below.

    data hsb2;
      set ats.hsb2;
      honors = (write>=60);
    run;

    proc freq data = hsb2;
      tables honors*female / plcorr;
    run;

    Table of honors by female

    honors     female

    Frequency|
    Percent  |
    Row Pct  |
    Col Pct  |       0|       1|  Total
    ---------+--------+--------+
           0 |     73 |     74 |    147
             |  36.50 |  37.00 |  73.50
             |  49.66 |  50.34 |
             |  80.22 |  67.89 |
    ---------+--------+--------+
           1 |     18 |     35 |     53
             |   9.00 |  17.50 |  26.50
             |  33.96 |  66.04 |
             |  19.78 |  32.11 |
    ---------+--------+--------+
    Total          91      109      200
                45.50    54.50   100.00

    Statistic                              Value       ASE
    ------------------------------------------------------
    Gamma                                 0.3146    0.1503
    Kendall's Tau-b                       0.1391    0.0684
    Stuart's Tau-c                        0.1223    0.0607

    Somers' D C|R                         0.1570    0.0770
    Somers' D R|C                         0.1233    0.0612

    Pearson Correlation                   0.1391    0.0684
    Spearman Correlation                  0.1391    0.0684
    Tetrachoric Correlation               0.2362    0.1156

    Lambda Asymmetric C|R                 0.0000    0.0000
    Lambda Asymmetric R|C                 0.0000    0.0000
    Lambda Symmetric                      0.0000    0.0000

    Uncertainty Coefficient C|R           0.0143    0.0142
    Uncertainty Coefficient R|C           0.0170    0.0169
    Uncertainty Coefficient Symmetric     0.0155    0.0154

    Estimates of the Relative Risk (Row1/Row2)

    Type of Study                   Value       95% Confidence Limits
    -----------------------------------------------------------------
    Case-Control (Odds Ratio)      1.9182        0.9974        3.6890
    Cohort (Col1 Risk)             1.4622        0.9712        2.2015
    Cohort (Col2 Risk)             0.7623        0.5930        0.9799

    Sample Size = 200

**Example 2**: Computing polychoric correlation among two or more
ordinal categorical variables

We will use the SAS Output Delivery System (ODS) to write the polychoric
correlations to a data set. ODS can turn each table that a procedure prints
into an output data set. The tetrachoric and polychoric correlations are stored in the table
called **measures**, because SAS reports them together with all the other measures of
association. We can keep only the tetrachoric and polychoric correlations by applying
the **where** data set option when the output data set is created.
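
If you are unsure which ODS table holds a given piece of output, one way to find out is to turn on ODS tracing before running the procedure. The sketch below assumes the **hsb2** data set with **honors** created as in Example 1; with tracing on, SAS writes the name of every output table produced by the step to the log, and the table holding the correlations appears there under the name Measures.

```sas
/* List the ODS table names produced by proc freq in the SAS log. */
ods trace on;

proc freq data = hsb2;
  tables honors*female / plcorr;
run;

/* Turn tracing off again so later steps do not clutter the log. */
ods trace off;
```

Any table name reported in the log can then be used on the left-hand side of an **ods output** statement.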

    proc freq data = hsb2;
      tables (female ses honors)*(female ses honors) / plcorr;
      ods output measures=mycorr
          (where=(statistic="Tetrachoric Correlation" or
                  statistic="Polychoric Correlation")
           keep = statistic table value);
    run;

    proc print data = mycorr;
    run;

    Obs    Table                      Statistic                    Value

     1     Table female * female      Tetrachoric Correlation      1.0000
     2     Table ses * female         Polychoric Correlation      -0.1741
     3     Table honors * female      Tetrachoric Correlation      0.2362
     4     Table female * ses         Polychoric Correlation      -0.1741
     5     Table ses * ses            Polychoric Correlation       1.0000
     6     Table honors * ses         Polychoric Correlation       0.2769
     7     Table female * honors      Tetrachoric Correlation      0.2362
     8     Table ses * honors         Polychoric Correlation       0.2769
     9     Table honors * honors      Tetrachoric Correlation      1.0000

**Example 3**: Obtaining a polychoric correlation matrix for a group of
variables

The example above shows how to obtain polychoric correlations for multiple
variables. However, the output is not in matrix format, which can be a problem if
further analysis is to be performed using the correlation matrix. In this
example, we use data steps to convert the output into a data set laid out as a
correlation matrix. In the data step below, we create three variables:
**group**, **x** and **y**. Since there are three variables, the
correlation matrix will have three rows and three columns. The **group**
variable identifies which row of the matrix each correlation belongs to, so
every block of three consecutive observations forms one row. Each correlation
involves two variables; the name of the first variable is stored in **x** and
the name of the second in **y**.

    proc freq data = hsb2;
      tables (female ses honors)*(female ses honors) / plcorr;
      ods output measures=mycorr
          (where=(statistic="Tetrachoric Correlation" or
                  statistic="Polychoric Correlation")
           keep = statistic table value);
    run;

    data mycorrt;
      set mycorr;
      group = floor((_n_ - 1)/3);
      x = scan(table, 2, " *");
      y = scan(table, 3, " *");
      keep group value table x y;
    run;

    proc print data = mycorrt;
    run;

    Obs    Table                       Value     group    x         y

     1     Table female * female       1.0000      0      female    female
     2     Table ses * female         -0.1741      0      ses       female
     3     Table honors * female       0.2362      0      honors    female
     4     Table female * ses         -0.1741      1      female    ses
     5     Table ses * ses             1.0000      1      ses       ses
     6     Table honors * ses          0.2769      1      honors    ses
     7     Table female * honors       0.2362      2      female    honors
     8     Table ses * honors          0.2769      2      ses       honors
     9     Table honors * honors       1.0000      2      honors    honors

Now we are ready to transpose the data set into matrix format.

    proc transpose data = mycorrt out=mymatrix (drop = _name_ group);
      id x;
      by group;
      var value;
    run;

    proc print data = mymatrix;
    run;

    Obs     female        ses     honors

     1      1.0000    -0.1741     0.2362
     2     -0.1741     1.0000     0.2769
     3      0.2362     0.2769     1.0000
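
If the matrix is to be passed to another procedure, it usually needs to be marked as a correlation data set. The following is a minimal sketch of one way to do that, not part of the original example. The output data set name **polycorr** is our own choice, and the sketch assumes the rows of **mymatrix** are in the order female, ses, honors, as in the listing above.

```sas
/* Mark the matrix as a TYPE=CORR data set (polycorr is a hypothetical name). */
data polycorr (type = corr);
  length _type_ $ 8 _name_ $ 32;
  set mymatrix;
  _type_ = "CORR";                                /* each row holds correlations */
  _name_ = scan("female ses honors", _n_, " ");   /* assumes this row order */
run;

proc print data = polycorr;
run;
```

Procedures that read TYPE=CORR data sets, such as **proc factor**, can then take **polycorr** as input. Since the data set has no row giving the sample size, it may be necessary to supply the number of observations (200 here) to the procedure separately.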