The SAS macro **corr2data** generates a dataset of a given size with a given
correlation or covariance structure, which can be a very useful step in a
simulation process. The macro program can be found here.

If you have already downloaded the macro, you can paste the code into the
program editor or, alternatively, use **%include**.
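
As a minimal sketch of the **%include** route, assuming the macro was saved as `corr2data.sas` in a folder of your choosing (the path below is hypothetical):

```sas
/* Hypothetical location: replace with the folder where you saved the macro file. */
%include "C:\sasmacros\corr2data.sas";
```

Once this statement has run, **%corr2data** is available for the rest of the SAS session.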

## Example 1: Using a correlation matrix from an existing dataset.

We start with a dataset, https://stats.idre.ucla.edu/wp-content/uploads/2016/02/auto.sas7bdat, from which we will
calculate a correlation matrix and then, using the **corr2data** macro,
generate a new dataset with the same correlation structure.
To use the macro, we first need to generate and save the correlation matrix.
Let’s look at the correlations between the variables **price**, **mpg**,
and **weight**.
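
The code in this example refers to the dataset simply as **auto**, so it needs to be available in the WORK library. A minimal sketch of one way to arrange that, assuming the file was downloaded to a local folder (the folder path and the libref `mydata` are hypothetical):

```sas
/* Hypothetical folder containing the downloaded auto.sas7bdat file. */
libname mydata "C:\sasdata";

/* Copy the permanent dataset into WORK so it can be referenced as auto. */
data auto;
  set mydata.auto;
run;
```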

    proc corr data = auto outp=p nosimple noprob;
      var price mpg weight;
    run;

    data corr;
      set p;
      if _type_="CORR";
    run;

    proc print data = corr;
    run;

    Obs    _TYPE_    _NAME_      PRICE         MPG      WEIGHT
     1      CORR     PRICE      1.00000    -0.46860     0.53861
     2      CORR     MPG       -0.46860     1.00000    -0.80717
     3      CORR     WEIGHT     0.53861    -0.80717     1.00000

With the dataset **corr**, we can now run **corr2data**. To figure out what
arguments to provide, we can look at the comments explaining the macro.

    /******************************************************************
    * Name: corr2data                                                 *
    * Function: creating a data set with given correlation matrix     *
    * %corr2data(mydata, corrmat=corr, n=200, full='f', corr='f');    *
    * corrmat: input matrix                                           *
    * n: number of observations                                       *
    * full: specifying if the input matrix is a full matrix           *
    *       'T' for full matrix                                       *
    *       'F' for upper or lower triangular                         *
    * corr: specifying if the input matrix is a correlation           *
    *       matrix or a covariance matrix:                            *
    *       'T' for correlation matrix and                            *
    *       'F' for covariance matrix                                 *
    *******************************************************************/

We will create a new dataset called **mycorr** by passing the macro our
correlation matrix **corr**, requesting 200 observations, and indicating that
the input is a full matrix of correlations (as opposed to covariances).
The code to do this follows:

    %corr2data(mycorr, corr, 200, FULL='T', corr='T');

After running the macro, we can look at the correlations in our new dataset **mycorr**.

    proc corr data = mycorr;
    run;

    The CORR Procedure

    3 Variables:    COL1     COL2     COL3

    Simple Statistics

    Variable      N     Mean    Std Dev    Sum     Minimum     Maximum
    COL1        200        0    1.00000      0    -2.45292     2.59209
    COL2        200        0    1.00000      0    -3.12547     2.69587
    COL3        200        0    1.00000      0    -3.13046     2.70254

    Pearson Correlation Coefficients, N = 200
    Prob > |r| under H0: Rho=0

                 COL1        COL2        COL3
    COL1      1.00000    -0.46860     0.53861
                           <.0001      <.0001
    COL2     -0.46860     1.00000    -0.80717
               <.0001                  <.0001
    COL3      0.53861    -0.80717     1.00000
               <.0001      <.0001

We can see that the correlations here exactly match those from the auto dataset we started with.
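
The match is exact rather than approximate, and the generated columns have sample mean 0 and standard deviation 1. One way such a result can be obtained is to whiten a random sample so that its sample covariance is exactly the identity, then multiply by a Cholesky root of the target matrix. The SAS/IML sketch below illustrates that idea. It is an illustration under stated assumptions (SAS/IML is licensed, an arbitrary seed, hypothetical matrix names), not necessarily how **corr2data** is implemented.

```sas
proc iml;
  call randseed(12345);                  /* arbitrary seed, an assumption */

  /* Target correlation matrix, taken from the PROC PRINT output above. */
  target = { 1.00000 -0.46860  0.53861,
            -0.46860  1.00000 -0.80717,
             0.53861 -0.80717  1.00000 };
  n = 200;
  p = ncol(target);

  x = j(n, p, .);
  call randgen(x, "Normal");             /* independent N(0,1) draws */

  x = x - mean(x);                       /* center each column at 0 */
  s = (x` * x) / (n - 1);                /* sample covariance of the draws */
  z = x * inv(root(s));                  /* sample covariance of z is the identity */
  y = z * root(target);                  /* sample covariance of y is the target */

  check = corr(y);                       /* should agree with target up to rounding */
  print check;
quit;
```

Here `root` returns the upper triangular Cholesky factor `U` with `U` transposed times `U` equal to its argument, which is why right-multiplying by `inv(root(s))` removes the sample covariance and right-multiplying by `root(target)` imposes the desired one.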

## Example 2: Writing a correlation matrix to create a dataset.

You do not necessarily need to start with an existing dataset to generate a
dataset with a certain correlation structure. Instead, you can write a
correlation matrix in SAS and provide that matrix to the **corr2data** macro.
See the example below.

    data corr;
      input x1 x2;
      datalines;
    1 .24
    .24 1
    ;
    run;

    proc print data = corr;
    run;

    Obs     x1      x2
     1     1.00    0.24
     2     0.24    1.00

Now, this correlation matrix can be our **corrmat** argument.

    %corr2data(mycorr, corr, 200, FULL='T', corr='T');

We can now look at the correlation matrix of our new dataset to see that it matches the correlation matrix we provided.

    proc corr data = mycorr;
    run;

    The CORR Procedure

    2 Variables:    COL1     COL2

    Simple Statistics

    Variable      N     Mean    Std Dev    Sum     Minimum     Maximum
    COL1        200        0    1.00000      0    -3.15650     2.84648
    COL2        200        0    1.00000      0    -2.55106     3.06611

    Pearson Correlation Coefficients, N = 200
    Prob > |r| under H0: Rho=0

                 COL1        COL2
    COL1      1.00000     0.24000
                           0.0006
    COL2      0.24000     1.00000
               0.0006
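
The macro's header comments also describe a `corr='F'` setting for covariance input. The following is a sketch of how that variant might look. It has not been verified against the macro: it assumes **corr2data** accepts a covariance matrix in the same layout as the correlation matrix in Example 1, and that the **auto** dataset from Example 1 is still available. The dataset names `pcov`, `covmat`, and `mycov` are hypothetical.

```sas
/* The COV option adds covariance rows (_TYPE_="COV") to the OUTP= dataset. */
proc corr data = auto cov outp=pcov nosimple noprob;
  var price mpg weight;
run;

/* Keep only the covariance rows. */
data covmat;
  set pcov;
  if _type_="COV";
run;

/* corr='F' tells the macro that the input is a covariance matrix. */
%corr2data(mycov, covmat, 200, FULL='T', corr='F');

/* Print the covariances of the generated data for comparison with covmat. */
proc corr data = mycov cov;
run;
```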