The SAS macro corr2data can be used to generate a dataset of a given size with a given correlation/covariance structure. This can be a very useful step in a simulation process. The macro program can be found here.
If you have already downloaded the macro, you can paste the code into the program editor or, alternatively, use %include.
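If the macro has been saved to a file, a %include statement along the following lines makes it available in the current session. This is a minimal sketch: the path and file name are placeholders chosen for illustration, not the actual location of the macro.

```sas
/* Hypothetical path and file name: replace with wherever you saved the macro */
%include "C:\sasmacros\corr2data.sas";
```

Once the %include statement has run, %corr2data can be called like any other macro defined in the session.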
Example 1: Using a correlation matrix from an existing dataset.
We have a dataset, https://stats.idre.ucla.edu/wp-content/uploads/2016/02/auto.sas7bdat, from which we will calculate a correlation matrix and then, using the corr2data macro, generate a new dataset with the same correlation structure. To use the macro, we first need to compute the correlation matrix and save it as a dataset. Let’s look at the correlations between the variables price, mpg, and weight.
proc corr data = auto outp=p nosimple noprob;
  var price mpg weight;
run;

data corr;
  set p;
  if _type_="CORR";
run;

proc print data = corr;
run;

Obs    _TYPE_    _NAME_      PRICE        MPG       WEIGHT
  1    CORR      PRICE      1.00000    -0.46860     0.53861
  2    CORR      MPG       -0.46860     1.00000    -0.80717
  3    CORR      WEIGHT     0.53861    -0.80717     1.00000
With the dataset corr, we can now run corr2data. To figure out what arguments to provide, we can look at the comments explaining the macro.
/******************************************************************
* Name: corr2data                                                 *
* Function: creating a data set with given correlation matrix     *
* %corr2data(mydata, corrmat=corr, n=200, full='f', corr='f');    *
* corrmat: input matrix                                           *
* n: number of observations                                       *
* full: specifying if the input matrix is a full matrix           *
*       'T' for full matrix                                       *
*       'F' for upper or lower triangular                         *
* corr: specifying if the input matrix is a correlation           *
*       matrix or a covariance matrix:                            *
*       'T' for correlation matrix and                            *
*       'F' for covariance matrix                                 *
******************************************************************/
We will create a new dataset called mycorr by passing the macro our correlation matrix corr and specifying that the new dataset should have 200 observations and that the input is a full matrix of correlations (as opposed to covariances). The code to do this follows:
%corr2data(mycorr, corr, 200, FULL='T', corr='T');
After running the macro, we can look at the correlations in our new dataset mycorr.
proc corr data = mycorr;
run;

The CORR Procedure

3 Variables: COL1 COL2 COL3

Simple Statistics

Variable      N    Mean    Std Dev    Sum     Minimum     Maximum
COL1        200       0    1.00000      0    -2.45292     2.59209
COL2        200       0    1.00000      0    -3.12547     2.69587
COL3        200       0    1.00000      0    -3.13046     2.70254

Pearson Correlation Coefficients, N = 200
Prob > |r| under H0: Rho=0

            COL1        COL2        COL3
COL1     1.00000    -0.46860     0.53861
                      <.0001      <.0001
COL2    -0.46860     1.00000    -0.80717
          <.0001                  <.0001
COL3     0.53861    -0.80717     1.00000
          <.0001      <.0001
We can see that the correlations here exactly match those from the auto dataset we started with.
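The exact match, together with the means of 0 and standard deviations of 1 in the output, is what one would expect from a construction that standardizes the random draws before imposing the target structure. The derivation below sketches one standard way to do this. It is an assumption for illustration only, and we have not confirmed that corr2data is implemented this way. The symbols are introduced here: Z is an n-by-p matrix of random draws, R is the target correlation matrix, and L is its Cholesky factor.

```latex
\begin{align*}
\text{Assume } \mathbf{1}^\top Z &= 0, \qquad \tfrac{1}{n-1} Z^\top Z = I_p
  && \text{(columns of $Z$ centered and whitened)} \\
R &= L L^\top
  && \text{(Cholesky factorization of the target)} \\
X &= Z L^\top
  && \text{(generated data)} \\
\mathbf{1}^\top X &= \mathbf{1}^\top Z L^\top = 0
  && \text{(column means remain zero)} \\
\tfrac{1}{n-1} X^\top X &= L \left( \tfrac{1}{n-1} Z^\top Z \right) L^\top = L L^\top = R
  && \text{(sample covariance equals $R$)}
\end{align*}
```

Because R has ones on its diagonal, each column of X has a sample variance of exactly 1, so the sample correlation matrix of X is also exactly R rather than only approximately R.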
Example 2: Writing a correlation matrix to create a dataset.
You do not necessarily need to start with an existing dataset to generate a dataset with a certain correlation structure. Instead, you can write a correlation matrix in SAS and provide that matrix to the corr2data macro. See the example below.
data corr;
  input x1 x2;
  datalines;
1 .24
.24 1
;
run;

proc print data = corr;
run;

Obs     x1      x2
  1    1.00    0.24
  2    0.24    1.00
Now, this correlation matrix can be our corrmat argument.
%corr2data(mycorr, corr, 200, FULL='T', corr='T');
We can now look at the correlation matrix of our new dataset to see that it matches the correlation matrix we provided.
proc corr data = mycorr;
run;

The CORR Procedure

2 Variables: COL1 COL2

Simple Statistics

Variable      N    Mean    Std Dev    Sum     Minimum     Maximum
COL1        200       0    1.00000      0    -3.15650     2.84648
COL2        200       0    1.00000      0    -2.55106     3.06611

Pearson Correlation Coefficients, N = 200
Prob > |r| under H0: Rho=0

           COL1       COL2
COL1    1.00000    0.24000
                    0.0006
COL2    0.24000    1.00000
         0.0006
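The macro header also describes a corr='F' setting for covariance input. The sketch below shows how such a call might look, following the same calling pattern as the examples on this page. The dataset names cov and mycov and the matrix values are made up for illustration, and the output is not shown because this variant was not run here.

```sas
/* Hypothetical covariance matrix: variances 4 and 9, covariance 1.2 */
data cov;
  input x1 x2;
  datalines;
4 1.2
1.2 9
;
run;

/* corr='F' marks the input as a covariance matrix, per the macro header */
%corr2data(mycov, cov, 200, FULL='T', corr='F');

/* The COV option asks proc corr to print the covariance matrix as well */
proc corr data = mycov cov;
run;
```

Comparing the covariance matrix printed by proc corr with the values entered in cov is a quick way to check whether the generated data reproduce the requested structure.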