1.0 SAS statements and procs in this unit
libname | Set library |
if and where | Conditional statement |
keep | Keeps named variables |
drop | Drops named variables |
set | Reads in named file(s); appends datasets |
proc sort | Sorts cases in a dataset |
merge | Merges files |
2.0 Demonstration and explanation
2.1 Creating a library
We will once again begin by setting our library in SAS. The two proc print commands below demonstrate two equivalent ways to reference the data file of interest, hs1: through the SAS library or by its full file path. Both produce the same results.

    libname in "c:\sas_data\";

    * reference the file through the library;
    proc print data=in.hs1 (obs=10);
        var write read science;
    run;

    * reference the file by its full path;
    proc print data="c:\sas_data\hs1" (obs=10);
        var write read science;
    run;
2.2 Selecting cases using if or where statement
Suppose we wish to analyze just a subset of the hs1 data file. We are studying “good readers” and want to focus on the students with a reading score of 60 or higher. Here we show how to create subsets based on high vs. low reading scores, first with if statements and then with a where statement.

    * split hs1 into two datasets based on reading score;
    data highread lowread;
        set in.hs1;
        if read >= 60 then output highread;
        if read < 60 then output lowread;
    run;

    title "high reading scores";
    proc means data=highread n mean;
        var read;
    run;

    title "low reading scores";
    proc means data=lowread n mean;
        var read;
    run;
    title;

    * using where statement;
    data highread;
        set in.hs1;
        where read >= 60;
    run;
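When the subset is needed for only one analysis, a where statement can also be placed directly inside a procedure, so no intermediate dataset has to be created. A minimal sketch, assuming the in library and the hs1 file used above:

```sas
* summarize read for high readers without creating a new dataset;
title "high reading scores";
proc means data=in.hs1 n mean;
    where read >= 60;
    var read;
run;
title;
```

Because it selects the same cases, this should report the same n and mean as running proc means on highread.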
2.3 Keeping variables
Now suppose that our data file had many variables, say 2000, but we care about only a handful of them: id, female, read and write. We can subset our data file to keep just those variables as shown below.

    data hskept;
        set highread;
        keep id female read write;
    run;
2.4 Dropping variables
Instead of keeping just a handful of variables, we may want to remove just a handful from our data file. Below we show how to remove the variables ses and prog from the dataset.

    data hsdropped;
        set highread;
        drop ses prog;
    run;
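The same selections can also be written as data set options on the set statement, which restricts the variables as they are read in rather than as the new dataset is written. A minimal sketch; the output names hskept2 and hsdropped2 are hypothetical and chosen only to avoid overwriting the datasets created above:

```sas
* keep= as a data set option: only these four variables are read;
data hskept2;
    set highread (keep=id female read write);
run;

* drop= as a data set option: every variable except ses and prog is read;
data hsdropped2;
    set highread (drop=ses prog);
run;
```

With a very wide input file, limiting the variables at read time can reduce the work done in the data step, although the resulting datasets are the same either way.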
2.5 Appending datasets
In this example we start with two datasets, one for males (called hsmale) and one for the females (called hsfemale). We need to combine these files together to be able to analyze them by gender, as shown below. In this example, we are adding cases, sometimes called “stacking” the data files. We do this by listing both data file names on the set statement in the data step.
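    title;
    proc freq data=in.hsmale;
        tables female;
    run;
    proc freq data=in.hsfemale;
        tables female;
    run;

    * stack the two files: males first, then females;
    data in.hsmaster;
        set in.hsmale in.hsfemale;
    run;

    * by runs the procedure separately for each value of female;
    proc ttest data=in.hsmaster;
        by female;
        var write;
    run;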
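A by statement in proc ttest analyzes each gender on its own. If the goal is instead to compare mean writing scores between males and females, a class statement is the usual choice. A minimal sketch, assuming in.hsmaster was created as above and that female takes exactly two values:

```sas
* two-sample t-test comparing write between the two levels of female;
proc ttest data=in.hsmaster;
    class female;
    var write;
run;
```

Unlike by, the class statement does not require the dataset to be sorted by female.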
2.6 Merging datasets
Again, we have been given two files. In this case, one file holds the demographic information (hsdem) and the other holds the test scores (hstest), and we wish to merge them. Before merging, each file must be sorted by the same variable and the sorted version saved; proc sort can do both. The variable used for sorting is also the variable used to match records in hsdem to records in hstest. Next, a data step with the merge and by statements combines the datasets.
Before we begin, we should look at the data sets.
    proc print data=in.hsdem;
    run;
    proc print data=in.hstest;
    run;

Next, we sort both data sets by the variable that identifies the records to be matched across them, in this case id. The out= option writes the sorted copy to a new dataset and leaves the original file unchanged; because dem and test are named without a library, they are temporary datasets that last only for the current SAS session.

    proc sort data=in.hsdem out=dem;
        by id;
    run;
    proc sort data=in.hstest out=test;
        by id;
    run;
Now we can merge using the newly sorted files and look at the resulting data set.
    * match records from dem and test on id;
    data all;
        merge dem test;
        by id;
    run;

    proc print data=all;
    run;
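A match-merge keeps records that appear in only one of the two files, with missing values for the variables from the other file. One way to check for such records is the in= data set option. A minimal sketch, assuming the sorted dem and test datasets from above; the names all_checked, in_dem, in_test and matched are hypothetical:

```sas
data all_checked;
    merge dem (in=in_dem) test (in=in_test);
    by id;
    * in_dem and in_test are temporary 0/1 flags and are not saved;
    * matched is 1 when the id was found in both files, 0 otherwise;
    matched = (in_dem and in_test);
run;

* count how many records did and did not match;
proc freq data=all_checked;
    tables matched;
run;
```

If every id appears in both hsdem and hstest, the frequency table should show only matched = 1.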
3.0 For more information
- The Little SAS Book, Fifth Edition
  - Chapters 3 and 6
- Just Enough SAS: A Quick-start Guide to SAS for Engineers
  - Chapter 2
- SAS Learning Modules
  - Merging data files via data step, or proc SQL
  - Concatenating (stacking) SAS data files
- SAS Frequently Asked Questions
- SAS Library - Web Page Resources