1.0 SAS statements and procs in this unit
data | Begins a data step which manipulates datasets |
infile | Identifies an external raw data file to read |
input | Lists variable names in the input file |
datalines | Indicates internal data |
set | Reads a SAS data set |
proc contents | Contents of a data set |
proc print | Prints observations of variables in a data set |
libname | An engine to connect to Microsoft files |
2.0 Demonstration and explanation
2.1 Import wizard, libnames and proc import
We will start with inputting an Excel file into SAS first through the SAS Import Wizard.
- File Menu
- Import Data
- Select Microsoft Excel Workbook(*.xls *.xlsb *.xlsm *.xlsx) as the source (this is the default)
- Click on Next
- Click on Browse to select a file
- Navigate to c:\sas_data\hs0.xlsx and hit OK
- Select which table(sheet) to use
- Click on Options
- Make sure that “Use data in the first row as SAS variable names” is checked.
- Click on OK
- Click on Next
- Click on Options
- Choose a library and enter a name for the dataset (hs0) under Member:
- Click on Finish
- Optionally, to save this procedure as proc import code, click on Next, enter a file in which to save this code, and click on Finish.
- Click on Finish
Below is the SAS syntax to import the same excel file.
proc import out= work.hs0 datafile= "c:\sas_data\hs0.xlsx" dbms=excel replace; range="hs0$"; getnames=yes; run;
Both of the methods above (menus or syntax) work for other file formats, such as .xls, comma-separated, or tab-delimited files, and even Stata or SPSS datasets. Now we can look at the data or even modify them if we want.
- Explorer tab
- Libraries
- Work
- Double click on hs0
- Edit Menu
- Select Edit Mode
- Click on data to edit
2.2 Data Steps
One of the more commonly used ASCII data formats is the comma-separated-values (.csv) format. Files of this type can be read in through the Import Wizard or proc import as shown above or through a little bit of programming. We will now show how to read in a .csv file with a SAS data step. The following segment is the beginning part of the hs0 file in .csv format. This data file doesn’t have variable names on the first line. Also notice that the line in bold has two consecutive commas near the end. This means that there is a missing value in between. In order to read in the data correctly, we use the option dsd in the infile statement.
0,70,4,1,1,general,57,52,41,47,57 1,121,4,2,1,vocational,68,59,53,63,61 0,86,4,3,1,general,44,33,54,58,31 0,141,4,3,1,vocational,63,44,47,53,56 0,172,4,2,1,academic,47,52,57,53,61 0,113,4,2,1,academic,44,52,51,63,61 0,50,3,2,1,general,50,59,42,53,61 0,11,1,2,1,academic,34,46,45,39,36 0,84,4,2,1,general,63,57,54,,51 0,48,3,2,1,academic,57,55,52,50,51 0,75,4,2,1,vocational,60,46,51,53,61 0,60,5,2,1,academic,57,65,51,63,61
The following data step will read the data file and name it temp. The infile statement tells SAS where the location and the name of the ASCII file is. The input statement gives the names of the variables in the dataset in the same order as the comma separated file. The $ after prgtype tells SAS that prgtype is a character variable, that is, a variable that can contain letters as well as numbers. The length statement tells SAS that the variable prgtype is a character variable (as in the input statement, the $ indicates a string variable) and has ten characters (indicated by the 10 following the $). By default, SAS allows a character variable to be 8 or fewer characters. If the character variable is to be longer, you have to tell SAS using the length statement. Note that if you have already specified that the variable is a character variable in the length statement, it is not necessary to include the $ after prgtype in the input statement; however, doing so is not problematic. The option dsd stands for delimiter sensitive data. This option will tell SAS to ignore commas within quoted strings, treat consecutive delimiters as missing data, and set the delimiter to comma.
data temp; infile 'c:\sas_data\hs0.csv' delimiter=',' dsd; length prgtype $10; input gender id race ses schtyp prgtype $ read write math science socst ; run;
Once we have entered the data, we can list the first ten observations to check that the inputting was successful. Note that proc print “prints” the data to the output window, not to a physical printer.
proc print data = temp (obs=10); run;
Sometimes we may want to input data directly from within SAS and here is what to do.
data hsb10; input id female race ses schtype $ prog read write math science socst; datalines; 147 1 1 3 pub 1 47 62 53 53 61 108 0 1 2 pub 2 34 33 41 36 36 18 0 3 2 pub 3 50 33 49 44 36 153 0 1 2 pub 3 39 31 40 39 51 50 0 2 2 pub 2 50 59 42 53 61 51 1 2 1 pub 2 42 36 42 31 39 102 0 1 1 pub 1 52 41 51 53 56 57 1 1 2 pub 1 71 65 72 66 56 160 1 1 2 pub 1 55 65 55 50 61 136 0 1 2 pub 1 65 59 70 63 51 ; run; proc print data=hsb10; run;
2.3 Saving SAS data files
So far, all the SAS data sets that we have created are temporary. When we quit SAS, all temporary data sets will be gone. To save a SAS data file to disk we can use a data step. The example below saves the dataset temp from above as c:\sas_data\hs0 (SAS will automatically add the file extension .sas7bdat to the file name hs0).
data 'c:\sas_data\hs0'; set temp; run;
Alternatively, we can create a permanent SAS file using the libname statement and a data step. Creating a library allows us to refer to a file in a specific directory (folder) without typing out the full file path. The command libnamecreates a shortcut that refers back to a specified directory. In the example below, we create a library called in that refers to the folder sas_data on the c:\ drive of your work station. Once this statement has been submitted to SAS it will create a directory in Libraries which is under the Explorer tab. Inside this directory you will be able to view any SAS data files that are present inside the sas_data folder on your work station.
libname in 'c:\sas_data';
Once we have set our libname, we can save our permanent SAS data file to it using a data step. Here the data statement specifies the new SAS file we want to create/output. We first give the library it will be housed inside (in) followed by a “.” and then give the data file the name hs0. The set statement specifies the input data file temp, the file that is used to create hs0. The temp SAS file does not need a library extension. Temporary files are automatically saved in the work directory and do not need a library specification in order to be properly referenced.
data in.hs0; set temp; run;
This will save a new SAS file in the library in. Since this library refers to the directory c:/sas_data, you can look in that folder on your computer and you will find a SAS data file name hs0 saved there.
3.0 For more information
- The Little SAS Book, Fifth Edition
- Chapter 2
- Just Enough SAS: A Quick-start Guide to SAS for Engineers
- Chapters 2-3
- Reading External Data Files Using SAS: Examples Handbook
- Chapters 2 and 5