This page was adapted from a page created by Oliver Schabenberger. We thank Professor Schabenberger for permission to adapt and distribute this page via our web site.
1. Basic rules
2. SAS programs
3. Options
4. Titles
5. Comments
- Every SAS statement ends with a semicolon (forgetting the semicolon is the cause of many headaches among novice users)
- SAS statements can extend over multiple lines provided you do not split a word of the statement across lines
- Lines of data are not ended by semicolons (data lines are not considered statements)
- More than one statement can appear on a single line
- SAS is case insensitive: you can use lowercase, uppercase, or a mixture. This does not apply to data lines; when you read values of character variables, SAS distinguishes between upper- and lowercase characters.
- Words in SAS statements are separated by blanks
- Although it is good style to indent certain statements, you can start statements anywhere within a line
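The rules above can be seen together in one small program. This is only a sketch: the data set name demo and its variables name and score are made up for illustration.

```sas
/* A statement may run over several lines; it ends at the semicolon */
data
   demo;
   input name $ score;
   datalines;
Ann 90
ann 85
;
run;

/* Keywords may be typed in any case, and two statements may share a line */
PROC PRINT DATA=demo; RUN;

/* Character values are case sensitive: only the observation 'ann' is selected */
proc print data=demo;
   where name = 'ann';
run;
```

Note that the two data lines carry no semicolons; the single semicolon on the line after them marks the end of the data.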
SAS programs consist of two kinds of steps, the DATA step and the PROC step. The DATA step reads data and prepares it for use by subsequent DATA or PROC steps. A SAS procedure (PROC) is a collection of statements that carry out a certain task. Each procedure has its own statements and options, but many are shared among procedures. For example, most procedures have a BY statement that functions similarly in each, and the procedures for regression, ANOVA, etc. have a MODEL statement. A DATA step begins with the word DATA followed by a name for the data set (up to eight characters in older releases of SAS; SAS 7 and later accept up to 32). For example
DATA MYDATA;
<statements and/or data>
RUN;
indicates to SAS the beginning of a DATA step in which a SAS data set is created. The data set is assigned the name MYDATA. Subsequent DATA and PROC steps can access the data under its name, MYDATA. The RUN; statement indicates the end of the DATA step. It is good style to end all DATA and PROC steps with a RUN statement, since this aids the readability of the program. If a procedure is invoked without specifically indicating which data set is to be used, SAS defaults to the most recently created data set. A program such as the following
DATA MYDATA;
<statements and/or data>
RUN;
DATA NEXTDATA;
<statements and/or data>
RUN;
PROC PRINT;
RUN;
will create a data set called MYDATA, then create a data set called NEXTDATA, and finally invoke the PRINT procedure, which displays the contents of a data set in the OUT window (PROC PRINT does not send the contents of the data set to the printer). PROC PRINT will print the contents of NEXTDATA, since this data set was created last. If you wish to print the contents of MYDATA, you can either move the PROC PRINT step before the DATA NEXTDATA; statement, as in this code
DATA MYDATA;
<statements and/or data>
RUN;
PROC PRINT; RUN;
DATA NEXTDATA;
<statements and/or data>
RUN;
or indicate to the PRINT procedure which data set to display:
DATA MYDATA;
<statements and/or data>
RUN;
DATA NEXTDATA;
<statements and/or data>
RUN;
PROC PRINT DATA=MYDATA;
RUN;
The DATA= part is called an option of the PROC PRINT statement. Nearly all procedures that work on a data set have a DATA= option. It is good style to state explicitly in each procedure call which data set the procedure is supposed to use. This helps a great deal when reading and debugging programs.
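To make the default behavior concrete, here is a runnable version of the program above with a few made-up data lines in place of the placeholders. The variable names a, b, and x and the values are assumptions chosen only for illustration.

```sas
data mydata;
   input a b;
   datalines;
1 2
3 4
;
run;

data nextdata;
   input x;
   datalines;
10
20
;
run;

/* No DATA= option: SAS uses the most recently created data set, NEXTDATA */
proc print;
run;

/* DATA= option: SAS prints MYDATA regardless of what was created last */
proc print data=mydata;
run;
```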
Options are settings that influence the particular SAS session and are issued through the OPTIONS statement. This statement appears outside DATA and PROC steps. Typical options are
center and nocenter    /* determines whether output is centered or left justified */
ls= (or linesize=)     /* determines the number of columns in the OUT window */
ps= (or pagesize=)     /* determines the number of lines on a page in the OUT window */
date and nodate        /* whether SAS includes the date in the header of each page or not */
number and nonumber    /* whether the page header contains a page number */
Options are best set at the very top of your SAS program, since they are usually meant to affect the whole session. For example,
options ls=84 ps=64 nocenter;
defines a page as 84 columns wide and 64 lines long. Output will be displayed left justified.
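Several options can be combined in a single OPTIONS statement. The sketch below also uses PROC OPTIONS, which is not covered on this page, to write the current value of one option to the LOG window so that you can check the setting took effect.

```sas
/* Set page width and length, left-justify output, drop date and page number */
options ls=84 ps=64 nocenter nodate nonumber;

/* Report the current LINESIZE setting in the log */
proc options option=linesize;
run;
```

The settings stay in effect until another OPTIONS statement changes them or the session ends.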
Titles are descriptive headers SAS places at the top of each page of the OUT window. A title is set with the TITLE statement followed by a string of characters. The string must be enclosed in single or double quotes. If you open the string with a single quote, you must close it with a single quote, and likewise for double quotes. For example
title 'Contents of data set MYDATA';
will define the string as a title for output pages. For titles of more than one line, use TITLE statements in which the word title is followed by the number of the line:
title 'Contents of data set MYDATA';
title2 'The data were collected in July 1998 in a total of ';
title3 '100 man hours of field measurements and 40 lab hours';
To clear the title setting simply execute
title;
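Titles remain in effect until they are changed or cleared. One behavior worth knowing is that setting a title line also cancels every title line with a higher number. The sketch below illustrates this; the data set scores and its variables are made up for the example.

```sas
data scores;
   input id score;
   datalines;
1 90
2 85
;
run;

title  'Contents of data set SCORES';
title2 'Preliminary values';
title3 'Not yet checked';
proc print data=scores;   /* output carries three title lines */
run;

title2 'Final values';    /* replaces line 2 and cancels line 3 */
proc print data=scores;   /* output carries two title lines */
run;

title;                    /* clears all title lines */
```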
A very common mistake is to forget to close a string that was opened with a single or double quote, or to close it with a quote of the other kind. For example, the code
title 'This is my data;
data mydata;
input a b c;
datalines;
1 2 3
4 5 6
;
run;
will produce a warning message in the LOG window:
WARNING: The current word or quoted string has become more than 200
characters long. You may have
unbalanced quotation marks.
NOTE: A single quote (') will terminate this quoted string.
The maximum length for a string is 200 characters. SAS continued to treat the characters data mydata; input … as part of the title until the string reached that limit. The only remedy for this problem is to submit the single quote
';
to close the string. Sometimes nothing seems to fix the problem except closing SAS and restarting the application.
If your title contains a single quote (an apostrophe), enclose the title in double quotes, e.g.,
title "This is John Smith's data";
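Two further points about quoting, which go slightly beyond the text above, may be useful. Inside a single-quoted string, an apostrophe can also be written as two single quotes in a row. And only double-quoted strings resolve macro variable references; the automatic macro variable SYSDATE9 is used here purely as an illustration.

```sas
/* Two consecutive single quotes stand for one apostrophe */
title 'This is John Smith''s data';

/* Double quotes: &sysdate9 is replaced by the date the session started */
title2 "Printed on &sysdate9";

/* Single quotes: the text &sysdate9 appears literally in the title */
title3 'Printed on &sysdate9';
```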
It is good style to comment your programs. Nothing is more frustrating than to return to a program written months ago and not be able to figure out what you did and why you did it. Also, if someone else needs to look over your program, the task is made much easier if the code is commented. There are two types of comments in SAS.
The line comment begins with an asterisk and continues until SAS encounters a semicolon. For example
* I am now creating my data set MYDATA;
data mydata;
input a b c;
< and so forth>
SAS will ignore the line commencing with * during execution. Although called a line comment, this comment can extend over multiple lines. Line comments are a quick way to disable a single statement without physically deleting it, for example to find out how a program behaves without that statement.
The other type of comment, which you will see more frequently, begins with the symbol combination
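Both uses of the line comment can be sketched in a short program. The variable total and the data values are made up for illustration.

```sas
* Read three numeric variables.
  This comment runs over two lines and ends at the semicolon;
data mydata;
   input a b c;
   * total = a + b + c;
   datalines;
1 2 3
4 5 6
;
run;

proc print data=mydata;
run;
```

Because the assignment statement begins with an asterisk, SAS skips it and the data set contains only a, b, and c. Deleting the asterisk brings the statement back. Since the comment ends at the first semicolon, the comment text itself must not contain one.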
/*
and continues until the symbol combination
*/
is encountered. Make sure there are no blanks between the slash and asterisk. All statements and code between /* and */ are ignored upon program execution. This is a very convenient way to eliminate an entire DATA step or procedure from execution.
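As a sketch of this use, the program below switches off a PRINT step while leaving the rest of the program intact. The MEANS procedure is used only as an example of a step that still runs.

```sas
data mydata;
   input a b c;
   datalines;
1 2 3
4 5 6
;
run;

/* The PRINT step below is switched off for now.
proc print data=mydata;
run;
*/

proc means data=mydata;   /* this step still runs */
run;
```

These comments do not nest: the first */ that SAS encounters ends the comment. If the code you want to disable already contains a /* ... */ comment, the outer comment will end early and the remaining code will be executed.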