This page was adapted from a FAQ (FAQ #92) developed by The University of Texas at Austin Statistical Services, and we thank them for permission to use their materials in developing our FAQs for our web site.
There are many options that you can use on the infile statement. This is a brief summary of the most commonly used ones. You can determine which options you need by examining your raw data file in any program that lets you view it, e.g., Notepad or WordPad, or the more command on UNIX.
Let’s start with a simple example reading the space delimited file shown below.
22 2930 4099
17 3350 4749
22 2640 3799
20 3250 4816
15 4080 7827
The example program shows how to read the space delimited file shown above.
DATA cars;
  INFILE 'space1.txt';
  INPUT mpg weight price;
RUN;

PROC PRINT DATA=cars;
RUN;
As you can see in the output below, the data were read properly.
OBS    MPG    WEIGHT    PRICE
  1     22     2930      4099
  2     17     3350      4749
  3     22     2640      3799
  4     20     3250      4816
  5     15     4080      7827
Infile options
For more complicated file layouts, refer to the infile options described below.
DLM=
The dlm= option specifies the delimiter that separates the variables in your raw data file. For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma-separated .csv file), and dlm='09'x indicates that tabs separate your variables (e.g., a tab-separated file).
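As a minimal sketch, the program below reads the first three rows of the car data as comma-separated values. It uses DATALINES instead of an external file so that it runs on its own; the data set name cars_csv is illustrative. The tab-delimited variant is shown only as a comment, assuming a hypothetical file named tab1.txt.

```sas
/* Read comma-separated values; DATALINES keeps the example self-contained */
DATA cars_csv;
  INFILE DATALINES DLM=',';
  INPUT mpg weight price;
DATALINES;
22,2930,4099
17,3350,4749
22,2640,3799
;
RUN;

/* For a tab-separated file (tab1.txt is a hypothetical name): */
/* INFILE 'tab1.txt' DLM='09'x; */

PROC PRINT DATA=cars_csv;
RUN;
```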
DSD
The dsd option has two main functions.
First, it treats two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50 and you specified only dlm=',', SAS would treat the two commas as a single delimiter and read the line as 20 30 50. With the dsd option, SAS reads it as 20 30 . 50, which is probably what you intended.
Second, it allows the delimiter to appear within quoted strings. For example, you would want to use the dsd option if you had a comma-separated file and your data included values like "George Bush, Jr.". With the dsd option, SAS recognizes that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.
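The sketch below combines both cases on one illustrative line of data. Because dsd on its own also makes the comma the default delimiter, no dlm= option is needed here. The data set and variable names (dsd_demo, name, a through d) are hypothetical.

```sas
/* DSD: ",," marks a missing value, and the comma inside the
   double quotes is kept as part of the character value */
DATA dsd_demo;
  INFILE DATALINES DSD;
  LENGTH name $ 20;          /* default length of 8 would truncate the name */
  INPUT name $ a b c d;
DATALINES;
"George Bush, Jr.",20,30,,50
;
RUN;

PROC PRINT DATA=dsd_demo;
RUN;
```

The quotation marks themselves are removed, so name holds George Bush, Jr. and c is missing.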
FIRSTOBS=
This option tells SAS on which line to start reading your raw data file. If the first record(s) contain header information such as variable names, set firstobs=n, where n is the record number where the data actually begin. For example, if you are reading a comma-separated or tab-separated file that has the variable names on the first line, use firstobs=2 to tell SAS to begin reading at the second line (so it ignores the first line with the variable names).
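A minimal sketch of skipping a header line, again using DATALINES in place of a file; the data set name cars_hdr is illustrative.

```sas
/* Line 1 holds variable names, so start reading data at line 2 */
DATA cars_hdr;
  INFILE DATALINES DLM=',' FIRSTOBS=2;
  INPUT mpg weight price;
DATALINES;
mpg,weight,price
22,2930,4099
17,3350,4749
;
RUN;
```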
MISSOVER
This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data; instead, the variables with no values on that line are set to missing. For example, you may be reading a space-delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data. If your data are supposed to have only one observation for each line of raw data, this can cause errors throughout the rest of your data file. If you have a raw data file with one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data file.
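In the sketch below, the second line of the illustrative data is missing its price. With missover, price is simply set to missing for that observation; without it, SAS would take 22 from the third line as the second observation's price.

```sas
/* Line 2 has only two values; MISSOVER keeps SAS on that line */
DATA cars_short;
  INFILE DATALINES MISSOVER;
  INPUT mpg weight price;
DATALINES;
22 2930 4099
17 3350
22 2640 3799
;
RUN;
```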
OBS=
Indicates which line in your raw data file should be treated as
the last record to be read by SAS. This is a good option to use for testing your
program. For example, you might use obs=100 to just read in the first 100
lines of data while you are testing your program. When you want to read the entire
file, you can remove the obs= option entirely.
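A minimal sketch of limiting input while testing. Note that obs= gives the number of the last line to read, not a count of data lines, so if you also use firstobs=2, obs=101 is what reads the first 100 lines of data.

```sas
/* Stop after line 2 while testing; remove OBS= to read everything */
DATA cars_test;
  INFILE DATALINES OBS=2;
  INPUT mpg weight price;
DATALINES;
22 2930 4099
17 3350 4749
22 2640 3799
;
RUN;
```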
A typical infile statement for reading a comma delimited file that contains the variable names in the first line of data would be:
INFILE "test.txt" DLM=',' DSD MISSOVER FIRSTOBS=2 ;
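For illustration, here is the same set of options in a complete DATA step, with DATALINES standing in for test.txt so that it runs on its own. The sample lines are hypothetical: firstobs=2 skips the header, dsd reads the empty weight field on the second data line as missing, and missover sets price to missing on the last line, which has only two values.

```sas
DATA test;
  INFILE DATALINES DLM=',' DSD MISSOVER FIRSTOBS=2;
  INPUT mpg weight price;
DATALINES;
mpg,weight,price
22,2930,4099
17,,4749
22,2640
;
RUN;

PROC PRINT DATA=test;
RUN;
```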