What are some common options for the infile statement in SAS?

This page was adapted from a FAQ (FAQ #92) developed by The University of Texas at Austin Statistical Services. We thank them for permission to use their materials in developing FAQs for our web site.

There are many options that you can use on the infile statement. This is a brief summary of the most commonly used ones. You can determine which options you need by examining your raw data file, e.g., in Notepad or WordPad, with more (on UNIX), or with any other command that lets you view your data.

Let’s start with a simple example reading the space delimited file shown below.

22 2930 4099
17 3350 4749
22 2640 3799
20 3250 4816
15 4080 7827

The example program shows how to read the space delimited file shown above.

DATA cars;
  INFILE 'space1.txt' ;
  INPUT mpg weight price;
RUN;

PROC PRINT DATA=cars;
RUN;

As you can see in the output below, the data was read properly.

OBS    MPG    WEIGHT    PRICE
 1      22     2930      4099
 2      17     3350      4749
 3      22     2640      3799
 4      20     3250      4816
 5      15     4080      7827

Infile options

For more complicated file layouts, refer to the infile options described below.

DLM=
The dlm= option specifies the delimiter that separates the variables in your raw data file. For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma separated file, .csv file), and dlm='09'x indicates that tabs separate your variables (e.g., a tab separated file).
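As a minimal sketch, the two delimiters mentioned above could be used as follows. The file names comma1.txt and tab1.txt are hypothetical; they are assumed to hold the same three numeric columns as space1.txt, separated by commas and by tabs respectively.

```sas
/* Assumption: comma1.txt is a hypothetical file with lines like 22,2930,4099 */
DATA cars_comma;
  INFILE 'comma1.txt' DLM=',';
  INPUT mpg weight price;
RUN;

/* Assumption: tab1.txt is a hypothetical file with the same values separated by tabs */
DATA cars_tab;
  INFILE 'tab1.txt' DLM='09'x;   /* '09'x is the hexadecimal code for the tab character */
  INPUT mpg weight price;
RUN;
```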

DSD
The dsd option has two main functions. First, it treats two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50 then with dlm=',' alone SAS would read it as 20 30 50, but with the dsd option SAS reads it as 20 30 . 50, which is probably what you intended. Second, it allows you to include the delimiter within quoted strings. For example, you would want to use the dsd option if you had a comma separated file and your data included values like "George Bush, Jr.". With the dsd option, SAS recognizes that the comma in "George Bush, Jr." is part of the name, not a separator indicating a new variable, and it removes the surrounding quotation marks from the value it stores.
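The following sketch illustrates both behaviors at once. It assumes a hypothetical file names.csv containing a line such as "George Bush, Jr.",20,30,,50; the data set name and the variable names name, a, b, c, and d are made up for illustration.

```sas
/* Assumption: names.csv is a hypothetical file with lines like  */
/*   "George Bush, Jr.",20,30,,50                                */
DATA people;
  INFILE 'names.csv' DLM=',' DSD;
  LENGTH name $ 20;          /* make room for names longer than the default 8 characters */
  INPUT name $ a b c d;      /* the empty field between 30 and 50 makes c missing */
RUN;

PROC PRINT DATA=people;
RUN;
```

For the sample line, name should hold George Bush, Jr. without the quotation marks, and c should be missing.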

FIRSTOBS=
This option tells SAS on what line you want it to start reading your raw data file. If the first record(s) contain header information such as variable names, then set firstobs=n, where n is the record number where the data actually begin. For example, if you are reading a comma separated file or a tab separated file that has the variable names on the first line, then use firstobs=2 to tell SAS to begin reading at the second line (so it will ignore the first line with the names of the variables).
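A short sketch of skipping a header line, assuming a hypothetical file cars_header.csv whose first line is mpg,weight,price and whose remaining lines hold the data:

```sas
/* Assumption: cars_header.csv is a hypothetical comma separated file */
/* with the variable names on line 1 and data from line 2 onward      */
DATA cars;
  INFILE 'cars_header.csv' DLM=',' FIRSTOBS=2;   /* start reading at line 2 */
  INPUT mpg weight price;
RUN;
```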

MISSOVER
This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data; the variables without values are set to missing instead. For example, you may be reading a space delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data. If your data is supposed to have only one observation for each line of raw data, then this could cause errors throughout the rest of your data file. If you have a raw data file that has one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data file.
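To make the difference concrete, here is a sketch that assumes a hypothetical file short1.txt with three lines, the second of which is missing its last value:

22 2930 4099
17 3350
22 2640 3799

```sas
/* Assumption: short1.txt is the hypothetical three-line file shown above */
DATA cars_missover;
  INFILE 'short1.txt' MISSOVER;   /* price is set to missing on the short line */
  INPUT mpg weight price;
RUN;

DATA cars_default;
  INFILE 'short1.txt';            /* default behavior: SAS continues onto the next line */
  INPUT mpg weight price;
RUN;
```

With missover, the data set should have three observations, with price missing in the second. Without it, SAS takes the 22 at the start of the third line as the price for the second observation and discards the rest of that line, so only two observations result.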

OBS=
This option indicates which line in your raw data file should be treated as the last record to be read by SAS. This is a good option to use for testing your program. For example, you might use obs=100 to read in just the first 100 lines of data while you are testing your program. Note that obs= is a line number, not a count, so combined with firstobs=2 it reads lines 2 through 100. When you want to read the entire file, you can remove the obs= option entirely.
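As a small sketch using the space1.txt file from the first example (the data set names here are made up for illustration):

```sas
/* Read only the first 3 lines of space1.txt while testing */
DATA cars_test;
  INFILE 'space1.txt' OBS=3;
  INPUT mpg weight price;
RUN;

/* Combined with FIRSTOBS=, OBS= still names the last line to read: */
/* this reads lines 2 through 4, i.e. 3 records                      */
DATA cars_middle;
  INFILE 'space1.txt' FIRSTOBS=2 OBS=4;
  INPUT mpg weight price;
RUN;
```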

A typical infile statement for reading a comma delimited file that contains the variable names in the first line of data would be:

INFILE "test.txt" DLM=',' DSD MISSOVER FIRSTOBS=2 ;