It is very convenient to read comma delimited, tab delimited, or other kinds of delimited raw data files. However, you need to be very careful when reading delimited data with missing values. Consider the example raw data file below. Note that the value of mpg is missing for the AMC Pacer and the missing value is signified with two consecutive commas (,,).
AMC Concord,22,2930,4099 AMC Pacer,,3350,4749 AMC Spirit,22,2640,3799 Buick Century,20,3250,4816 Buick Electra,15,4080,7827
We read the file using the program below using delimiter=’,’ to indicate that commas are used as delimiters.
DATA cars1; length make $ 20 ; INFILE 'readdsd.txt' DELIMITER=',' ; INPUT make mpg weight price; RUN; PROC PRINT DATA=cars1; RUN;
But, as we see below, the data was read incorrectly for the AMC Pacer.
OBS MAKE MPG WEIGHT PRICE 1 AMC Concord 22 2930 4099 2 AMC Pacer 3350 4749 . 3 Buick Century 20 3250 4816 4 Buick Electra 15 4080 7827
SAS does not properly recognize empty values for delimited data unless you use the dsd option. You need to use the dsd option on the infile statement if two consecutive delimiters are used to indicate missing values (e.g., two consecutive commas, two consecutive tabs). Below, we read the exact same file again, except that we use the dsd option.
DATA cars2; length make $ 20 ; INFILE 'readdsd.txt' DELIMITER=',' DSD ; INPUT make mpg weight price; RUN; PROC PRINT DATA=cars2; RUN;
The output is shown below.
OBS MAKE MPG WEIGHT PRICE 1 AMC Concord 22 2930 4099 2 AMC Pacer . 3350 4749 3 AMC Spirit 22 2640 3799 4 Buick Century 20 3250 4816 5 Buick Electra 15 4080 7827
As you see in the output, the data for the AMC Pacer was read correctly because we used the dsd option.