SAS proc import is usually a good starting point for reading a delimited ASCII data file, such as a .csv (comma-separated values) file or a tab-delimited file. Sometimes we can also use a data step to read in an ASCII data file. On this page, we will show examples on how to read delimited ASCII files using proc import and data step.
Comma-separated files
It is a common practice to read in comma-separated files. SAS proc import is usually sufficient for this purpose. There are two slightly different ways of reading a comma delimited file using proc import. In SAS, a comma delimited file can be considered as a special type of external file with special file extension .csv, which stands for comma-separated-values. We show here a sample program making use of this feature using data found in https://stats.idre.ucla.edu/wp-content/uploads/2016/02/cars_novname.csv
Acura,MDX,SUV,Asia,All,"$36,945 ","$33,337 ",3.5,6,265,17,23,4451,106,189 Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820 ","$21,761 ",2,4,200,24,31,2778,101,172 Acura,TSX 4dr,Sedan,Asia,Front,"$26,990 ","$24,647 ",2.4,4,200,22,29,3230,105,183 Acura,TL 4dr,Sedan,Asia,Front,"$33,195 ","$30,299 ",3.2,6,270,20,28,3575,108,186 Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755 ","$39,014 ",3.5,6,225,18,24,3880,115,197
Then the following proc import statement will read it in and create a temporary data set called mydata. Since the .csv file does not include variable names, we indicate getnames = no. After reading in the data, we can look at a proc contents to see the formats used in our dataset.
proc import datafile="https://stats.idre.ucla.edu/wp-content/uploads/2016/02/cars_novname.csv" out=mydata dbms=csv replace; getnames=no; run; proc contents data=mydata; run; # Variable Type Len Format Informat 1 VAR1 Char 5 $5. $5. 2 VAR2 Char 39 $39. $39. 3 VAR3 Char 7 $7. $7. 4 VAR4 Char 7 $7. $7. 5 VAR5 Char 6 $6. $6. 6 VAR6 Char 10 $10. $10. 7 VAR7 Char 10 $10. $10. 8 VAR8 Num 8 BEST12. BEST32. 9 VAR9 Num 8 BEST12. BEST32. 10 VAR10 Num 8 BEST12. BEST32. 11 VAR11 Num 8 BEST12. BEST32. 12 VAR12 Num 8 BEST12. BEST32. 13 VAR13 Num 8 BEST12. BEST32. 14 VAR14 Num 8 BEST12. BEST32. 15 VAR15 Num 8 BEST12. BEST32.
As you can see in the output above, SAS creates default variable names as VAR1-VARn when variables names are not present in the raw data file.
You might have a file where you have the names at the top of the file like https://stats.idre.ucla.edu/wp-content/uploads/2016/02/cars.csv. With such a file you would like SAS to use the variable names from the file (e.g., make mpg etc.).
Make,Model,Type,Origin,DriveTrain,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length Acura,MDX,SUV,Asia,All,"$36,945 ","$33,337 ",3.5,6,265,17,23,4451,106,189 Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820 ","$21,761 ",2,4,200,24,31,2778,101,172 Acura,TSX 4dr,Sedan,Asia,Front,"$26,990 ","$24,647 ",2.4,4,200,22,29,3230,105,183 Acura,TL 4dr,Sedan,Asia,Front,"$33,195 ","$30,299 ",3.2,6,270,20,28,3575,108,186
We can use the getnames=yes; statement to tell SAS we want it to read the variable names from the first line of the data file, as illustrated below.
proc import datafile="https://stats.idre.ucla.edu/wp-content/uploads/2016/02/cars.csv" out=mydata dbms=csv replace; getnames=yes; run;
Another way of reading a comma delimited file is to consider a comma as an ordinary delimiter. Here is a program that shows how to use the dbms=dlm and delimiter=”,” option to read a file just like we did above. Also notice that the external file doesn’t have to have .csv extension.
proc import datafile="https://stats.idre.ucla.edu/wp-content/uploads/2016/02/cars.csv" out=mydata dbms=dlm replace; delimiter=","; getnames=yes; run;
Tab-delimited files
The options available for reading in a .csv file in proc import also exist for tab-delimited files: you can opt to read in or not read in names from your file; you can treat tab-delimited files as a special type of external file with extension .txt of your can treat your file as an instance of a delimited file and describe the delimiter.
We first show sample a program for reading in a tab delimited file that does not include variable names. SAS creates default variable names as VAR1-VARn in when we do not provide variable names.
proc import datafile="cars.txt" out=mydata dbms=tab replace; getnames=no; run;
Next, if your tab-delimited file contains variable names, you change the getnames option.
proc import datafile="cars.txt" out=mydata dbms=tab replace; getnames=yes; run;
Or if you wish to treat your tab-delimited file with variable names as a generic “delimited” file, you change the dbms option and indicate what the delimiter is.
proc import datafile="cars.txt" out=mydata dbms=dlm replace; delimiter='09'x; getnames=yes; run;
You may want to create a permanent SAS data file using proc import. Suppose that we want to create a permanent SAS data file called mydata in the directory “c:dissertation”. We can do the following.
libname dis "c:dissertation"; proc import datafile="cars.txt" out=dis.mydata dbms=dlm replace; delimiter='09'x; getnames=yes; run;
Space-delimited files
Proc import also work for space-delimited files. When dbms = dlm is indicated, SAS assumes the file is space-delimited unless otherwise specified (as we did above for commas and tabs).
We first show sample a program for reading in a space-delimited file that does not include variable names. SAS creates default variable names as VAR1-VARn in when we do not provide variable names.
proc import datafile="cars_sp.txt" out=mydata dbms=dlm replace; getnames=no; run;
Next, if your space-delimited file contains variable names, you change the getnames option.
proc import datafile="cars_sp.txt" out=mydata dbms=dlm replace; getnames=yes; run;
If you wish to create a permanent SAS data file using proc import in the directory “c:dissertation”, you can use the code below.
libname dis "c:dissertation"; proc import datafile="cars_sp.txt" out=dis.mydata dbms=dlm replace; getnames=yes; run;
Other kinds of delimiters
You can use delimiter= on the infile statement to tell SAS what delimiter you are using to separate variables in your raw data file. For example, below we have a raw data file that uses exclamation points ! to separate the variables in the file.
22!2930!4099 17!3350!4749 22!2640!3799 20!3250!4816 15!4080!7827
The example below shows how to read this file by using delimiter=’!’ on the infile statement.
DATA cars; INFILE 'readdel1.txt' DELIMITER='!' ; INPUT mpg weight price; RUN; PROC PRINT DATA=cars; RUN;
As you can see in the output below, the data was read properly.
OBS MPG WEIGHT PRICE 1 22 2930 4099 2 17 3350 4749 3 22 2640 3799 4 20 3250 4816 5 15 4080 7827
It is possible to use multiple delimiters. The example file below uses either exclamation points or plus signs as delimiters.
22!2930!4099 17+3350+4749 22!2640!3799 20+3250+4816 15+4080!7827
By using delimiter=’!+’ on the infile statement, SAS will recognize both of these as valid delimiters.
DATA cars; INFILE 'readdel2.txt' DELIMITER='!+' ; INPUT mpg weight price; RUN; PROC PRINT DATA=cars; RUN;
As you can see in the output below, the data was read properly.
OBS MPG WEIGHT PRICE 1 22 2930 4099 2 17 3350 4749 3 22 2640 3799 4 20 3250 4816 5 15 4080 7827
Problem with Proc Import
Proc import does not know the formats for your variables, but it is able to guess the format based on what the beginning of your dataset looks like. Most of the time, this guess is fine. But if the length of a variable differs from beginning to end of your file, you might end up with some truncated values.
Let’s return to the first example shown on this page. We read in a .csv file and then looked at the contents of the SAS dataset we generated. The format assumed for the first variable was a character string of length 5. This is consistent with the values seen at the beginning of this variable listing the makes of cars, sorted alphabetically (“Acura”, “Audi”, “BMW”). However, if we look at a frequency table of this variable, we can see that there are longer values that are being truncated.
Cumulative Cumulative VAR1 Frequency Percent Frequency Percent Acura 7 1.64 7 1.64 Audi 19 4.44 26 6.07 BMW 20 4.67 46 10.75 Buick 9 2.10 55 12.85 Cadil 8 1.87 63 14.72 Chevr 27 6.31 90 21.03 Chrys 15 3.50 105 24.53 Dodge 13 3.04 118 27.57 Ford 23 5.37 141 32.94 GMC 8 1.87 149 34.81 Honda 17 3.97 166 38.79 Humme 1 0.23 167 39.02 Hyund 12 2.80 179 41.82 Infin 8 1.87 187 43.69 Isuzu 2 0.47 189 44.16 Jagua 12 2.80 201 46.96 Jeep 3 0.70 204 47.66 Kia 11 2.57 215 50.23 Land 3 0.70 218 50.93 Lexus 11 2.57 229 53.50 Linco 9 2.10 238 55.61 MINI 2 0.47 240 56.07 Mazda 11 2.57 251 58.64 Merce 26 6.07 277 64.72 Mercu 9 2.10 286 66.82 Mitsu 13 3.04 299 69.86 Nissa 17 3.97 316 73.83 Oldsm 3 0.70 319 74.53 Ponti 11 2.57 330 77.10 Porsc 7 1.64 337 78.74 Saab 7 1.64 344 80.37 Satur 8 1.87 352 82.24 Scion 2 0.47 354 82.71 Subar 11 2.57 365 85.28 Suzuk 8 1.87 373 87.15 Toyot 28 6.54 401 93.69 Volks 15 3.50 416 97.20 Volvo 12 2.80 428 100.00
This may not be an enormous problem here–we can still understand the variable, even in its truncated form–but if there were different values that began with the same 5 characters, we would no longer se them as different values (e.g., “South Carolina” and “South Dakota” would both be “South”). How can you fix this?
When you run proc import, code for a data step is generated behind the scenes and that is what is actually run to read in your data. This data step can be found in the SAS log after running proc import and can be copied, amended, and rerun. We’ve pasted the data step from the log and marked in bold the lines we would amend to fix this problem variable. The “$5.” would be replaced with something safer, like “$20.”.
data WORK.MYDATA ; %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ infile 'C:cars_novname.csv' delimiter = ',' MISSOVER DSD lrecl=32767 ; informat VAR1 $5. ; informat VAR2 $39. ; informat VAR3 $7. ; informat VAR4 $7. ; informat VAR5 $6. ; informat VAR6 $10. ; informat VAR7 $10. ; informat VAR8 best32. ; informat VAR9 best32. ; informat VAR10 best32. ; informat VAR11 best32. ; informat VAR12 best32. ; informat VAR13 best32. ; informat VAR14 best32. ; informat VAR15 best32. ; format VAR1 $5. ; format VAR2 $39. ; format VAR3 $7. ; format VAR4 $7. ; format VAR5 $6. ; format VAR6 $10. ; format VAR7 $10. ; format VAR8 best12. ; format VAR9 best12. ; format VAR10 best12. ; format VAR11 best12. ; format VAR12 best12. ; format VAR13 best12. ; format VAR14 best12. ; format VAR15 best12. ; input VAR1 $ VAR2 $ VAR3 $ VAR4 $ VAR5 $ VAR6 $ VAR7 $ VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 ; if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ run;