SAS can read raw data directly from an FTP server. Normally, you would use
FTP to download the data to your local computer and then
use SAS to read the copy stored there. SAS allows you to skip the
download step and read the data directly from the other computer via FTP.
Of course, this assumes that you
can reach that computer over the internet at the time you run your SAS program. The program below
illustrates how to do this. After filename in (where in is the fileref), you put
ftp
to tell SAS to access the data via FTP. After that, you supply the name of the file (in this
case 'gpa.txt'). lrecl= is used to specify the width of
your data. Be sure to choose a value that is at least as wide as your widest
record. cd= is used to specify the directory on the remote computer where the file is stored.
host= is used to specify the name of the site to which you want to FTP.
user= is used to provide your userid (or anonymous if
connecting via anonymous FTP). pass= is used to supply your
password (or your email address if connecting via anonymous FTP).
FILENAME in FTP 'gpa.txt' LRECL=80
         CD='/local2/samples/sas/ats/'
         HOST='cluster.oac.ucla.edu'
         USER='joebruin' PASS='yourpassword' ;

DATA gpa ;
  INFILE in ;
  INPUT gpa hsm hss hse satm satv gender ;
RUN;

PROC PRINT DATA=gpa(obs=10) ;
RUN;
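For an anonymous FTP connection, only the user= and pass= values change, as described above. The statement below is a minimal sketch of that variant; the host name, directory, and email address are placeholders made up for illustration, not a real server.

```sas
/* Hypothetical host, directory, and email address; substitute your own */
FILENAME in FTP 'gpa.txt' LRECL=80
         CD='/pub/data/'
         HOST='ftp.example.com'
         USER='anonymous'
         PASS='you@example.com' ;
```

The DATA step and PROC PRINT that follow the FILENAME statement would stay the same.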
As the output below shows, the program read the data in gpa.txt successfully.
OBS     GPA    HSM    HSS    HSE    SATM    SATV    GENDER
  1    5.32     10     10     10     670     600         1
  2    5.14      9      9     10     630     700         2
  3    3.84      9      6      6     610     390         1
  4    5.34     10      9      9     570     530         2
  5    4.26      6      8      5     700     640         1
  6    4.35      8      6      8     640     530         1
  7    5.33      9      7      9     630     560         2
  8    4.85     10      8      8     610     460         2
  9    4.76     10     10     10     570     570         2
 10    5.72      7      8      7     550     500         1
The log shows that we read 40 records and 7 variables, confirming that we read the data correctly. Because you could lose your FTP connection and get only part of the data, it is especially important to check the log to see how many observations and variables you read, and to compare that to how many observations and variables you believe the file to have.
NOTE: 40 records were read from the infile IN.
      The minimum record length was 25.
      The maximum record length was 25.
NOTE: The data set WORK.GPA has 40 observations and 7 variables.
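If you know how many observations the file should contain, the comparison can also be done in code rather than by eye. The sketch below is one possible way to do that, not part of the original program: it assumes the expected count is 40 (the figure from the log above), and the macro variable name expected_obs is made up for illustration.

```sas
/* Hypothetical macro variable: the number of observations you believe the file has */
%LET expected_obs = 40 ;

DATA _NULL_ ;
  /* NOBS= is filled in when the step is compiled, so no observations are read */
  IF 0 THEN SET gpa NOBS=n ;
  IF n NE &expected_obs THEN
    PUT "WARNING: expected &expected_obs observations but WORK.GPA has " n ;
  ELSE
    PUT "NOTE: WORK.GPA has the expected " n "observations." ;
  STOP ;
RUN;
```

The message is written to the log, so a short read caused by a dropped connection shows up as a warning next to the notes from the DATA step.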
In your program, be sure to change lrecl=80
to match the width of your raw data file. If you are unsure how wide the file is, use
a value that is certainly wider than the widest line of your file. You would most likely
use this technique when you are reading a very large file. You can test your program by
reading just a handful of records with the obs= option on the
infile statement; e.g., infile in obs=20;
would read just the first 20 records from your file.
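Putting these two suggestions together, a test run might look like the sketch below. It reuses the connection details from the program above; the generous LRECL=1000 and the data set name gpa_test are arbitrary choices made for illustration.

```sas
FILENAME in FTP 'gpa.txt' LRECL=1000   /* deliberately wider than any line in the file */
         CD='/local2/samples/sas/ats/'
         HOST='cluster.oac.ucla.edu'
         USER='joebruin' PASS='yourpassword' ;

DATA gpa_test ;
  INFILE in OBS=20 ;   /* stop after the first 20 records */
  INPUT gpa hsm hss hse satm satv gender ;
RUN;

PROC PRINT DATA=gpa_test ;
RUN;
```

Once the test data set looks right, remove OBS=20 from the INFILE statement to read the whole file.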