This module introduces the reading of raw data files into SPSS. It does not show how to read all possible data formats, but aims to show how to read many common file formats
1. Comma delimited data, inline
The example below illustrates how you can read comma delimited data inline. Inline data refers to data that is included as part of a syntax program between begin data and end data commands. This is a very convenient way to read data when your data file is small and/or when you want to be sure to keep your data and the program for reading your data together.
The data list command indicates that we are going to read a data file. The keyword list (the second one) indicates that there is one observation per line of data and that each variable is separated by spaces and/or commas. Following the / is the list of the variable names. Since make is a string (alphanumeric) variable, we use the (A15) to tell SPSS that make is a string variable (with a length of 15). The data appear between the begin data and end data commands, and the list command is used to see if the data were read correctly.
DATA LIST LIST / make (A15) mpg weight price . BEGIN DATA. "AMC Concord",22,2930,4099 "AMC Pacer",17,3350,4749 "AMC Spirit",22,2640,3799 "Buick Century",20, 3250,4816 "Buick Electra",15,4080,7827 END DATA. LIST.
From the output below, it looks like the data were read properly. You might wonder if the quotations marks around the makes of car were needed. SPSS treats commas and spaces as valid delimiters (separators) in this mode of input, so the quotes are necessary. Otherwise, SPSS would think that AMC was the make and Concord was the mpg for the first observation.
MAKE MPG WEIGHT PRICE AMC Concord 22.00 2930.00 4099.00 AMC Pacer 17.00 3350.00 4749.00 AMC Spirit 22.00 2640.00 3799.00 Buick Century 20.00 3250.00 4816.00 Buick Electra 15.00 4080.00 7827.00
2. Space delimited data, inline
Reading space delimited data is basically the same as reading a comma delimited file. As you see below, the program is identical to the one above, but the data are separated by one or more spaces instead of commas. We will omit showing the output from this command.
DATA LIST LIST / make (A15) mpg weight price . BEGIN DATA. "AMC Concord" 22 2930 4099 "AMC Pacer" 17 3350 4749 "AMC Spirit" 22 2640 3799 "Buick Century" 20 3250 4816 "Buick Electra" 15 4080 7827 END DATA. LIST.
3. Comma delimited data from an external file
When you have a larger file, it is often more convenient to read the data from an external data file. The syntax of reading data from an external data file is not much more complicated than reading from inline data. The example below shows that you use file="auto1.txt" (in this example) to tell SPSS that you want to read the data from the file called auto1.txt. When you run this, change auto1.txt to be the name of your raw data file (for example, "c:mydata.txt"). We will omit showing the output of these commands.
DATA LIST LIST FILE="auto1.txt" / make (A15) mpg weight price. LIST.
4. Space delimited data from an external file
As we saw with the inline data, reading a space delimited data file is the same as reading a comma delimited file. As we would expect, the commands below are identical to the ones above, except that it reads from auto2.txt, which is a space delimited file.
DATA LIST LIST FILE="auto2.txt" / make (A15) mpg weight price. LIST.
5. Fixed format data from an external file
Let’s look at how to read a fixed format external data file. Consider the data file shown below.
AMC Concord 22 2930 4099 AMC Pacer 17 3350 4749 AMC Spirit 22 2640 3799 Buick Century 20 3250 4816 Buick Electra 15 4080 7827
The data are in fixed format because the variables occupy the same column positions throughout the entire data file. For example, make is always in columns 1-13, mpg is always in columns 15-16, weight is always in columns 18-12, and price is always in columns 23-26.
Assume that these data were stored in a file called auto3.txt. The commands below show how you can read this data as a fixed format.
DATA LIST FIXED FILE="auto3.txt" / make 1-13 (A) mpg 15-16 weight 18-21 price 23-26. LIST.
The output of this is shown below. The results suggest that the data were read successfully.
MAKE MPG WEIGHT PRICE AMC Concord 22 2930 4099 AMC Pacer 17 3350 4749 AMC Spirit 22 2640 3799 Buick Century 20 3250 4816 Buick Electra 15 4080 7827
Note that the output from list displayed all of the numeric variables with no decimal places. This is different from the prior examples (see the first example on this page). When you read fixed format data in the manner that we did, SPSS associates display formats with the variables that display the data with 0 decimal places. If any of these variables had contained data after a decimal point (e.g., if weight was 22.25), SPSS would not display the part after the decimal point. This might fool you into thinking that there was a problem with reading in the data, when actually there is a problem in displaying the data. If we assume that mpg had two decimal places, we could tell SPSS to display mpg with two decimal places with the formats command.
FORMATS mpg (F5.2).
If you read fixed column data with variables with data after the decimal, be sure to use the formats command (or an equivalent) to make sure that the places after the decimal point will be displayed.
6. Tab delimited files and Excel files
SPSS supports reading a variety of file formats using the get translate command, including tab delimited files, Excel files, Lotus 1-2-3 files and dBase files.
Consider the tab delimited file below. The first line of the file contains the variable names, and the following lines contain the data for the variables. This is a common file format that would be written from a spreadsheet.
make mpg weight price AMC Concord 22 2930 4099 AMC Pacer 17 3350 4749 AMC Spirit 22 2640 3799 Buick Century 20 3250 4816 Buick Electra 15 4080 7827
The example below shows how to read this kind of file. The type=tab subcommand tells SPSS that the file is a tab delimited file, and the fieldnames subcommand tells SPSS to expect the names of the variables in the first line of the data. We will omit the output of the commands below.
GET TRANSLATE FILE="auto4.txt" /TYPE=TAB /FIELDNAMES. LIST.
You can read a tab delimited file that does not have the variable names on the first line. In that case, you would omit the fieldnames subcommand, and you would probably want to use the rename variables command to give your variables more meaningful names (SPSS will name the variables var1, var2, var3, etc.). The example below shows the use of the rename variables command to rename the variables from var1 to var4 to make mpg weight and price. We omit the output of the commands below.
GET TRANSLATE FILE="auto5.txt" /TYPE=TAB. RENAME VARIABLES (var1 to var4 = make mpg weight price). LIST.
Reading an Excel file is much like reading a tab delimited file, except that you would specify the type=xls subcommand to indicate that you are reading an Excel file. If your file has the names of the variables on the first line, then include the fieldnames subcommand. If your file does not have the names of the variables, the variables will be named var1, var2, var3, etc., and you probably will want to use the rename variables command to assign meaningful names to your variables.
GET TRANSLATE FILE="auto.xls" /TYPE=XLS /FIELDNAMES. LIST.
7. Reading SAS data files into SPSS
SAS data files (.sas7bdat and .sd7) and SAS export files (.xpt) can be read directly into SPSS using the get sas command.
get sas data="d:hsb2.sas7bdat". exe.
8. Reading Stata data files into SPSS
The get stata command was introduced in version 14 of SPSS. It allows data sets created in versions 4-8 of Stata to be read directly into SPSS.
get stata data="d:hsb2.dta". exe.
If you have a Stata data set that was created with a later version of Stata, you may need to use a program such as Stat/Transfer to convert your Stata data file into an SPSS data file, or save the Stata data file as a .csv file, and then read the .csv file into SPSS. This is often not a desirable option, as variable labels and value labels will be lost.
9. Problems to look out for
- If you read fixed column data that have data after the decimal point, be sure to use the formats command to tell SPSS to display the data after the decimal point.
- If you have a string (alphanumeric) variable in the
middle of your data file, SPSS may assume that all the preceding variables are also string
variables. In the example below, SPSS will assume that all of the variables from a
to f are string variables.
DATA LIST LIST
/ a b c d e f (A15) g h i j.
to indicate to SPSS that just variable f is a string variable, use
DATA LIST LIST
/ a b c d e * f (A15) g h i j.
which tells SPSS that to treat a to e as numeric. - In trying to read an Excel spreadsheet, you may need to save the file as an earlier version of Excel to be able to read it into SPSS. If you cannot do this, you may want to use a program like Stat/Transfer to convert the data.
10. For more information
- For more information about reading data in SPSS, see the SPSS Command Syntax Reference.