1. Introduction
When you have two data files, you may want to combine them by stacking them one on top of the other (referred to as concatenating files). Below we have a file called dads and a file containing moms.
dads famid name inc 2 Art 22000 1 Bill 30000 3 Paul 25000
moms famid name inc 1 Bess 15000 3 Pat 50000 2 Amy 18000
Below we have stacked (concatenated) these files creating a file we called momdad. These examples will show how to concatenate files in SPSS.
momdad famid name inc 2 Art 22000 1 Bill 30000 3 Paul 25000 1 Bess 15000 3 Pat 50000 2 Amy 18000
2. Concatenating the moms and dads
The SPSS program below creates an SPSS data file called dads.sav and then creates a file called moms.sav.
DATA LIST LIST / famid * name (A4) inc. BEGIN DATA. 2 Art 22000 1 Bill 30000 3 Paul 25000 END DATA. SAVE OUTFILE = "dads.sav". LIST. DATA LIST LIST / famid * name (A4) inc. BEGIN DATA. 1 Bess 15000 3 Pat 50000 2 Amy 18000 END DATA. SAVE OUTFILE = "moms.sav". LIST.
Below we see the output of the SPSS commands above showing that the data was read correctly.
FAMID NAME INC 2.00 Art 22000.00 1.00 Bill 30000.00 3.00 Paul 25000.00FAMID NAME INC 1.00 Bess 15000.00 3.00 Pat 50000.00 2.00 Amy 18000.00
We can combine the files using the add files command as shown below.
ADD FILES FILE="dads.sav" /FILE="moms.sav". LIST CASES.
The output of these commands are shown below.
OBS FAMID NAME INC 1 2 Art 22000 2 1 Bill 30000 3 3 Paul 25000 4 1 Bess 15000 5 3 Pat 50000 6 2 Amy 18000
The output from this program shows that the files were combined properly. The dads and moms are stacked together in one file. But, there is a little problem. We can’t tell the dads from the moms. Let’s try doing this again but in such a way that we can tell which observations are the moms and which are the dads.
3. Concatenating the moms and dads, a better example
In order to tell the dads from the moms, let’s ask SPSS to create a variable called dad that will be 1 for observations from dads.sav, and a variable called mom that will be 1 for observations from moms.sav . As you see below, this is accomplished using /in=dad and /in=mom option. We then create a variable called momdad in the dads and moms data files that will contain "dad" for the dads data file and "mom" for the moms data file. When we combine the two files together the momdad variable will tell us who the moms and dads are.
ADD FILES FILE="dads.sav" /IN=dad /FILE="moms.sav" /in=mom STRING momdad (A3). IF dad=1 momdad="dad". IF mom=1 momdad="mom". LIST.
The output of these commands is shown below.
FAMID NAME INC DAD MOM MOMDAD 2.00 Art 22000.00 1 0 dad 1.00 Bill 30000.00 1 0 dad 3.00 Paul 25000.00 1 0 dad 1.00 Bess 15000.00 0 1 mom 3.00 Pat 50000.00 0 1 mom 2.00 Amy 18000.00 0 1 mom
Here we get a more desirable result, because we can tell the dads from the moms by looking at the variable momdad. In this example, we could have skipped the step of creating the momdad variable and just referred either to the mom dummy variable or the dad dummy variable. We wanted to illustrate the strategy of creating momdad if you ever combined three or more files, in which case the dummy variables would not be as useful.
4. Ordering the variables in the new file
You can use the /map subcommand with the add files command to see the order of the variables in the new file, as illustrated below. If you would like to rearrange the order of the variables in the new file, you can also add the /keep subcommand to the add files command. The variables will be ordered in the new file in the order that you list them on the /keep subcommand. If you do not list all of the variables on the /keep subcommand, the variables not listed will not be present in the new file. Also note that you can list the first few variables if they are the only ones that need to be reordered, and then use the keyword all to have the rest of the variables included in the new file. The variables not specified on the /keep subcommand will remain the order in which they are in the original files.
DATA LIST LIST / famid * name (A4) inc. BEGIN DATA. 2 Art 22000 1 Bill 30000 3 Paul 25000 END DATA. SAVE OUTFILE = "dads.sav". DATA LIST LIST / famid * name (A4) inc. BEGIN DATA. 1 Bess 15000 3 Pat 50000 2 Amy 18000 END DATA. SAVE OUTFILE = "moms.sav".ADD FILES FILE="dads.sav" /FILE="moms.sav" /KEEP = name ALL /MAP. EXECUTE.Map of the result file Result Input1 Input2------ ------ ------NAME NAME NAMEFAMID FAMID FAMIDINC INC INC
As you can see, the variables in the new file are now in the order name, famid and inc.
5. Problems to look out for
These above examples cover situations where there are no complications. However, look out for these complications.
5.1 The two data files have different variable names for the same thing
For example, income is called dadinc and in the dads file and called mominc in the moms file, as shown below.
DATA LIST LIST / famid * name (A4) dadinc . BEGIN DATA. 2 Art 22000 1 Bill 30000 3 Paul 25000 END DATA. SAVE OUTFILE = "dads.sav". LIST. DATA LIST LIST / famid * name (A4) mominc . BEGIN DATA. 1 Bess 15000 3 Pat 50000 2 Amy 18000 END DATA. SAVE OUTFILE = "moms.sav". LIST. ADD FILES FILE="dads.sav" /IN=dad /FILE="moms.sav" /in=mom STRING momdad (A3). IF dad=1 momdad="dad". IF mom=1 momdad="mom". LIST.
You can see the problem illustrated below.
FAMID NAME DADINC MOMINC DAD MOM MOMDAD 2.00 Art 22000.00 . 1 0 dad 1.00 Bill 30000.00 . 1 0 dad 3.00 Paul 25000.00 . 1 0 dad 1.00 Bess . 15000.00 0 1 mom 3.00 Pat . 50000.00 0 1 mom 2.00 Amy . 18000.00 0 1 mom
Solution #1. The most obvious solution is to choose appropriate variable names for the original files (i.e., name the variable inc in both the moms and dads file). This solution is not always possible since you might be concatenating files that you did not originally create. To save space, we omit illustrating this solution.
Solution #2. If solution #1 is not possible, we can fix this problem after we combine the files as shown below. We use the momdad variable to assign dadinc to inc for the dads, and mominc to inc for the moms.
IF momdad = "dad" inc = dadinc. IF momdad = "mom" inc = mominc. LIST.
The results are shown below, where inc now has the income for both the moms and dads.
FAMID NAME DADINC MOMINC DAD MOM MOMDAD INC 2.00 Art 22000.00 . 1 0 dad 22000.00 1.00 Bill 30000.00 . 1 0 dad 30000.00 3.00 Paul 25000.00 . 1 0 dad 25000.00 1.00 Bess . 15000.00 0 1 mom 15000.00 3.00 Pat . 50000.00 0 1 mom 50000.00 2.00 Amy . 18000.00 0 1 mom 18000.00
Solution 3. Another way you can fix this is by using the rename subcommand on the add files command to rename the variables just before the files are combined.
ADD FILES FILE="dads.sav" /RENAME=(dadinc=inc) /FILE="moms.sav" /RENAME=(mominc=inc). LIST.
You can see the results below. This solution is more elegant than renaming the variables after the fact.
FAMID NAME INC 2.00 Art 22000.00 1.00 Bill 30000.00 3.00 Paul 25000.00 1.00 Bess 15000.00 3.00 Pat 50000.00 2.00 Amy 18000.00
5.2 The two data files have variables with the same name but different types
In the dads data set below, famid is a numeric variable. However, in the moms data set, it is a string variable. If we try to concatenate these files, we will get the error message shown below.
DATA LIST LIST / famid * name (A4) inc. BEGIN DATA. 2 Art 22000 1 Bob 30000 3 Pat 25000 END DATA. SAVE OUTFILE = "dads.sav". DATA LIST LIST / famid (A1) name (A4) inc. BEGIN DATA. 1 Bess 15000 3 Pat 50000 2 Amy 18000 END DATA. SAVE OUTFILE = "moms.sav". ADD FILES FILE="dads.sav" /FILE="moms.sav".
Variable(s) with conflicting type:
Result Type Type
FAMID FAMID NUM FAMID S1
Codes: NUM = Numeric, SN = String of length N.
>Error # 5127
>Mismatched variable types on the input files.
>This command not executed.
The solution is to change one of the variables to be the same type as the other. In most cases, you will want to change the string variable to be numeric, as shown below.
autorecode famid /into famid1.
5.3 The two data files have string variables with the same name but different lengths
In all of the examples above, the variable name was input with the format A4 indicating name is an alphabetic (string) variable with a length of 4. What would happen if name in the dads file was A3 and name in the moms file was A4. This is illustrated below.
DATA LIST LIST / famid * name (A3) inc. BEGIN DATA. 2 Art 22000 1 Bob 30000 3 Pat 25000 END DATA. SAVE OUTFILE = "dads.sav". DATA LIST LIST / famid * name (A4) inc. BEGIN DATA. 1 Bess 15000 3 Pat 50000 2 Amy 18000 END DATA. SAVE OUTFILE = "moms.sav". ADD FILES FILE="dads.sav" /FILE="moms.sav".
When we combine these files, SPSS gives us the error message shown below.
Variable(s) with conflicting type: Result Type Type NAME NAME S3 NAME S4 Codes: NUM = Numeric, SN = String of length N. Error # 5127 Mismatched variable types on the input files. No further commands will be executed. Error scan continues.
As you can see, SPSS considered this a serious error and did not merge the files.
Solution #1. Define the variables to have the same length in the original files (i.e., use the A4 format for both the moms and dads file). This is the simplest solution if you are creating the files yourself. We will omit illustrating this solution to save space.
Solution #2. You may not have created the original raw data files, so solution #1 may not be possible for you. In that case, you can create a new variable in each file that has the same length and will be compatible when you merge the files. Below we illustrate this strategy.
For the dads file, a new variable called name2 is created with a length of A4 the value of name is copied to name2. Finally, we save the file as dads2 and drop the variable name. The same is done for the moms file. Then, the files dads2 and moms2 can be combined now that name2 is compatible between the two files.
GET FILE="dads.sav". STRING name2 (A4). COMPUTE name2=name. SAVE OUTFILE="dads2.sav" /DROP name. GET FILE="moms.sav". STRING name2 (A4). COMPUTE name2=name. SAVE OUTFILE="moms2.sav" /DROP name. ADD FILES FILE="dads2.sav" /FILE="moms2.sav". LIST.
The results are shown below.
FAMID INC NAME2 2.00 22000.00 Art 1.00 30000.00 Bob 3.00 25000.00 Pat 1.00 15000.00 Bess 3.00 50000.00 Pat 2.00 18000.00 Amy
Note that if the variable is a numeric variable, then SPSS will still concatenate the file even if the lengths of the two numeric variables is not the same. The length of the variable in the first file listed in the command will be the length used in the concatenated file.
5.4 The two data files have variables with the same name but different codes
This problem is similar to the problem above, except that it has an additional wrinkle, illustrated below. In the dads file there is a variable called fulltime that is coded 1 if the dad is working full time, 0 if he is not. The moms file also has a variable called fulltime that is coded Y is she is working full time, ad N if she is not. Not only are these variables of different types (character and numeric), but they are coded differently as well.
DATA LIST LIST / famid * name (A4) inc * fulltime * . BEGIN DATA. 2 Art 22000 0 1 Bob 30000 1 3 Pat 25000 1 END DATA. SAVE OUTFILE = "dads.sav". LIST. DATA LIST LIST / famid * name (A4) inc * fulltime (A1). BEGIN DATA. 1 Bess 15000 N 3 Pat 50000 Y 2 Amy 18000 N END DATA.
We forego trying to combine these files since we already know that this will not work (based on the prior example). How can we solve this?
Solution #1. Code the variables in the two files in the same way. For example, code fulltime using 0/1 for both files with 1 indicating working fulltime. This is the simplest solution if you are creating the files yourself. We will omit illustrating this solution to save space.
Solution #2. You may not have created the original raw data files, so solution #1 may not be possible for you. In that case, you can create a new variable in each file that has the same coding and will be compatible when you merge the files. Below we illustrate this strategy.
For the dads file, we make a variable called full that is the same as fulltime, and save the file as dads2, dropping fulltime. For the moms, we create full by recoding fulltime, and save the file as moms2, also dropping fulltime. The files dads2 and moms2 both have the variable full coded the same way (0/1 where 1=works full time) so we can combine those files together.
GET FILE="dads.sav". COMPUTE full=fulltime. SAVE OUTFILE="dads2.sav" /DROP fulltime. GET FILE="moms.sav". IF fulltime="Y" full=1. IF fulltime="N" full=0. SAVE OUTFILE="moms2.sav" /DROP fulltime. ADD FILES FILE="dads2.sav" /FILE="moms2.sav" LIST.
The results are shown below.
FAMID NAME INC FULL 2.00 Art 22000.00 .00 1.00 Bob 30000.00 1.00 3.00 Pat 25000.00 1.00 1.00 Bess 15000.00 .00 3.00 Pat 50000.00 1.00 2.00 Amy 18000.00 .00
5.5 You have run the add files command, and nothing happened
If you run just the add files command, as shown below, SPSS will not do anything. However, you will see a note in the lower right corner of the data editor saying "transformation pending".
ADD FILES FILE="dads.sav" /FILE="moms.sav"
Solution: The solution is to add either the execute command or a procedure command that will force the execution of the transformation, such as the list command or the crosstab command.
ADD FILES FILE="dads.sav" /FILE="moms.sav" execute.
6. For more information
- For more information about concatenating data files, see the add files command in the SPSS Syntax Reference Guide.
-
For information on match merging data files, see the SPSS Learning Module on Match Merging SPSS Data Files.