This module shows how to read your data into Stata. It covers comma delimited, tab delimited, space delimited, and fixed column data files.
Note: all of the sample input files for this page were created by us and are not included with Stata. You can create them yourself to try out this code by copying and pasting the data into a text file.
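If you prefer not to copy and paste, the sample files can also be written from within Stata. The lines below are a minimal sketch that creates the comma delimited file auto2.raw in the current working directory; the file handle name fh is an arbitrary choice made for this example, and the replace option overwrites any existing file of the same name.

```stata
* Sketch: write the comma delimited sample file auto2.raw to disk.
* "fh" is an arbitrary file handle name chosen for this example.
file open fh using auto2.raw, write text replace
file write fh "make, mpg, weight, price" _n
file write fh "AMC Concord, 22, 2930, 4099" _n
file write fh "AMC Pacer, 17, 3350, 4749" _n
file write fh "AMC Spirit, 22, 2640, 3799" _n
file write fh "Buick Century, 20, 3250, 4816" _n
file write fh "Buick Electra, 15,4080, 7827" _n
file close fh

* Display the file to confirm its contents.
type auto2.raw
```

The other sample files can be created the same way by changing the file name and the lines written.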
1. Typing data into the Stata editor
One of the easiest methods for getting data into Stata is using the Stata data editor, which resembles an Excel spreadsheet. It is useful when your data is on paper and needs to be typed in, or if your data is already typed into an Excel spreadsheet. To learn more about the Stata data editor, see the edit module.
2. Comma/tab separated file with variable names on line 1
Two common file formats for raw data are comma separated files and tab separated files. Such files are commonly made from spreadsheet programs like Excel. Consider the comma delimited file shown below.
type auto2.raw
make, mpg, weight, price
AMC Concord, 22, 2930, 4099
AMC Pacer, 17, 3350, 4749
AMC Spirit, 22, 2640, 3799
Buick Century, 20, 3250, 4816
Buick Electra, 15,4080, 7827
This file has two characteristics:
– The first line has the names of the variables separated by commas,
– The following lines have the values for the variables, also separated by commas.
This kind of file can be read using the insheet command, as shown below.
insheet using auto2.raw
(4 vars, 5 obs)
We can check to see if the data came in right using the list command.
list
              make   mpg   weight   price
  1.   AMC Concord    22     2930    4099
  2.     AMC Pacer    17     3350    4749
  3.    AMC Spirit    22     2640    3799
  4. Buick Century    20     3250    4816
  5. Buick Electra    15     4080    7827
Since you will likely have more observations, you can use in to list just a subset of observations. Below, we list observations 1 through 3.
list in 1/3
              make   mpg   weight   price
  1.   AMC Concord    22     2930    4099
  2.     AMC Pacer    17     3350    4749
  3.    AMC Spirit    22     2640    3799
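The in qualifier accepts other ranges as well, and it can be combined with a variable list or an if condition. The lines below are a small sketch that assumes the auto2.raw data is still in memory; the cutoff of 20 is an arbitrary value chosen for illustration.

```stata
* List the last two observations (negative numbers count from the end,
* and l stands for the last observation).
list in -2/l

* List only make and mpg for cars whose mpg is at least 20.
list make mpg if mpg >= 20
```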
Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).
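For reference, saving and later reloading the data might look like the sketch below. The file name auto2 is an assumption made for this example; Stata adds the .dta extension, and the replace option overwrites an existing file of that name.

```stata
* Save the data in memory as auto2.dta in the current directory.
save auto2, replace

* In a later session, load the saved Stata data file.
use auto2, clear
```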
The exact same insheet command could be used to read a tab delimited file. The insheet command is clever because it can figure out whether you have a comma delimited or tab delimited file, and then read it. (However, insheet could not handle a file that uses a mixture of commas and tabs as delimiters.)
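If you would rather not rely on automatic detection, the delimiter can be stated explicitly. The sketch below shows the comma and names options of insheet, followed by import delimited, which superseded insheet starting with Stata 13. The two commands may not treat the blanks after the commas in auto2.raw identically, so it is worth checking the result with describe and list.

```stata
* Tell insheet that the file is comma delimited and that
* line 1 contains the variable names.
insheet using auto2.raw, comma names clear

* Stata 13 and later: the same request using import delimited.
import delimited using auto2.raw, delimiters(",") varnames(1) clear

* Check the variable names and storage types that resulted.
describe
```

For a tab delimited file, the tab option of insheet plays the same role as comma.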
Before starting the next section, let’s clear out the existing data in memory.
clear
3. Comma/tab separated file (no variable names in file)
Consider a file that is identical to the one we examined in the previous section, except that it does not have the variable names on line 1.
type auto3.raw
AMC Concord, 22, 2930, 4099
AMC Pacer, 17, 3350, 4749
AMC Spirit, 22, 2640, 3799
Buick Century, 20, 3250, 4816
Buick Electra, 15,4080, 7827
This file can be read using the insheet command as shown below.
insheet using auto3.raw
(4 vars, 5 obs)
But where did Stata get the variable names? If Stata does not have names for the variables, it names them v1, v2, v3 etc., as you can see below.
list
                v1    v2       v3      v4
  1.   AMC Concord    22     2930    4099
  2.     AMC Pacer    17     3350    4749
  3.    AMC Spirit    22     2640    3799
  4. Buick Century    20     3250    4816
  5. Buick Electra    15     4080    7827
Let’s clear out the data in memory, and then try reading the data again.
clear
Now, let’s try reading the data and tell Stata the names of the variables on the insheet command.
insheet make mpg weight price using auto3.raw
(4 vars, 5 obs)
As the list command shows, Stata used the variable names supplied on the insheet command.
list
              make   mpg   weight   price
  1.   AMC Concord    22     2930    4099
  2.     AMC Pacer    17     3350    4749
  3.    AMC Spirit    22     2640    3799
  4. Buick Century    20     3250    4816
  5. Buick Electra    15     4080    7827
The insheet command works equally well on files which use tabs as separators. Stata examines the file and determines whether commas or tabs are being used as separators and reads the file appropriately.
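Supplying the names on the insheet command is not the only option. As a sketch of an alternative, the file can be read with the default names v1 through v4, and the variables can then be renamed one at a time.

```stata
* Read the file with default variable names, replacing data in memory.
insheet using auto3.raw, clear

* Give each variable a meaningful name.
rename v1 make
rename v2 mpg
rename v3 weight
rename v4 price

* Confirm the new names.
describe
```

This approach can be handy when you only learn what the columns contain after looking at the data.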
Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).
Let’s clear out the data in memory before going to the next section.
clear
4. Space separated file
Consider a file where the variables are separated by spaces like the one shown below.
type auto4.raw
"AMC Concord" 22 2930 4099
"AMC Pacer" 17 3350 4749
"AMC Spirit" 22 2640 3799
"Buick Century" 20 3250 4816
"Buick Electra" 15 4080 7827
Note that the make of car is contained within quotation marks. This is necessary because the names contain spaces. Without the quotes, Stata would think AMC is the make and Concord is the mpg. If the makes did not contain embedded spaces, the quotation marks would not be needed.
This file can be read with the infile command as shown below.
infile str13 make mpg weight price using auto4.raw
(5 observations read)
You may be asking yourself, where did the str13 come from? Since make is a character variable, we need to tell Stata that it is a character variable, and how long it can be. The str13 tells Stata it is a string variable and that it could be up to 13 characters wide.
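If you do not know the longest value in advance, one option is to declare a generous width and let Stata shrink it afterwards. The sketch below assumes that 20 characters is wide enough for every make; the compress command then reduces each variable to the smallest storage type that holds its values, so make should end up as str13 for this file.

```stata
* Declare make wider than needed (str20 is an assumed upper bound).
infile str20 make mpg weight price using auto4.raw, clear

* Shrink each variable to the smallest storage type that fits its values.
compress

* Check the resulting storage type of make.
describe make
```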
The list command confirms that the data was read correctly.
list
              make   mpg   weight   price
  1.   AMC Concord    22     2930    4099
  2.     AMC Pacer    17     3350    4749
  3.    AMC Spirit    22     2640    3799
  4. Buick Century    20     3250    4816
  5. Buick Electra    15     4080    7827
Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).
Let’s clear out the data in memory before moving on to the next section.
clear
5. Fixed format file
Consider a file using fixed column data like the one shown below.
type auto5.raw
AMC Concord   22 2930 4099
AMC Pacer     17 3350 4749
AMC Spirit    22 2640 3799
Buick Century 20 3250 4816
Buick Electra 15 4080 7827
Note that each variable is defined by the column(s) in which it is located. Also, note that the make of car is not contained within quotation marks. The quotation marks are not needed because the columns define where the make begins and ends, so the embedded spaces no longer create confusion.
This file can be read with the infix command as shown below.
infix str make 1-13 mpg 15-16 weight 18-21 price 23-26 using auto5.raw
(5 observations read)
Here again we need to tell Stata that make is a string variable by preceding make with str. We did not need to indicate the length since Stata can infer that make can be up to 13 characters wide based on the column locations.
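Because each variable is located by its columns, infix can also read just some of the variables and ignore the rest of the line. As a brief sketch, the command below reads only make and price from auto5.raw, using the same column positions as above.

```stata
* Read only make (columns 1-13) and price (columns 23-26);
* the columns holding mpg and weight are simply not read.
infix str make 1-13 price 23-26 using auto5.raw, clear

* Confirm that two variables were read.
list
```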
The list command confirms that the data was read correctly.
list
              make   mpg   weight   price
  1.   AMC Concord    22     2930    4099
  2.     AMC Pacer    17     3350    4749
  3.    AMC Spirit    22     2640    3799
  4. Buick Century    20     3250    4816
  5. Buick Electra    15     4080    7827
Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).
Let’s clear out the data in memory before moving on to the next section.
clear
6. Other methods of getting data into Stata
This module does not cover all possible methods of getting raw data into Stata, but it does cover many common situations. See the Stata User's Guide for more comprehensive information on reading raw data into Stata.
Another method that should be mentioned is the use of data conversion programs. These programs can convert data from one file format into another file format. For example, they could directly create a Stata file from an Excel Spreadsheet, a Lotus Spreadsheet, an Access database, a Dbase database, a SAS data file, an SPSS system file, etc. Two such examples are Stat Transfer and DBMS Copy. Both of these products are available on SSC PCs and DBMS Copy is available on Nicco and Aristotle.
Finally, if you are using Nicco, Aristotle or the RS/6000 Cluster, there is a command specifically for converting SAS data into Stata called sas2stata. If you have SAS data you want to convert to Stata, this may be a useful way to get your SAS data into Stata.
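Recent versions of Stata can also read some of these formats without a separate conversion program. As a hedged sketch, the import excel command (available in Stata 12 and later) reads an Excel workbook directly; the file name auto.xlsx is hypothetical, and the firstrow option assumes that the first row of the sheet holds the variable names.

```stata
* Read a hypothetical Excel workbook, treating row 1 as variable names.
import excel using auto.xlsx, firstrow clear

* Check the variables that were created.
describe
```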
7. Summary
Bring up the Stata data editor for typing data in.
. edit
Read in the comma or tab delimited file called auto2.raw taking the variable names from the first line of data.
. insheet using auto2.raw, clear
Read in the comma or tab delimited file called auto3.raw, naming the variables make, mpg, weight, and price.
. insheet make mpg weight price using auto3.raw, clear
Read in the space separated file named auto4.raw. The variable make is surrounded by quotes because it has embedded blanks.
. infile str13 make mpg weight price using auto4.raw, clear
Read in the fixed format file named auto5.raw.
. infix str make 1-13 mpg 15-16 weight 18-21 price 23-26 using auto5.raw, clear
Other methods
DBMS/Copy, Stat Transfer, sas2stata, and the Stata User's Guide.