1.0 Stata commands in this unit
cd | Change directory |
dir or ls | Show files in current directory |
insheet | Read ASCII (text) data created by a spreadsheet |
infile | Read unformatted ASCII (text) data |
infix | Read ASCII (text) data in fixed format |
input | Enter data from keyboard |
describe | Describe contents of data in memory or on disk |
compress | Compress data in memory |
save | Store the dataset currently in memory on disk in Stata data format |
use | Load a Stata-format dataset |
count | Show the number of observations |
list | List values of variables |
clear | Clear the entire dataset and everything else |
memory | Display a report on memory usage |
set memory | Set the size of memory |
2.0 Demonstration and explanation
We will start by reading a spreadsheet-type data file into Stata. Such files are created by programs such as Excel; for example, Excel can save a worksheet as a comma-separated-values (.csv) file. Stata reads this type of data with the insheet command. Let’s first change to the directory that contains the file https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hs0.csv. This data file has variable names on the first line.
Here is a partial listing from the comma-separated file:
gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56
0,172,4,2,1,academic,47,52,57,53,61
0,113,4,2,1,academic,44,52,51,63,61
0,50,3,2,1,general,50,59,42,53,61
0,11,1,2,1,academic,34,46,45,39,36
0,84,4,2,1,general,63,57,54,,51
0,48,3,2,1,academic,57,55,52,50,51

And here are the Stata commands to read these data.
cd d:\stata_data
dir
insheet using https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hs0.csv, clear
describe
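In newer releases the same file can be read with import delimited. The lines below are a minimal sketch, assuming Stata 13 or later (where import delimited is available) and assuming the URL is still reachable; the varnames(1) option says that the first line holds the variable names.

```stata
* Assumption: Stata 13 or later; earlier versions use insheet as shown above.
* varnames(1) treats line 1 of the file as variable names.
import delimited using "https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hs0.csv", varnames(1) clear

* Check what was read: variable names, storage types, and the number of rows.
describe
count
```

Running describe and count right after an import is a quick way to confirm that the variable names and the number of observations look as expected.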
What if the data file does not have the variable names on the first line? We have such a file, called https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hs0_noname.csv. In that case we list the variable names in the insheet command itself. We will also do a count to see whether the data were read in successfully.
insheet gender id race ses schtyp prgtype read write math science socst using https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hs0_noname.csv, clear
count
To read a space-delimited file we will use the infile command. The first part of the file hs0.raw is shown below.
0 70 4 1 1 general 57 52 41 47 57
1 121 4 2 1 vocati 68 59 53 63 61
0 86 4 3 1 general 44 33 54 58 31
0 141 4 3 1 vocati 63 44 47 53 56
0 172 4 2 1 academic 47 52 57 53 61
0 113 4 2 1 academic 44 52 51 63 61
0 50 3 2 1 general 50 59 42 53 61
0 11 1 2 1 academic 34 46 45 39 36
0 84 4 2 1 general 63 57 54 . 51
0 48 3 2 1 academic 57 55 52 50 51
0 75 4 2 1 vocati 60 46 51 53 61
0 60 5 2 1 academic 57 65 51 63 61

Notice how we specify a character (string) variable below. The variable prgtype is a character variable; we tell Stata this, and that it should have a length of 10, by typing str10 before the variable name. We will use the https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hs0.raw data file.
infile gender id race ses schtyp str10 prgtype read write math science socst using https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hs0.raw, clear
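To see what str10 and the "." entry do, the sketch below writes three rows from the listing above to a small local file and reads them back with the same infile command. The file name hs0_demo.raw is hypothetical and is created in the current directory.

```stata
* Hypothetical demo file: three rows copied from the hs0.raw listing above.
tempname fh
file open `fh' using "hs0_demo.raw", write text replace
file write `fh' "0 70 4 1 1 general 57 52 41 47 57" _n
file write `fh' "0 84 4 2 1 general 63 57 54 . 51" _n
file write `fh' "0 172 4 2 1 academic 47 52 57 53 61" _n
file close `fh'

* str10 applies to prgtype only; the other variables are read as numeric.
infile gender id race ses schtyp str10 prgtype read write math science socst using "hs0_demo.raw", clear

* prgtype should be reported with storage type str10.
describe prgtype

* The "." in the second row is read as a missing value for science.
count if missing(science)
```

Without str10, infile would try to read prgtype as a number and would store missing values for entries such as general.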
The other commonly used ASCII data format is fixed format. It always requires a codebook that specifies which column(s) correspond to which variable. Here is a small example of this type of data with a codebook. Notice how we make use of the codebook in the infix command below. We will use the /stata/notes/schdat.fix data file.
 195  094951
 26386161941
 38780081841
 479700  870
 56878163690
 66487182960
 786  069  0
 88194193921
 98979090781
107868180801
variable name | column number |
id | 1-2 |
a1 | 3-4 |
t1 | 5-6 |
gender | 7 |
a2 | 8-9 |
t2 | 10-11 |
tgender | 12 |
clear
infix id 1-2 a1 3-4 t1 5-6 gender 7 a2 8-9 t2 10-11 tgender 12 using /stata/notes/schdat.fix
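The column ranges in the codebook can be tried out on a small local file. The sketch below writes three complete records from the example to a file and reads them with the same infix specification. The file name schdat_demo.fix is hypothetical, and the sketch assumes each record is exactly 12 columns wide, with a leading blank for single-digit ids.

```stata
* Hypothetical demo file: three 12-column records from the example above.
tempname fh
file open `fh' using "schdat_demo.fix", write text replace
file write `fh' " 26386161941" _n
file write `fh' " 38780081841" _n
file write `fh' "107868180801" _n
file close `fh'

clear
* Each variable is followed by the column(s) it occupies in the record.
infix id 1-2 a1 3-4 t1 5-6 gender 7 a2 8-9 t2 10-11 tgender 12 using "schdat_demo.fix"

* Three observations with ids 2, 3 and 10 are expected.
list
```

Because infix reads by column position, a record that is shifted by even one column assigns digits to the wrong variables, so it is worth listing a few observations and comparing them with the raw file.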
We can also use the Do-file editor to input data. The Do-file editor is used for writing a sequence of commands and running them all at once. You can copy and paste the following Stata syntax to the Do-file editor and run it.
clear
input id female race ses str3 schtype prog read write math science socst
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
50 0 2 2 pub 2 50 59 42 53 61
51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
end
After running the above program, we can issue the describe command to get a general idea about the data set. The compress command reduces the size of the data set by storing each variable in the smallest storage type that holds its values without loss. We can then save the data set to disk by issuing the save command.
describe
compress
save hsb10
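The effect of compress is easiest to see on a tiny data set. The sketch below is an illustration only: the two observations are taken from the input example above, and the file name hsb_demo is hypothetical. It also uses the replace option of save, which is needed when the file already exists on disk.

```stata
clear
input id read write
147 47 62
108 34 33
end

* Variables entered with input are stored as float unless a type is given.
describe

* compress demotes each variable to the smallest type that holds its values:
* here read and write fit in a byte, while id (147) needs an int.
compress
describe

* replace lets save overwrite hsb_demo.dta if it is already there.
save hsb_demo, replace
```

Running save a second time without replace stops with an error rather than overwriting the file, which protects existing data sets.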
To read in a Stata data file, we use the use command.
clear
use hsb10
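The use command can also load only part of a data set, and list and count are handy for checking the result. The sketch below assumes that hsb10.dta, saved earlier from the ten observations typed in with input, is in the current directory.

```stata
* Load three variables only, and only the observations with read of 50 or more.
use id read write if read >= 50 using hsb10, clear

* Number of observations that were loaded.
count

* Show the first three observations, then the ones with high writing scores.
list in 1/3
list id write if write >= 60
```

Naming the variables before using keeps the rest of the file out of memory, which helps when only a few columns of a large file are needed.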
The use command can also be used to read a data file over the internet.
use https://stats.idre.ucla.edu/stat/data/hs0, clear
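Sometimes a data file is too big to be read in with the memory currently allocated to Stata. In Stata 11 and earlier, we then have to increase the amount of memory allocated to Stata with set memory. Stata 12 and later manage memory automatically, so this step is not needed there.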
clear
use https://stats.idre.ucla.edu/stat/data/large
memory
set memory 5m
use https://stats.idre.ucla.edu/stat/data/large, clear
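A do-file that has to run on both old and new versions can make the memory step conditional. This is a minimal sketch, assuming that set memory is only needed before Stata 12 and that the data file is still available at the URL used above.

```stata
* set memory can only be changed when no data are in memory.
clear

* c(stata_version) holds the version number of the running Stata.
* Assumption: only versions before 12 need the memory to be set by hand.
if c(stata_version) < 12 {
    set memory 5m
}

use https://stats.idre.ucla.edu/stat/data/large, clear
```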
3.0 For more information
- Stata Learning Modules, including…
- Frequently Asked Questions, including…