How to Input data into R

Importing formatted data files using the functions in the foreign package

The foreign package contains functions for importing data files from some of the most commonly used statistical software packages, such as SAS, Stata and SPSS. It is one of R's recommended packages, so it is normally included in a standard R installation. If it is missing, you can install it from CRAN with install.packages (shown below) or through the “Packages” menu of the Windows R GUI. You then need to load the package with library, after which you can browse its documentation with help.

library(foreign)
help(package=foreign)

At the time of writing, the package contained the following functions (run library(help=foreign) to see the current list):

data.restore   Read an S3 Binary File
lookup.xport   Lookup Information on a SAS XPORT Format Library
read.dbf       Read a DBF File
read.dta       Read Stata binary files
read.epiinfo   Read Epi Info data files
read.mtp       Read a Minitab Portable Worksheet
read.octave    Read Octave Text Data Files
read.spss      Read an SPSS data file
read.ssd       Obtain a Data Frame from a SAS Permanent Dataset, via read.xport
read.systat    Obtain a Data Frame from a Systat File
read.xport     Read a SAS XPORT Format Library
write.dbf      Write a DBF File
write.dta      Write Files in Stata Binary Format
write.foreign  Write text files and code to read them

To install the package (only needed if it is not already installed):
install.packages("foreign")
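Because foreign usually ships with R, a common idiom is to install it only when it cannot be found and then load it. This is a minimal sketch of that pattern; requireNamespace and install.packages are base R functions, and the install step needs network access only if the package is actually missing.

```r
# Install foreign only if it cannot be found (it normally ships with R
# as a recommended package), then attach it to the search path.
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}
library(foreign)
```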

To view the functions in the package:
library(help=foreign)

To view the help file for a specific function, for example the function read.dta:
?read.dta

Note that read.dta only reads files in the Stata version 5 to 12 formats; it cannot read files saved by Stata 13 or later. The readstata13 package provides a read.dta13 function for those files.

Here is an example of importing a Stata data file called test.dta.

test.stata <- read.dta("https://stats.idre.ucla.edu/stat/data/test.dta")
print(test.stata)

   make   model mpg weight price
1   AMC Concord  22   2930  4099
2   AMC   Pacer  17   3350  4749
3   AMC  Spirit  22   2640  3799
4 Buick Century  20   3250  4816
5 Buick Electra  15   4080  7827
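If the URL above is unavailable, the same functions can be tried offline. The sketch below writes a small data frame (two rows copied from the output above, chosen only for illustration) to a temporary Stata file with write.dta and reads it back with read.dta. It assumes the foreign package is installed.

```r
library(foreign)

# Two rows taken from the car data shown above, for illustration only
autos <- data.frame(make  = c("AMC", "Buick"),
                    model = c("Concord", "Century"),
                    mpg   = c(22L, 20L),
                    price = c(4099, 4816),
                    stringsAsFactors = FALSE)

# Write the data frame in Stata binary format to a temporary file ...
dta.file <- tempfile(fileext = ".dta")
write.dta(autos, dta.file)

# ... and read it back into a new data frame
autos.back <- read.dta(dta.file)
print(autos.back)
```

Using tempfile keeps the example from leaving files behind in the working directory.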

Importing free formatted (delimited) data files using the read.table function

The read.table function is very useful for reading ASCII files that contain rectangular data. When the first line of the file contains the variable names, the header option should be set to TRUE. The default delimiter is white space (one or more spaces, tabs or newlines); other delimiters must be specified with the sep option by setting it equal to the delimiter in quotes (e.g., sep=";" for a semicolon-delimited data file).
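To see the effect of the header option without a file, read.table can read from a character string through its text argument. The short inline data below are made up for illustration from the car data used in this section.

```r
# Inline text stands in for a space-delimited file
txt <- "make model mpg
AMC Concord 22
Buick Century 20"

# header = TRUE: the first line supplies the column names;
# the default sep = "" splits fields on any run of white space
with.header <- read.table(text = txt, header = TRUE)
print(with.header)

# header = FALSE (the default): the first line is read as data,
# and the columns get the automatic names V1, V2, V3
no.header <- read.table(text = txt)
print(no.header)
```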

Here are some examples of data with different types of delimiters. We will start with a typical bread-and-butter data file, namely a space-delimited ASCII file called test.txt.

test.txt <- read.table("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/test.txt", header=TRUE)
print(test.txt)

   make   model mpg weight price 
1   AMC Concord  22   2930  4099
2   AMC   Pacer  17   3350  4749
3   AMC  Spirit  22   2640  3799
4 Buick Century  20   3250  4816
5 Buick Electra  15   4080  7827

Another very common type of file is the comma-delimited file. The file test-1.csv has been saved out of Excel as a comma-delimited file. It can be read in with the read.table function by using the sep option, but it can also be read in with the read.csv function, which was written specifically for comma-delimited files and uses header=TRUE and sep="," by default.

test.csv <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/test-1.csv", header=TRUE)
print(test.csv)

   make   model mpg weight price 
1   AMC Concord  22   2930  4099
2   AMC   Pacer  17   3350  4749
3   AMC  Spirit  22   2640  3799
4 Buick Century  20   3250  4816
5 Buick Electra  15   4080  7827

test.csv1 <- read.table("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/test-1.csv", header=TRUE, sep=",")
print(test.csv1)

   make   model mpg weight price 
1   AMC Concord  22   2930  4099
2   AMC   Pacer  17   3350  4749
3   AMC  Spirit  22   2640  3799
4 Buick Century  20   3250  4816
5 Buick Electra  15   4080  7827
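The two calls above produce the same data frame. A quick offline check, using a few inline comma-separated lines copied from the car data for illustration, compares them with identical:

```r
csv.txt <- "make,model,mpg,weight,price
AMC,Concord,22,2930,4099
AMC,Pacer,17,3350,4749"

# read.csv already defaults to header = TRUE and sep = ","
from.csv <- read.csv(text = csv.txt)

# read.table needs both options spelled out
from.table <- read.table(text = csv.txt, header = TRUE, sep = ",")

identical(from.csv, from.table)
```

The functions differ in other defaults (for example quote and comment.char), so results can diverge on files containing quotes or # characters; for plain data like this they agree.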

It is, of course, also possible to use the read.table function to read files with other delimiters. The file testsemicolon.txt uses semicolons as delimiters, and the file testz.txt uses the letter z; both are acceptable, because the sep option accepts any single character.

test.semi <- read.table("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/testsemicolon.txt", header=TRUE, sep=";")
print(test.semi)

   make   model mpg weight price 
1   AMC Concord  22   2930  4099
2   AMC   Pacer  17   3350  4749
3   AMC  Spirit  22   2640  3799
4 Buick Century  20   3250  4816
5 Buick Electra  15   4080  7827

test.z <- read.table("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/testz.txt", header=TRUE, sep="z")
print(test.z)

   make   model mpg weight price 
1   AMC Concord  22   2930  4099
2   AMC   Pacer  17   3350  4749
3   AMC  Spirit  22   2640  3799
4 Buick Century  20   3250  4816
5 Buick Electra  15   4080  7827
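A self-contained sketch of the same two delimiters, using short inline strings made up for illustration instead of the downloaded files:

```r
semi.txt <- "make;model;mpg
AMC;Concord;22
Buick;Century;20"

# The letter z works as a delimiter here only because no value
# in these lines contains a z
z.txt <- "makezmodelzmpg
AMCzConcordz22
BuickzCenturyz20"

semi <- read.table(text = semi.txt, header = TRUE, sep = ";")
zdat <- read.table(text = z.txt,    header = TRUE, sep = "z")

identical(semi, zdat)
```

Choosing a delimiter that can appear inside the data (as z could in a value such as "Mazda") would split that value into extra fields, so in practice a character absent from the data is the safer choice.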

Importing data files using the scan function

The scan function is an extremely flexible tool for importing data. It can read almost any type of data (numeric, character or complex), and it can be used for fixed or free formatted files. Moreover, the scan function can take input directly from the console. The scan function reads the fields of data as specified by the what option, the default being numeric. If the what option is set to what=character() or what="", all the fields are read as strings. If the data are a mix of numeric, string or complex data, a list can be given to the what option, with one element per field indicating that field's type (for example, what=list(age=0, name="")). The default separator for the scan function is any white space (single space, tab, or new line). Unlike the read.table function, which returns a data frame, the scan function returns a vector or a list. This makes the scan function less useful for inputting “rectangular” data such as the car data set seen in the previous examples.
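The sketch below illustrates the what option and the return types described above. It reads from inline strings through scan's text argument (the values are made up for illustration), and quiet = TRUE only suppresses the "Read ..." message.

```r
# Default what = double(): every field must be numeric; a vector is returned
nums <- scan(text = "3 5 6\n9", quiet = TRUE)
print(nums)

# Mixed fields: the what list gives one template per field,
# 0 for a numeric field and "" for a character field
rec <- scan(text = "12 bobby\n24 kate",
            what = list(age = 0, name = ""), quiet = TRUE)
str(rec)        # a list with components age and name

# scan does not build a data frame; rectangular data must be converted
rec.df <- as.data.frame(rec)
print(rec.df)
```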

In the following examples we first input numeric data and then string data directly from the console; then we input the text file scan.txt, where the first variable is numeric and the second variable is a string variable.

#inputting data directly from the console
x <- scan()
1: 3 5 6 9
5: 2 5 6 
8: 
Read 7 items
x
[1] 3 5 6 9 2 5 6

# inputting string data directly from the console
name.x <- scan(what="")
1: bobby
2: kate dave
4: mia
5: 
Read 4 items
name.x
[1] "bobby" "kate"  "dave"  "mia" 

# inputting a text file and outputting a list
x <- scan("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/scan.txt", what=list(age=0, name=""))
Read 4 records

x
$age
[1] 12 24 35 20

$name
[1] "bobby"   "kate"    "david"   "michael"

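# using the same text file and keeping only the names
# (NULL in the what list skips the first field)
x <- scan("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/scan.txt", what=list(NULL, name=character()))
Read 4 records

# drop the empty (NULL) component
x <- x[sapply(x, length) > 0] 
x
$name
[1] "bobby"   "kate"    "david"   "michael"

is.vector(x)
[1] TRUE

# is.vector is also TRUE for lists, so x is still a list;
# extract the component to get a plain character vector
x <- x$name
x
[1] "bobby"   "kate"    "david"   "michael"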
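Typing at the console cannot be reproduced in a script (at the console, an empty line ends the input). A sketch of the same two console examples run non-interactively, passing the typed lines through scan's text argument:

```r
# Same input as typing "3 5 6 9" and then "2 5 6" at the console
x <- scan(text = "3 5 6 9\n2 5 6")
x

# Same input as typing "bobby", "kate dave" and "mia"; what = "" reads strings
name.x <- scan(text = "bobby\nkate dave\nmia", what = "")
name.x
```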

Importing Fixed Format Files Using the read.fwf Function

For fixed format files, the variable names are often in a separate file from the data. In this example the variable names are in a file called names.txt and the data are in a file called testfixed.txt. Keeping the names in a separate file is especially convenient when the fixed format file is very large and has many variables, since it becomes rather impractical to type in all the variable names. In the read.fwf function, the widths option specifies the width of each variable and the col.names option takes a character vector of variable names. So, first we read in the file of names using the scan function, setting the what option to character() because the file contains character values. Passing the resulting vector, names, to the col.names option of read.fwf then supplies the variable names.

names <- scan("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/names.txt", what=character() )
Read 5 items
print(names)

[1] "model"  "make"   "mph"    "weight" "price" 

test.fixed <- read.fwf("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/testfixed.txt", col.names=names, widths = c(5, 7, 2, 4, 4))
print(test.fixed)

  model    make mph weight price 
1   AMC Concord  22   2930  4099
2   AMC   Pacer  17   3350  4749
3   AMC  Spirit  22   2640  3799
4 Buick Century  20   3250  4816
5 Buick Electra  15   4080  7827
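A self-contained sketch of read.fwf, assuming the same column widths (5, 7, 2, 4, 4) as above. It builds a tiny fixed-width file from two rows of the car data with sprintf, so no download is needed; strip.white = TRUE is passed through to read.table to drop the padding spaces around short values such as "AMC".

```r
# Build fixed-width lines: make (5 chars, left-justified), model (7),
# mpg (2), weight (4), price (4)
fwf.lines <- sprintf("%-5s%7s%2d%4d%4d",
                     c("AMC", "Buick"), c("Concord", "Century"),
                     c(22L, 20L), c(2930L, 3250L), c(4099L, 4816L))

fwf.file <- tempfile(fileext = ".txt")
writeLines(fwf.lines, fwf.file)   # writeLines ends every line with a newline
readLines(fwf.file)

var.names <- c("make", "model", "mpg", "weight", "price")
fixed <- read.fwf(fwf.file, widths = c(5, 7, 2, 4, 4),
                  col.names = var.names, strip.white = TRUE)
print(fixed)
```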

Note that a warning about an incomplete final line means that the last line of the file does not end with a newline character. The data are usually still read correctly, but you can avoid the warning by opening the file in a text editor and pressing Enter at the end of the last line.
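A small sketch of this situation, using a made-up two-row file: the file is first written without a trailing newline (which may trigger the warning), then the newline is appended, and the two reads are compared.

```r
# Write a file whose last line has no terminating newline
tmp <- tempfile(fileext = ".txt")
cat("make mpg\nAMC 22\nBuick 20", file = tmp)   # no "\n" at the end

# read.table may warn about an incomplete final line,
# but all rows are still read
dat <- read.table(tmp, header = TRUE)
print(dat)

# Appending the missing newline removes the cause of the warning
cat("\n", file = tmp, append = TRUE)
dat2 <- read.table(tmp, header = TRUE)

identical(dat, dat2)
```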

Exporting files using the write.table function

The write.table function outputs data files. The first argument specifies which data frame in R is to be exported, and the second specifies the file to be created. The default separator is a blank space, but any separator can be specified with the sep option. The default value of both the row.names and col.names options is TRUE; in the example we specify that we do not wish to include row names. By default (quote=TRUE), quotes are placed around all character and factor values and around the row and column names. As in the example, it is very common not to want the quotes when creating a text file.

# using the test.csv data frame to write a text file with no row names 
# and without quotes around the character values (both column names and string variables)

write.table(test.csv, "c:/temp/test1.txt", row.names=FALSE, quote=FALSE)
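To see what the quote option changes, this sketch writes a small made-up data frame to a temporary file twice, once with the default quote=TRUE and once with quote=FALSE, and reads the raw lines back with readLines. The temporary file stands in for a path such as c:/temp/test1.txt, which exists only on Windows.

```r
df <- data.frame(make = c("AMC", "Buick"), mpg = c(22, 20))
out <- tempfile(fileext = ".txt")

# Default quote = TRUE: column names and character values are quoted
write.table(df, out, row.names = FALSE)
quoted <- readLines(out)
quoted   # "\"make\" \"mpg\""  "\"AMC\" 22"  "\"Buick\" 20"

# quote = FALSE: plain space-separated text
write.table(df, out, row.names = FALSE, quote = FALSE)
plain <- readLines(out)
plain    # "make mpg"  "AMC 22"  "Buick 20"
```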