Binary files offer an efficient and easy-to-recover way to store data. If you wish to convert a data frame in R to binary form, there are a few basics to learn that will allow you to both read and write binary files.
In the binary data file, your information will be stored in groups of binary digits. Each binary digit is a zero or one, and eight binary digits grouped together form a byte. To successfully read the binary file you write, you must keep in mind how you encoded your information as binary. For example, if you have a matrix of data that you are writing to a binary file, are you writing the matrix across the rows or down the columns? If your data consists of integers, how many bytes should represent one integer in your data? On what platform are you working while writing the file?
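As a minimal sketch of why the byte count and the platform matter, the integer 1 can be encoded into a raw vector rather than a file so the bytes are visible. The variable names here are illustrative and not part of the hsb2 example; `size` and `endian` are arguments of writeBin.

```r
# Encode the integer 1 into a raw vector (instead of a file) to inspect its bytes.
four_le <- writeBin(1L, raw(), size = 4, endian = "little")  # bytes 01 00 00 00
four_be <- writeBin(1L, raw(), size = 4, endian = "big")     # bytes 00 00 00 01
two_le  <- writeBin(1L, raw(), size = 2, endian = "little")  # bytes 01 00

print(four_le)
print(four_be)
print(two_le)

# The native byte order of the current platform, which writeBin uses by default.
print(.Platform$endian)
```

The same value produces different bytes depending on these choices, so whoever reads the file needs to know the size and byte order used when it was written.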
The binary file you write will be much easier to read if you can answer these questions. This page will provide an example of writing binary data by column. If you wish to write binary data by row, see FAQ: How can I write a binary data file in R by row?. If you are not sure of some of the answers, you can explore the available options in R and consider where you plan to later read in the data to decide which are most appropriate. These are the same options that will be available when reading binary data in R.
Suppose we have a dataset in R, hsb2, and we wish to write a subset of the variables in this dataset to a binary file.
hsb2<-read.table("https://stats.idre.ucla.edu/stat/r/notes/hsb2.csv", sep=",", header=T)
hsb2[1:5,]

   id female  race    ses schtyp     prog read write math science socst
1  70   male white    low public  general   57    52   41      47    57
2 121 female white middle public vocation   68    59   53      63    61
3  86   male white   high public  general   44    33   54      58    31
4 141   male white   high public vocation   63    44   47      53    56
5 172   male white middle public academic   47    52   57      53    61
To get started, we establish a connection to a file and indicate that we will be using the connection to write binary data. We do this with the file function, providing first the pathname and then “wb” for “writing binary”. For more details, see help(file) in R.
to.write = file("C:/binfile.dat", "wb")
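The path above is specific to Windows. As a rough sketch, assuming a temporary file is acceptable for experimenting, the same kind of connection can be opened on any platform and inspected before use. The names `path`, `con`, `can_write`, and `kind` are hypothetical and not part of the original example.

```r
# Hypothetical path for illustration; tempfile() avoids hard-coding a drive letter.
path <- tempfile(fileext = ".dat")
con <- file(path, "wb")            # giving a mode such as "wb" opens the connection

can_write <- isOpen(con, "write")  # TRUE while the connection is open for writing
kind <- summary(con)$text          # "binary" for a connection opened with "wb"

print(can_write)
print(kind)

close(con)    # always close the connection when finished
unlink(path)  # remove the temporary file
```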
If we wish to write a binary file containing the reading, writing, and math scores from the hsb2 dataset, there are several ways in which this can be done. Keep in mind that we are essentially taking a matrix of information and making it into one long list. There are several ways to go about getting the matrix of information into list form. Will we include or omit variable names? If we include them, will we list a single variable name followed by all of the information in the variable? Or list all of the variable names and then list the information going across the matrix column by column?
For this example, we will list the variable names first, then all of the values for the first variable named followed by all of the values for the second variable named, and so on. This is an arbitrary choice, but it’s important to note that the choices you make in writing the binary file define the correct way to read the file.
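To make the ordering choice concrete, here is a small sketch using a toy matrix rather than hsb2 (the names `m`, `by_col`, and `by_row` are illustrative). It shows the two sequences that result from flattening a matrix down the columns versus across the rows.

```r
# A 2 x 3 toy matrix; R fills it column by column, so the columns are (1,2), (3,4), (5,6).
m <- matrix(1:6, nrow = 2)

by_col <- as.vector(m)     # down the columns: 1 2 3 4 5 6
by_row <- as.vector(t(m))  # across the rows:  1 3 5 2 4 6

print(by_col)
print(by_row)
```

Writing variable by variable, as this page does, corresponds to the column order.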
To write information to the file we connected to, we will use the writeBin command. The first argument we give writeBin is the integer/string/vector that we wish to write to the binary file. The second argument we give writeBin is the open connection we established. In the command below, we are passing writeBin a vector containing three variable names.
writeBin(colnames(hsb2)[7:9], to.write)
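When writeBin is given character strings, each string is written followed by a terminating zero byte. The sketch below writes the three names to a raw vector instead of the file so the result can be examined. The names are typed in directly and `name_bytes` is an illustrative variable.

```r
# Write the three variable names to a raw vector rather than a connection.
name_bytes <- writeBin(c("read", "write", "math"), raw())

# 4 + 5 + 4 characters plus one zero byte after each string gives 16 bytes.
print(length(name_bytes))

# The first four bytes decode back to the first name; the fifth byte is the terminator.
print(rawToChar(name_bytes[1:4]))
print(name_bytes[5])
```

The terminators are what allow a reader to tell where one name ends and the next begins.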
We can continue to write to the file. R will concatenate additional information to what we have already written. These three variables all contain integer values.
writeBin(hsb2$read, to.write)
writeBin(hsb2$write, to.write)
writeBin(hsb2$math, to.write)
We could have equivalently written the three sets of variable values with one writeBin statement where the first argument is a single vector concatenating the variable values (c(hsb2$read, hsb2$write, hsb2$math)). Now that we have written all of our desired information to the binary file, we can close the connection.
close(to.write)
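The equivalence between three separate writeBin calls and one call on a concatenated vector can be checked with a small sketch. It uses the first three values of each variable from the listing above and writes to raw vectors instead of a file. The variable names are illustrative.

```r
# First three values of read, write, and math from the hsb2 listing.
read_scores  <- c(57L, 68L, 44L)
write_scores <- c(52L, 59L, 33L)
math_scores  <- c(41L, 53L, 54L)

# Three separate writes, joined in order...
separate <- c(writeBin(read_scores, raw()),
              writeBin(write_scores, raw()),
              writeBin(math_scores, raw()))

# ...versus one write of the concatenated vector.
combined <- writeBin(c(read_scores, write_scores, math_scores), raw())

print(identical(separate, combined))  # the two byte sequences match
print(length(combined))               # 9 integers at 4 bytes each = 36 bytes
```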
To verify that you have successfully written the data to a binary file, try to read in the file you just wrote using readBin. For help with this, see R FAQ: How can I read binary data into R?.
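As a rough sketch of such a check, the example below writes a three-row toy data frame to a temporary file using the same layout as this page (names first, then each variable in turn) and reads it back. The data frame `scores`, the temporary path, and the other variable names are assumptions made for illustration. With hsb2 the counts passed to readBin would be 3 names and 3 * nrow(hsb2) integers.

```r
# Toy stand-in for the three hsb2 variables.
scores <- data.frame(read  = c(57L, 68L, 44L),
                     write = c(52L, 59L, 33L),
                     math  = c(41L, 53L, 54L))

path <- tempfile(fileext = ".dat")

# Write: variable names first, then the values variable by variable.
con <- file(path, "wb")
writeBin(colnames(scores), con)
writeBin(c(scores$read, scores$write, scores$math), con)
close(con)

# Read back in the same order, telling readBin what type and how many items to expect.
con <- file(path, "rb")
var_names <- readBin(con, character(), n = 3)
values    <- readBin(con, integer(), n = 3 * nrow(scores))
close(con)
unlink(path)

# Rebuild the columns; matrix() fills column by column, matching the write order.
restored <- as.data.frame(matrix(values, ncol = 3, dimnames = list(NULL, var_names)))
print(restored)
```

The read step only works because it mirrors the write step: same order, same types, same counts.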