Binary files offer an efficient and easy-to-recover way to store data. If you wish to convert data in R to binary form, there are a few basics to learn that will allow you to both read and write binary files.
In the binary data file, your information will be stored in groups of binary digits. Each binary digit is a zero or one, and eight binary digits grouped together form a byte. In order to successfully read the binary file you write, you must keep in mind how you are parsing your information into binary. For example, if you have a matrix of data that you are writing to a binary file, are you reading the matrix across the rows or down the columns? If your data consists of integers, how many bytes should represent one integer in your data? On what platform are you working while writing the file?
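As a minimal sketch of why these questions matter, the lines below write the same integer into a raw vector (instead of a file) using different byte counts and byte orders. The size and endian arguments are standard writeBin arguments; the variable names are ours, chosen only for illustration.

```r
x <- 1L

# Same integer, three different byte layouts
le4 <- writeBin(x, raw(), size = 4, endian = "little")  # 01 00 00 00
be4 <- writeBin(x, raw(), size = 4, endian = "big")     # 00 00 00 01
le2 <- writeBin(x, raw(), size = 2, endian = "little")  # 01 00

# The byte order writeBin uses by default on this machine
.Platform$endian
```

A reader that assumes a different size or byte order than the writer used will recover the wrong numbers, so it helps to record these choices alongside the file.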
The binary file you write will be much easier to read if you can answer these questions. This page will provide an example of writing binary data by row. If you wish to write binary data by column, see FAQ: How can I write a binary data file in R by column?. If you are not sure of some of the answers, you can explore the available options in R and consider where you plan to later read in the data to decide which are most appropriate. These are the same options that will be available when reading binary data in R.
Suppose we have a dataset in R, hsb2, and we wish to write a subset of the variables in this dataset to a binary file.
hsb2<-read.table("https://stats.idre.ucla.edu/stat/r/notes/hsb2.csv", sep=",", header=T)
hsb2[1:5,]
   id female  race    ses schtyp     prog read write math science socst
1  70   male white    low public  general   57    52   41      47    57
2 121 female white middle public vocation   68    59   53      63    61
3  86   male white   high public  general   44    33   54      58    31
4 141   male white   high public vocation   63    44   47      53    56
5 172   male white middle public academic   47    52   57      53    61
To get started, we establish a connection to a file and indicate that we will be using the connection to write binary data. We do this with the file command, providing first the pathname, then “wb” for “writing binary”. For more details, see help(file) in R.
to.write = file("C:/binfile.dat", "wb")
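If you would like to try this without writing to C:/, the sketch below opens the same kind of connection on a temporary file and checks that it is open in binary write mode. The names path, con, is_open, and con_mode are ours and only serve the illustration.

```r
# A temporary file stands in for "C:/binfile.dat"
path <- tempfile(fileext = ".dat")
con <- file(path, "wb")

is_open <- isOpen(con)          # TRUE once the connection is open
con_mode <- summary(con)$mode   # the mode string the connection was opened with

close(con)
```

Opening a file with "wb" creates it (or empties it if it already exists), so choose a path you do not mind overwriting.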
If we wish to write a binary file containing the reading, writing, and math scores from the hsb2 dataset, there are several ways in which this can be done. Keep in mind that we are essentially taking a matrix of information and making it into one long list.
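To make the "one long list" idea concrete, here is a small sketch using the reading, writing, and math scores of the first two students shown above. R stores matrices column by column, so flattening by row requires transposing first. The names m, by_col, and by_row are ours.

```r
# Two students (rows) by three scores (columns: read, write, math)
m <- matrix(c(57L, 52L, 41L,
              68L, 59L, 53L), nrow = 2, byrow = TRUE)

by_col <- as.vector(m)     # down the columns: 57 68 52 59 41 53
by_row <- as.vector(t(m))  # across the rows:  57 52 41 68 59 53
```

Both vectors hold the same six values; only the order differs, and the reader of the file has to know which order was used.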
For this example, we will list the variable names first, then the first row of data followed by the second and so on through all of the observations in the data frame. This approach to writing data makes sense if you will later be reading it in row-by-row.
To write information to the file we connected to, we will use the writeBin command. The first argument we give writeBin is the integer/string/vector that we wish to write to the binary file. The second argument we give writeBin is the open connection we established. In the command below, we are passing writeBin a vector containing three variable names.
writeBin(colnames(hsb2)[7:9], to.write)
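It can help to see what writeBin does with character data. In the sketch below we write the three variable names (read, write, and math, which are columns 7 through 9 in the listing above) to a raw vector rather than to the file. Each string is written followed by a terminating zero byte. The names vars, bytes, and back are ours.

```r
vars <- c("read", "write", "math")

# Writing to raw() returns the bytes instead of sending them to a connection
bytes <- writeBin(vars, raw())

length(bytes)            # 16: 4 + 5 + 4 characters plus 3 terminators
sum(bytes == as.raw(0))  # 3: one zero byte after each name

# readBin can read the names back from the same raw vector
back <- readBin(bytes, character(), n = 3)
```

Because the names are zero-terminated, a reader does not need to know their lengths in advance, only how many strings to expect.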
Next, we will loop through the data frame, creating a vector for each row of data and writing out the vector to our binary file.
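n <- nrow(hsb2)
for (i in 1:n) {
  # Collect the read, write, and math scores for row i as an integer vector
  x <- as.integer(unlist(hsb2[i, 7:9]))
  writeBin(x, to.write)
}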
close(to.write)
To verify that you have successfully written the data to a binary file, try to read in the file you just wrote using readBin. For help with this, see R FAQ: How can I read binary data into R?.
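As a rough, self-contained sketch of such a check, the code below writes a tiny data frame by row to a temporary file and reads it back. The data frame scores holds the first three rows of read, write, and math shown earlier; it and the names path, con, vars, vals, and back are ours, and the counts passed to readBin assume we already know there are 3 names and 9 integers in the file.

```r
scores <- data.frame(read  = c(57L, 68L, 44L),
                     write = c(52L, 59L, 33L),
                     math  = c(41L, 53L, 54L))

# Write the variable names, then each row in turn
path <- tempfile(fileext = ".dat")
con <- file(path, "wb")
writeBin(colnames(scores), con)
for (i in seq_len(nrow(scores))) {
  writeBin(as.integer(unlist(scores[i, ])), con)
}
close(con)

# Read back in the same order: 3 strings, then 9 integers
con <- file(path, "rb")
vars <- readBin(con, character(), n = 3)
vals <- readBin(con, integer(), n = 9)
close(con)

# The values were written by row, so fill the matrix by row
back <- matrix(vals, ncol = 3, byrow = TRUE, dimnames = list(NULL, vars))
```

If back matches the original scores, the file was written and parsed consistently.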