The code needed to read binary data into R is relatively easy.
However, reading the data in *correctly*
requires that you are either already familiar with your data or possess a
comprehensive description of the data structure.

In a binary data file, information is stored in groups of binary digits. Each binary digit is a zero or one, and eight binary digits grouped together form a byte. In order to successfully read binary data, you must know how the pieces of information have been encoded into binary. For example, if your data consists of integers, how many bytes should you interpret as one integer? If your data contains both positive and negative numbers, how can you distinguish the two? How many pieces of information do you expect to find in the binary data?
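As a small illustration of these questions, the sketch below uses base R's `writeBin` and `readBin` on in-memory raw vectors. The variable names are made up for this example. It shows how one integer occupies four bytes and how the same byte can be read as a negative or a positive number.

```r
# Encode the integer 1 on four bytes, least significant byte first.
bytes <- writeBin(1L, raw(), size = 4, endian = "little")
print(bytes)
# [1] 01 00 00 00

# A single byte with all eight bits set means different things
# depending on whether it is interpreted as signed or unsigned.
one_byte <- as.raw(0xff)
as_signed   <- readBin(one_byte, integer(), size = 1, signed = TRUE)
as_unsigned <- readBin(one_byte, integer(), size = 1, signed = FALSE)
print(as_signed)
# [1] -1
print(as_unsigned)
# [1] 255
```

The `signed` argument of `readBin` only applies to integers of size 1 or 2, so wider integers are always read as signed.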

Ideally, you know the answers to these questions before starting to read
in the binary file. If you do not, you can experiment with the options for
reading the file in R. To get started, we establish a connection to a file
and indicate that we will be using the connection to read in binary data. We
do this with the **file** command, providing first the pathname and then
**"rb"** for “reading binary”. For more details, see **help(file)** in R.

to.read = file("https://stats.idre.ucla.edu/stat/r/faq/bintest.dat", "rb")
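If the URL above is not reachable, the same steps can be tried on a local file. The sketch below builds a hypothetical stand-in file (`demo_path` and its contents are assumptions for illustration, not the actual bintest.dat) and opens it in "rb" mode.

```r
# Hypothetical stand-in file: ten integers, 4 bytes each, little-endian.
demo_path <- tempfile(fileext = ".dat")
writeBin(1:10, demo_path, size = 4, endian = "little")

# Ten 4-byte integers occupy 40 bytes on disk.
print(file.size(demo_path))
# [1] 40

# Open a binary read connection, read one integer, then close it.
demo_con <- file(demo_path, "rb")
first_value <- readBin(demo_con, integer(), endian = "little")
close(demo_con)
print(first_value)
# [1] 1
```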

Next, we use the **readBin** command to begin reading. If we think the file
contains integers, we can start by reading in the first integer and hoping
that the default integer size is the right one.
Different platforms store binary data in different ways: the bytes of a
multi-byte value can be ordered with the least significant byte first
(little-endian) or the most significant byte first (big-endian), and the same
bytes yield very different values under the two conventions. This
characteristic is called the byte order, or “endianness”. The binary
files in the examples on this page were written using a PC, which suggests
they are little-endian. When reading in binary data that may have been
written on a different platform, specifying the byte order can be
crucial. By default, **readBin** uses the byte order of the machine R is
running on. For example, without adding **endian = "little"** to the
command below on a big-endian machine (such as an older PowerPC Mac), the
command reads the first integer as 16777216.

readBin(to.read, integer(), endian = "little")
[1] 1
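The effect of byte order can be reproduced without any file. In this sketch the four bytes are typed in by hand as an assumed little-endian encoding of 1, and then read under both conventions.

```r
# The four bytes of the integer 1 as a little-endian writer would store them.
bytes <- as.raw(c(0x01, 0x00, 0x00, 0x00))

# Same bytes, two interpretations.
little <- readBin(bytes, integer(), size = 4, endian = "little")
big    <- readBin(bytes, integer(), size = 4, endian = "big")
print(little)
# [1] 1
print(big)
# [1] 16777216

# The byte order readBin uses when endian is not given.
print(.Platform$endian)
```

Read as big-endian, the leading byte 01 becomes the most significant byte, giving 1 * 256^3 = 16777216.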

Thus, it looks like the first integer in the file is 1. Each **readBin**
call picks up where the previous one stopped, so repeated calls work their
way through the binary file until we hit the end. We can read in
multiple integers at once by adding an **n=** option to our command. If
the n you specify is greater than the number of integers remaining in the
file, **readBin** will read and return as many as are available, so there is
no danger of guessing too large an **n**. Since we have already read in the
first integer, this command will begin at the second.

readBin(to.read, integer(), n = 4, endian = "little")
[1] 2 3 4 5
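To see what happens when **n** is larger than what the file holds, the following sketch uses a hypothetical five-integer temporary file (an assumption for illustration, not one of the files above).

```r
# Hypothetical file holding only five 4-byte integers.
demo_path <- tempfile(fileext = ".dat")
writeBin(1:5, demo_path, size = 4, endian = "little")

demo_con <- file(demo_path, "rb")

# Ask for far more values than exist; readBin returns what is available.
got <- readBin(demo_con, integer(), n = 100, size = 4, endian = "little")
print(got)
# [1] 1 2 3 4 5

# The connection is now at the end of the file, so nothing is left to read.
leftover <- readBin(demo_con, integer(), n = 1, size = 4, endian = "little")
print(length(leftover))
# [1] 0

close(demo_con)
```

A zero-length result is therefore a convenient signal that the end of the file has been reached.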

If you have additional information about what is in your file, you
should incorporate it into the **readBin** command. For example, if
you know that you wish to read in integers stored on 4 bytes each, you can
indicate this with the **size** option:

readBin(to.read, integer(), n = 2, size = 4, endian = "little")
[1] 6 7
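Getting **size** wrong does not produce an error, only wrong numbers. The sketch below encodes four integers on 4 bytes each in memory and then reads them back with the right and the wrong size. The variable names are illustrative.

```r
# Four integers encoded on 4 bytes each, little-endian (16 bytes total).
bytes <- writeBin(1:4, raw(), size = 4, endian = "little")

# Correct size: the original values come back.
right <- readBin(bytes, integer(), n = 4, size = 4, endian = "little")
print(right)
# [1] 1 2 3 4

# Wrong size: each 4-byte integer is split into two 2-byte pieces.
wrong <- readBin(bytes, integer(), n = 8, size = 2, endian = "little")
print(wrong)
# [1] 1 0 2 0 3 0 4 0
```

A regular pattern of interleaved zeros like this is a common hint that the assumed size is too small.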

Similarly, if you know that your file contains characters, complex
numbers, or some other type of information, you would adjust the **readBin**
command accordingly, changing **integer()** to **character()** or
**complex()**. See **help(readBin)** in R for more details.
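As a sketch of reading mixed types, the example below writes two strings followed by two doubles to a hypothetical temporary file and reads them back in the same order. The file layout is an assumption made up for this illustration.

```r
# Hypothetical file: two character strings followed by two 8-byte doubles.
demo_path <- tempfile(fileext = ".dat")
out <- file(demo_path, "wb")
writeBin(c("read", "write"), out)              # stored as null-terminated strings
writeBin(c(1.5, -2.25), out, size = 8, endian = "little")
close(out)

# Read the pieces back in the order they were written.
inp <- file(demo_path, "rb")
labels <- readBin(inp, character(), n = 2)
values <- readBin(inp, double(), n = 2, size = 8, endian = "little")
close(inp)

print(labels)
print(values)
```

The type argument tells **readBin** how to interpret the bytes, so the calls must follow the order in which the data was laid out in the file.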

Since you will likely want to do more than just look at what is contained in the binary file, you will need some strategies for formatting data as you read it in. For example, suppose you are given a binary file with the following description: three numeric variables were collected from 200 subjects, the three variable names appear first in the file, the numeric values are integers stored on four bytes each, and all of the values for the first variable are followed by all of the values for the second and then all of the values for the third (as if they had been written by column, not by row). First, open a connection to the data.

newdata = file("https://stats.idre.ucla.edu/stat/r/faq/bindata.dat", "rb")

Next, let’s read in the variable names and save them to a vector in R.

varnames = readBin(newdata, character(), n = 3)
varnames
[1] "read"  "write" "math"

To read in the integer values, we can opt to read all 600 (three variables of 200 values each) into one vector, and then split it into the three variables.

datavals = readBin(newdata, integer(), size = 4, n = 600, endian = "little")
readvals = datavals[1:200]
writevals = datavals[201:400]
mathvals = datavals[401:600]

Or we can read in each variable’s values with a separate readBin command.

readvals = readBin(newdata, integer(), size = 4, n = 200, endian = "little")
writevals = readBin(newdata, integer(), size = 4, n = 200, endian = "little")
mathvals = readBin(newdata, integer(), size = 4, n = 200, endian = "little")

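Then, we can combine our three value vectors into one matrix with the variable names as our column names. Note that **cbind** returns a matrix here; wrap the result in **as.data.frame()** if you need a data frame.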

rdata = cbind(readvals, writevals, mathvals)
colnames(rdata) = varnames
rdata[1:5,]
     read write math
[1,]   57    52   41
[2,]   68    59   53
[3,]   44    33   54
[4,]   63    44   47
[5,]   47    52   57
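Because the values are stored one variable after another, the single long vector can also be reshaped directly, since `matrix` fills column by column. The sketch below is self-contained and uses only the first four rows shown above as stand-in data; with the real file, `datavals` would be the 600-value vector.

```r
# Stand-in data: 3 variables x 4 subjects, stored variable by variable.
varnames <- c("read", "write", "math")
datavals <- c(57L, 68L, 44L, 63L,   # all "read" values
              52L, 59L, 33L, 44L,   # all "write" values
              41L, 53L, 54L, 47L)   # all "math" values

# matrix() fills columns first, which matches the file layout.
rdata <- matrix(datavals, ncol = 3, dimnames = list(NULL, varnames))

# Convert to a data frame if column access by name is needed.
rdf <- as.data.frame(rdata)
print(rdf$write)
# [1] 52 59 33 44
```

If the file had instead been written row by row, `matrix(datavals, ncol = 3, byrow = TRUE)` would be the matching call.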

Lastly, since we have finished reading data from the binary file, we can close the connection.

close(newdata)
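The earlier **to.read** connection can be closed the same way with close(to.read). One way to avoid forgetting this step is to wrap the reading in a function that closes the connection on exit. The helper `read_ints` below is a hypothetical name introduced for this sketch, not part of R.

```r
# Hypothetical helper: open, read integers, and always close the connection.
read_ints <- function(path, n, size = 4, endian = "little") {
  con <- file(path, "rb")
  on.exit(close(con), add = TRUE)   # runs even if readBin fails
  readBin(con, integer(), n = n, size = size, endian = endian)
}

# Try it on a small temporary file.
demo_path <- tempfile(fileext = ".dat")
writeBin(1:3, demo_path, size = 4, endian = "little")
vals <- read_ints(demo_path, n = 3)
print(vals)
# [1] 1 2 3
```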

If you wish to write a binary file from R, see R FAQ: How can I write a binary data file in R?