Moran’s I is a measure of spatial autocorrelation–how related the values of a variable are based on the locations where they were measured. Using functions in the ape library, we can calculate Moran’s I in R. To download and load this library, enter install.packages(“ape”) and then library(ape).
Let’s look at an example. Our dataset, ozone, contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month. The dataset includes the station number (Station), the latitude and longitude of the station (Lat and Lon), and the average of the highest eight hour daily averages (Av8top). This data, and other spatial datasets, can be downloaded from the University of Illinois’ Spatial Analysis Lab. We can look at a summary of our location variables to see the range of locations under consideration.
ozone <- read.table("https://stats.idre.ucla.edu/stat/r/faq/ozone.csv", sep=",", header=T) head(ozone, n=10) Station Av8top Lat Lon 1 60 7.225806 34.13583 -117.9236 2 69 5.899194 34.17611 -118.3153 3 72 4.052885 33.82361 -118.1875 4 74 7.181452 34.19944 -118.5347 5 75 6.076613 34.06694 -117.7514 6 84 3.157258 33.92917 -118.2097 7 85 5.201613 34.01500 -118.0597 8 87 4.717742 34.06722 -118.2264 9 88 6.532258 34.08333 -118.1069 10 89 7.540323 34.38750 -118.5347
To calculate Moran’s I, we will need to generate a matrix of inverse distance weights. In the matrix, entries for pairs of points that are close together are higher than for pairs of points that are far apart. For simplicity, we will treat the latitude and longitude as values on a plane rather than on a sphere–our locations are close together and far from the poles. When using latitude and longitude coordinates from more distant locations, it’s wise to calculate distances based on spherical coordinates (the geosphere package can be used).
We can first generate a distance matrix, then take inverse of the matrix values and replace the diagonal entries with zero:
ozone.dists <- as.matrix(dist(cbind(ozone$Lon, ozone$Lat))) ozone.dists.inv <- 1/ozone.dists diag(ozone.dists.inv) <- 0 ozone.dists.inv[1:5, 1:5]1 2 3 4 5 1 0.000000 2.539795 2.446165 1.627570 5.391160 2 2.539795 0.000000 2.667061 4.531428 1.741071 3 2.446165 2.667061 0.000000 1.954357 2.002389 4 1.627570 4.531428 1.954357 0.000000 1.258716 5 5.391160 1.741071 2.002389 1.258716 0.000000
We have created a matrix where each off-diagonal entry [i, j] in the matrix is equal to 1/(distance between point i and point j). Note that this is just one of several ways in which we can calculate an inverse distance matrix. This is the formulation used by Stata. In SAS, inverse distance matrices have entries equal to 1/(1+ distance between point i and point j) and there are numerous scaling options available.
We can now calculate Moran’s I using the command Moran.I.
Moran.I(ozone$Av8top, ozone.dists.inv) $observed [1] 0.2265501 $expected [1] -0.03225806 $sd [1] 0.03431138 $p.value [1] 4.596323e-14
Based on these results, we can reject the null hypothesis that there is zero spatial autocorrelation present in the variable Av8top at alpha = .05.
Alternatively, one could use a binary distance matrix. This might be used in looking at networking problems where two points are either connected or not connected. In this example, we can choose a distance d such that pairs of points with distance less than d are considered connected and pairs with distance greater than d are not and use this weights matrix to calculate Moran’s I. The code below does this for d = .75.
ozone.dists.bin <- (ozone.dists > 0 & ozone.dists <= .75) Moran.I(ozone$Av8top, ozone.dists.bin) $observed [1] 0.2045738 $expected [1] -0.03225806 $sd [1] 0.04771132 $p.value [1] 6.910876e-07
In this example, we see similar results to those generated using the inverse Euclidean distance matrix and we can reject the null hypothesis that there is zero spatial autocorrelation present in the variable Av8top at alpha = .05.