devtools::load_all()

The geotrellis R package was an experiment to see if R could leverage the Scala data processing engine geotrellis to process spatial data faster than the raster R package. This vignette outlines the functions implemented in the geotrellis R package, and provides some benchmark results against the raster R package. This R package is experimental and not recommended for real analysis. Current benchmarks indicate that little to no benefits will be gained using this package.

Spatial data is stored in a gt_RasterLayer object. This R6 class contains all low-level the definitions for reading, writing, processing, and analyzing data. However, users should not interact with this class' methods directly. Instead, users should use the S4 methods provided. The usage of these S4 methods is outlined below.

Reading/writing data

Spatial data is read using the gt_raster method. This method can load data using a character file path for a GeoTiff file. Additionally, users can coerce a raster::RasterLayer-class object. Note that the package only support single-band layer raster data.

# load in data using file path
g <- gt_raster(system.file('extdata/test.tif', package='geotrellis'))

# load in data using a RasterLayer object
g <- gt_raster(raster(system.file('extdata/test.tif', package='geotrellis')))
````

Once data has been loaded into _R_, users can access the metadata associated with the data. The methods used to access the metadata follow the conventions used in the _raster R_ package.

```r
print(g)  # print summary of object
crs(g)    # coordinate reference system
extent(g) # spatial extent of data
res(g)    # resolution
ncell(g)  # number of data cells
ncol(g)   # number of columns
nrow(g)   # number of rows

Users can also convert the data into native R objects and raster::RasterLayer-class objects.

values(g)    # convert data as a numeric vector
as.matrix(g) # convert data to a matrix
as.raster(g) # convert data to a RasterLayer

Finally, data can be saved to disk using the gt_writeRaster function.

# save data to disk in a GeoTiff format
gt_writeRaster(g, tempfile(fileext='.tif')) 

Geoprocessing

The geotrellis R package contains several geoprocessing methods.

# create new layer with different spatial resolution
l <- gt_raster(raster::disaggregate(as.raster(g), 2, method=''))

# resample data to new resolution
gt_resample(g, l, method='ngb')
gt_projectRaster(g, sp::CRS('+init=epsg:3395'), res=50000, method='ngb')
gt_crop(g, extent(raster::extent(c(0, 3, 2, 6))))
# create mask layer
m <- as.raster(g)
m <- raster::setValues(m, sample(c(1,NA), ncell(m), replace=TRUE))
m <- gt_raster(m)

# run mask analysis
gt_mask(g, m)

Statistics

This package offers two statistical methods for analyzing data in a gt_RasterLayer object. Note that there is currently a bug in the geotrellis code that causes medians to be calculated incorrectly. Additionally, note that unlike the functions in the raster R package, these functions calculate the population standard deviation, not the sample standard deviation.

The gt_cellStats method can be used to calculate statistics using all the cells in the raster data.

gt_cellStats(g)

The the gt_zonal method can be used to calculate statistics in zones, as specified in a second raster.

# make layer with zones
z <- as.raster(g)
z <- raster::setValues(z, sample(1:2, ncell(z), replace=TRUE))
z <- gt_raster(z)

# run zonal statistics
gt_zonal(g, z)

Benchmarks

Finally, the geotrellis R package provides a benchmark function to automate the benchmarking of all these functions and compare them to their counterparts in the raster package.

# run benchmark
b <- benchmark(ncell=c(1e+1, 1e+2, 1e+3, 1e+4, 1e+5), times=10L)
# print benchmark
print(b)
# plot benchmark
plot(b)

Acknowledgments

I would like to thank to the geotrellis developers for their assistance.



jeffreyhanson/geotrellis documentation built on May 19, 2019, 4 a.m.