For a better version of the stars vignettes see https://r-spatial.github.io/stars/articles/
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, dev = "png")
ev = suppressWarnings(require(starsdata, quietly = TRUE))
knitr::opts_chunk$set(fig.height = 4.5)
knitr::opts_chunk$set(fig.width = 6)
Package stars provides infrastructure for data cubes, array data with labeled dimensions, with emphasis on arrays where some of the dimensions relate to time and/or space.

Spatial data cubes are arrays with one or more spatial dimensions. Raster data cubes have at least two spatial dimensions that form the raster tessellation. Vector data cubes have at least one spatial dimension that may for instance reflect a polygon tessellation, or a set of point locations. Conversions between the two (rasterization, polygonization) are provided. Vector data are represented by simple feature geometries (package sf). Tidyverse methods are provided.

The stars package is loaded by
library(stars)
Spatiotemporal arrays are stored in objects of class stars; methods for class stars currently available are
methods(class = "stars")
(tidyverse methods are only visible after loading package tidyverse).
We can read a satellite image through GDAL, e.g. from a GeoTIFF file in the package:
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
plot(x, axes = TRUE)
We see that the image is geographically referenced (has coordinate values along the axes), and that the object returned (x) has three dimensions called x, y and band, and has one attribute:
x
Each dimension has a name; the meaning of the fields of a single dimension are:
|field  |meaning                                                                     |
|-------|----------------------------------------------------------------------------|
| from  | the origin index (1)                                                        |
| to    | the final index (dim(x)[i])                                                 |
| offset| the start value for this dimension (pixel boundary), if regular            |
| delta | the step (pixel, cell) size for this dimension, if regular                 |
| refsys| the reference system, or proj4string                                       |
| point | logical; whether cells refer to points, or intervals                       |
| values| the sequence of values for this dimension (e.g., geometries), if irregular |
This means that for an index i (starting at $i=1$) along a certain dimension, the corresponding dimension value (coordinate, time) is $\mbox{offset} + (i-1) \times \mbox{delta}$. This value then refers to the start (edge) of the cell or interval; in order to get the interval middle or cell centre, one needs to add half a delta.
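As a small check (a sketch, not part of the original text), the cell-centre coordinate of a regular dimension can be computed directly from the offset and delta fields in the dimensions table:

d = st_dimensions(x)
i = 1:3 # first three columns
d$x$offset + (i - 1 + 0.5) * d$x$delta # cell-centre x coordinates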
Dimension band is a simple sequence from 1 to 6. Since bands refer to colors, one could put their wavelength values in the values field.
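For example (a sketch; the wavelength values below are illustrative placeholders, not read from the file), this could be done with st_set_dimensions:

x_wl = st_set_dimensions(x, "band", values = c(0.48, 0.56, 0.66, 0.84, 1.65, 2.22))
st_dimensions(x_wl)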
For this particular dataset (and most other raster datasets), we see that delta for dimension y is negative: this means that consecutive array values have decreasing $y$ values: cell indexes increase from top to bottom, in the direction opposite to the $y$ axis.
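As a quick check (a one-line sketch, not in the original text):

st_dimensions(x)$y$delta # negative: y decreases as the row index increases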
read_stars reads all bands from a raster dataset, or optionally a subset of raster datasets, into a single stars array structure. While doing so, raster values (often UINT8 or UINT16) are converted to double (numeric) values, and scaled back to their original values, if needed, when the file encodes the scaling parameters.
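Reading only part of a file is also possible; here is a sketch using the RasterIO argument of read_stars (passed on to GDAL's RasterIO; the offsets, window size and band selection below are arbitrary):

read_stars(tif, RasterIO = list(nXOff = 1, nYOff = 1, nXSize = 100, nYSize = 100, bands = c(1, 4)))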
The data structure stars is a generalization of the tbl_cube found in cubelyr; we can convert to that by
library(cubelyr)
as.tbl_cube(x)
but this will cause a loss of certain properties (cell size, reference system, vector geometries).
(x.spl = split(x, "band"))
merge(x.spl)
We see that the newly created dimension lost its name, and the single attribute got a default name. We can set attribute names with setNames, and dimension names and values with st_set_dimensions:
merge(x.spl) |>
  setNames(names(x)) |>
  st_set_dimensions(3, values = paste0("band", 1:6)) |>
  st_set_dimensions(names = c("x", "y", "band"))
Besides the tidyverse subsetting and selection operators explained in this vignette, we can also use [ and [[.
Since stars objects are a list of arrays with a metadata table describing dimensions, list extraction (and assignment) works as expected:
class(x[[1]])
dim(x[[1]])
x$two = 2 * x[[1]]
x
At this level, we can work with array objects directly.
The stars subset operator [ works a bit differently: its first argument selects attributes, its second argument selects the first dimension, the third argument the second dimension, and so on. Thus,
x["two", 1:10, , 2:4]
selects the second attribute, the first 10 columns (x-coordinate), all rows, and bands 2-4.
Alternatively, when [ is given a single argument of class sf, sfc or bbox, [ will work as a crop operator:
circle = st_sfc(st_buffer(st_point(c(293749.5, 9115745)), 400), crs = st_crs(x))
plot(x[circle][, , , 1], reset = FALSE)
plot(circle, col = NA, border = 'red', add = TRUE, lwd = 2)
We can read rasters at a lower resolution when they contain so-called overviews. For this GeoTIFF file, they were created with the gdaladdo utility, in particular
gdaladdo -r average L7_ETMs.tif 2 4 8 16
which adds coarse resolution versions by using the average resampling method to compute values based on blocks of pixels. These can be read by
x1 = read_stars(tif, options = c("OVERVIEW_LEVEL=1"))
x2 = read_stars(tif, options = c("OVERVIEW_LEVEL=2"))
x3 = read_stars(tif, options = c("OVERVIEW_LEVEL=3"))
dim(x1)
dim(x2)
dim(x3)
par(mfrow = c(1, 3), mar = rep(0.2, 4))
image(x1[,,,1])
image(x2[,,,1])
image(x3[,,,1])
Another example is when we read raster time series model outputs in a NetCDF file, e.g. by
system.file("nc/bcsd_obs_1999.nc", package = "stars") |> read_stars() -> w
We can print the object to see its contents:

w

For this dataset we see that units are read from the NetCDF file (e.g., the unit C is assigned to temperature). Alternatively, this dataset can be read using read_ncdf, as in
system.file("nc/bcsd_obs_1999.nc", package = "stars") |> read_ncdf()
The difference between read_ncdf and read_stars for NetCDF files is that the former uses package RNetCDF to read the NetCDF file directly, whereas the latter uses the GDAL driver for NetCDF files.
Model data are often spread across many files. An example is a 0.25 degree grid, global daily sea surface temperature product; the subset from 1981 used below was downloaded from a NOAA ftp site that is no longer available in this form (the ftp site used to be eclipse.ncdc.noaa.gov/pub/OI-daily-v2/NetCDF/1981/AVHRR/).
We read the data by giving read_stars a vector with character names:
x = c(
  "avhrr-only-v2.19810901.nc",
  "avhrr-only-v2.19810902.nc",
  "avhrr-only-v2.19810903.nc",
  "avhrr-only-v2.19810904.nc",
  "avhrr-only-v2.19810905.nc",
  "avhrr-only-v2.19810906.nc",
  "avhrr-only-v2.19810907.nc",
  "avhrr-only-v2.19810908.nc",
  "avhrr-only-v2.19810909.nc"
)
# see the second vignette:
# install.packages("starsdata", repos = "http://pebesma.staff.ifgi.de", type = "source")
file_list = system.file(paste0("netcdf/", x), package = "starsdata")
(y = read_stars(file_list, quiet = TRUE))
Next, we select sea surface temperature (sst), and drop the singular zlev (depth) dimension using adrop:
library(dplyr)
library(abind)
z <- y |> select(sst) |> adrop()
We can now graph the sea surface temperature (SST) using ggplot, which needs data in a long table form, and without units:
# convert POSIXct time to character, to please ggplot's facet_wrap()
z1 = st_set_dimensions(z, 3, values = as.character(st_get_dimension_values(z, 3)))
library(ggplot2)
library(viridis)
library(ggthemes)
ggplot() + geom_stars(data = z1[1], alpha = 0.8, downsample = c(10, 10, 1)) +
  facet_wrap("time") +
  scale_fill_viridis() +
  coord_equal() +
  theme_map() +
  theme(legend.position = "bottom") +
  theme(legend.key.width = unit(2, "cm"))
We can write a stars object to disk by using write_stars; this uses the GDAL write engine. Writing NetCDF files without going through the GDAL interface is currently not supported. write_stars currently writes only a single attribute:
write_stars(adrop(y[1]), "sst.tif")
See the explanation of merge above to see how multiple attributes can be merged (folded) into a dimension.
Using a curvilinear grid, taken from the example of read_ncdf:
prec_file = system.file("nc/test_stageiv_xyt.nc", package = "stars")
prec = read_ncdf(prec_file, curvilinear = c("lon", "lat"))
##plot(prec) ## gives error about unique breaks
## remove NAs, zeros, and give a large number
## of breaks (used for validating in detail)
qu_0_omit = function(x, ..., n = 22) {
  if (inherits(x, "units"))
    x = units::drop_units(na.omit(x))
  c(0, quantile(x[x > 0], seq(0, 1, length.out = n)))
}
library(dplyr) # loads slice generic
prec_slice = slice(prec, index = 17, along = "time")
plot(prec_slice, border = NA, breaks = qu_0_omit(prec_slice[[1]]), reset = FALSE)
nc = sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"), "nc.gpkg")
plot(st_geometry(nc), add = TRUE, reset = FALSE, col = NA, border = 'red')
We can now crop the grid to those cells falling inside the nc polygons:
nc = st_transform(nc, st_crs(prec_slice)) # datum transformation
plot(prec_slice[nc], border = NA, breaks = qu_0_omit(prec_slice[[1]]), reset = FALSE)
plot(st_geometry(nc), add = TRUE, reset = FALSE, col = NA, border = 'red')
The selection prec_slice[nc] essentially calls st_crop(prec_slice, nc) to get a cropped selection. What happened here is that all cells not intersecting with North Carolina (sea) are set to NA values. For regular grids, the extent of the resulting stars object is also reduced (cropped) by default; this can be controlled with the crop parameter to st_crop and [.stars.
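As an illustration of the crop parameter (a sketch, not in the original text, using the regular grid x and the circle object created earlier):

cropped = st_crop(x, circle) # extent reduced to the bounding box of circle
masked = st_crop(x, circle, crop = FALSE) # full extent kept; cells outside circle set to NA
dim(cropped)
dim(masked)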
Like tbl_cube, stars arrays have no limits to the number of dimensions they handle. An example is the origin-destination (OD) matrix, by time and travel mode.
We create a 5-dimensional matrix of traffic between regions, by day, by time of day, and by travel mode. Having day and time of day each as a dimension is an advantage when we want to compute patterns over the day, for a certain period.
nc = st_read(system.file("gpkg/nc.gpkg", package = "sf"))
to = from = st_geometry(nc) # 100 polygons: O and D regions
mode = c("car", "bike", "foot") # travel mode
day = 1:100 # arbitrary
library(units)
units(day) = as_units("days since 2015-01-01")
hour = set_units(0:23, h) # hour of day
dims = st_dimensions(origin = from, destination = to, mode = mode, day = day, hour = hour)
(n = dim(dims))
traffic = array(rpois(prod(n), 10), dim = n) # simulated traffic counts
(st = st_as_stars(list(traffic = traffic), dimensions = dims))
This array contains the simple feature geometries of origin and destination, so that we can directly plot every slice without additional table joins. If we want to represent such an array as a tbl_cube, the simple feature geometry dimensions need to be replaced by indexes:
st |> as.tbl_cube()
The following demonstrates how we can use dplyr to filter travel mode bike, and compute mean bike traffic by hour of day:
b <- st |> as.tbl_cube() |>
  filter(mode == "bike") |>
  group_by(hour) |>
  summarise(traffic = mean(traffic)) |>
  as.data.frame()
require(ggforce) # for plotting a units variable
ggplot() + geom_line(data = b, aes(x = hour, y = traffic))
Data cube values at point locations can be extracted by st_extract; an example is found in vignette 7.
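A minimal sketch (not taken from the vignette), extracting the band values of x at three points sampled at random inside its bounding box:

pts = st_sample(st_as_sfc(st_bbox(x)), 3)
st_extract(x, pts)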
Aggregates, such as mean, maximum or modal values, can be obtained by aggregate. In this example we use a categorical raster, and try to find the modal (most frequent) class within two circular polygons:
s = system.file("tif/lc.tif", package = "stars")
r = read_stars(s, proxy = FALSE) |> droplevels()
levels(r[[1]]) = abbreviate(levels(r[[1]]), 10) # shorten text labels
st_point(c(3190631, 3125)) |> st_sfc(crs = st_crs(r)) |> st_buffer(25000) -> pol1
st_point(c(3233847, 21027)) |> st_sfc(crs = st_crs(r)) |> st_buffer(10000) -> pol2
if (isTRUE(dev.capabilities()$rasterImage == "yes")) {
  plot(r, reset = FALSE, key.pos = 4)
  plot(c(pol1, pol2), col = NA, border = c('yellow', 'green'), lwd = 2, add = TRUE)
}
To find the modal value, we need a function that gives back the label corresponding to the class which is most frequent, using table:
f = function(x) { tb = table(x); names(tb)[which.max(tb)] }
We can then call aggregate on the raster map, and the set of the two circular polygons pol1 and pol2, and pass the function f:
aggregate(r, c(pol1, pol2), f) |> st_as_sf()