    knitr::opts_chunk$set(
      collapse = TRUE,
      comment = "#>"
    )
    eval_chunks <- CopernicusMarine:::has_blosc &&
      curl::has_internet() &&
      CopernicusMarine::cms_get_password() != "" &&
      sf::st_drivers("raster", "^HDF5$")$vsi
Using stars_proxy objects
in combination with the CopernicusMarine package makes it possible to
work efficiently with data from Copernicus Marine services on the fly.
The great thing about these proxy objects is that they do not read any data until it is needed. You can therefore connect to a dataset on the Copernicus server without reading any raster data. Initially, only metadata about the raster's dimensions and bands (attributes) is collected. The actual raster data is downloaded only when you need it.
You can set up a proxy object by calling either cms_native_proxy() or
cms_zarr_proxy(). The first uses the 'native' service. In that case the
data is already structured in chunked files, and the added value of proxy objects
is less obvious. Therefore, this vignette focuses on objects created
with cms_zarr_proxy(), which connects to an entire layer in a product.
    library(CopernicusMarine)
    library(stars, quietly = TRUE)

    my_proxy_gc <- cms_zarr_proxy(
      product = "GLOBAL_ANALYSISFORECAST_PHY_001_024",
      layer   = "cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m",
      asset   = "geoChunked")

    my_proxy_tc <- cms_zarr_proxy(
      product = "GLOBAL_ANALYSISFORECAST_PHY_001_024",
      layer   = "cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m",
      asset   = "timeChunked")

    print(my_proxy_tc)
The only downside of working with proxy objects is that you need
to know which asset type you wish to use. When subsetting data with
cms_download_subset(), this choice is made automatically based on your
selection criteria. When working with a proxy object, however, you may not
know in advance which slices you wish to select.
In general, if you wish to work with long time series in a small geographical
area, it is most efficient to work with "geoChunked" data. If, on the other hand,
you want to work with a short time period on a large geographical scale,
it is better to use "timeChunked" data.
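The rule of thumb above can be written down as a small helper. This is only a sketch: choose_asset() is a hypothetical function that is not part of CopernicusMarine, the comparison of requested fractions is an assumed heuristic rather than the logic used by cms_download_subset(), and the totals in the example calls are illustrative values.

```r
# Hypothetical helper (not part of CopernicusMarine): pick an asset type
# from the share of each axis that a request covers.
choose_asset <- function(n_time, n_cells, total_time, total_cells) {
  stopifnot(n_time >= 1, n_cells >= 1,
            n_time <= total_time, n_cells <= total_cells)
  time_fraction  <- n_time / total_time    # share of the time axis requested
  space_fraction <- n_cells / total_cells  # share of the horizontal grid requested
  # Assumed heuristic: long series over a small area favour "geoChunked";
  # anything else falls back to "timeChunked".
  if (time_fraction > space_fraction) "geoChunked" else "timeChunked"
}

# One grid cell over 400 time steps (the totals are illustrative values)
choose_asset(n_time = 400, n_cells = 1, total_time = 1400, total_cells = 8e6)

# A single time step over a large region
choose_asset(n_time = 1, n_cells = 125000, total_time = 1400, total_cells = 8e6)
```

The result can be passed as the asset argument of cms_zarr_proxy(). For requests that are large along both axes, neither asset type is clearly better and some experimentation may be needed.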
As you can see from the proxy object printed above, its dimensions are very large. It holds daily data for nearly four years, in 50 depth layers with global coverage. If you tried to read all of this raster data at once, it would almost certainly fail, as it would require thousands of GB of memory, which is simply not available on most devices.
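A back-of-the-envelope calculation shows where an estimate of this size comes from. The grid size used below (4320 by 2041 cells) is an assumption for a global grid at 1/12 degree resolution, and the number of time steps is rounded to four full years; the actual values should be read from the printed proxy object.

```r
# Assumed dimension sizes, for illustration only; read the real values
# from the printed proxy object.
n_x     <- 4320      # longitude cells on a 1/12 degree global grid (assumed)
n_y     <- 2041      # latitude cells (assumed)
n_depth <- 50        # depth layers, as stated in the text
n_time  <- 4 * 365   # roughly four years of daily fields

bytes_per_value <- 8 # R stores numeric values as 8-byte doubles

n_values <- n_x * n_y * n_depth * n_time
size_gib <- n_values * bytes_per_value / 1024^3

# Approximate memory needed to hold the full layer, in GiB
format(round(size_gib), big.mark = ",")
```

Under these assumptions the full layer runs to several thousand GiB, which is why slicing before downloading is essential.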
Fortunately, the proxy object can easily be sliced by selecting index values with
the bracket operator ([). The first index represents the band (attribute),
which we skip. Next come the x and y coordinates, followed by the elevation.
The last dimension is time, where we select the first four hundred records.
    time_slice <- my_proxy_gc[, 2000, 1000, 48, 1:400]
    show(time_slice)
As you may notice, this slicing is very fast. This is because no actual data
has been transferred yet.
The data is not downloaded until st_as_stars() is called.
Since in this particular case we have selected only a single raster cell,
the result is a simple time series, which could also be cast to a data.frame.
Here we plot the time series directly.
    time_slice <- st_as_stars(time_slice)
    plot(st_get_dimension_values(time_slice, "time"),
         time_slice$thetao,
         xlab = "date", ylab = "temperature", type = "l")
We can also select a specific area, for which we will use the time chunked proxy.
    geo_slice <- my_proxy_tc[, 2000:2500, 1500:1750, 50, 500]
    plot(geo_slice, col = hcl.colors(10))
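The index values in the slices above were chosen by hand. On a regular grid, an index can instead be derived from a coordinate. The sketch below uses a hypothetical helper, coord_to_index(), which is not part of stars or CopernicusMarine. The offset and cell size in the example call are assumptions; the actual values are listed in the dimensions table printed for the proxy object.

```r
# Hypothetical helper (not part of stars or CopernicusMarine): convert a
# coordinate to a 1-based cell index on a regular grid.
# `offset` is the coordinate of the outer edge of the first cell and
# `delta` is the cell size, as listed in a stars dimensions table.
coord_to_index <- function(coord, offset, delta) {
  floor((coord - offset) / delta) + 1
}

# Assumed grid: longitude starting at -180 with 1/12 degree cells
coord_to_index(coord = -13.3, offset = -180, delta = 1 / 12)
```

The same function can be applied to the y dimension, where delta is often negative because rows are counted from north to south.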