knitr::opts_chunk$set(echo = TRUE)
This vignette describes a basic application of the RFmerge
function to create an improved precipitation dataset by combining two different satellite-based precipitation products with ground-based observations and user-selected covariates (i.e., digital elevation model and Euclidean distances to rain gauge stations).
We use as case study Valparaiso (Chile), as an example of how to generate this product at daily temporal scale and at $0.05^\circ$ spatial resolution, from January to August 1983. This example requires the following data from the user:
i) time series of rainfall observations,
ii) metadata describinf the spatial coordiantes of the rain gauges,
iii) the Climate Hazards Group InfraRed Precipitation with Station data version 2.0 (CHIRPSv2),
iv) the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks - Climate Data Record (PERSIANN-CDR), and
v) the Shuttle Radar Topography Mission (SRTM-v4) digital elevation model (DEM).
In addition, Euclidean distances are also used as covariates, but they are automatically computed within RFmerge
as the Eucalidean distances from each rain gauge station to every grid-cell within the study area.
Install the latest stable version (from CRAN):
install.packages("RFmerge")
Alternatively, you can also try the under-development version from Github:
if (!require(devtools)) install.packages("devtools") library(devtools) install_github("hzambran/RFmerge")
\newpage
library(zoo) library(sf) library(rgdal) library(raster)
RFmerge
package, which contains the main function used in the analysis and required datasets:library(RFmerge)
First, daily time series of rainfall observations in 34 rain gauges located in Valparaiso will be used for this example, from 1983-01-01 to 1983-08-31, which are available in the ValparaisoPPts dataset provided in the RFmerge
package (for your own application, this dataset might be read from a CSV file or a zoo
file). In addition, the ValparaisoPPgis dataset contains information about the IDs and spatial coordinates of each one of the rain gauges (for your own application, this dataset might be read from a CSV file). Finally, ValparaisoSHP stores the sf
polygon defining the outer borders of the study area (for your own application, this dataset might be read from an ESRI shapefile).
data(ValparaisoPPts) data(ValparaisoPPgis) data(ValparaisoSHP)
Secondly, we need to load the satellit-based precipitation datasets and other covariates. For this example, CHIRPSv2 (Funk et al., 2015) and PERSIANN-CDR (Ashouri et al.,2015), at a spatial resolution of $0.05^\circ$, are used as dynamic covariates (in this context, dynamic means time-varying covariates). The Digital Elevation Model SRTM-v4 (DEM), also with a spatial resolution of $0.05^\circ$, is used as a static covariate to account for the impact of elevation on precipitation(in this context, static means that this covariate does not change in time).
chirps.fname <- system.file("extdata/CHIRPS5km.tif",package="RFmerge") prsnncdr.fname <- system.file("extdata/PERSIANNcdr5km.tif",package="RFmerge") dem.fname <- system.file("extdata/ValparaisoDEM5km.tif",package="RFmerge") CHIRPS5km <- brick(chirps.fname) PERSIANNcdr5km <- brick(prsnncdr.fname) ValparaisoDEM5km <- raster(dem.fname)
The two multi-band geotiff files provided in the RFmerge
package do not store the date of the precipitation estimate as name of the corresponding layer. Therefore, before any exploratory analysis, we would give meaningful names to each band (layer) in CHIRPS5km
and PERSIANNcdr5km
:
#ldates <- hydroTSM::dip("1983-01-01", "1983-08-31") ldates <- seq(from=as.Date("1983-01-01"), to=as.Date("1983-08-31"), by="days") names(CHIRPS5km) <- ldates names(PERSIANNcdr5km) <- ldates
Then, we want to visualise the first six rows of the spatial metadata:
head(ValparaisoPPgis)
Plotting the daily precipitation time series for the first station (code: P5101005).
main <- paste("Daily precipitation for the station", ValparaisoPPgis$Code[1]) ylab <- "Precipitation [mm]" x.ts <- ValparaisoPPts[,1] #hydroTSM::hydroplot(x.ts, pfreq="o", main=main, ylab= ylab) plot(x.ts, main=main, ylab= ylab, col="blue") grid()
Plotting the accumulated precipitation estimates for the first eight months of 1983 from CHIRPS and PERSIANN-CDR, and overlying the boundaries of the study area (only its first attribute):
chirps.total <- sum(CHIRPS5km, na.rm= FALSE) persiann.total <- sum(PERSIANNcdr5km, na.rm= FALSE) plot(chirps.total, main = "CHIRPSv2 [Jan - Aug] ", xlab = "Longitude", ylab = "Latitude") plot(ValparaisoSHP[1], add=TRUE, col="transparent") plot(persiann.total, main = "PERSIANN-CDR [Jan - Aug]", xlab = "Longitude", ylab = "Latitude") plot(ValparaisoSHP[1], add=TRUE, col="transparent")
In order to use the spatial information stored in ValparaisoPPgis
, we first need to convert it into a SpatialPointsDataFrame
, using the latitude and longitude fields, stored in the lat
and lon
columns:
stations <- ValparaisoPPgis ( stations <- st_as_sf(stations, coords = c('lon', 'lat'), crs = 4326) )
Now, we can plot the DEM with the locations of the rain gauge stations that will be used in the computation of RF-MEP, and overlying the boundaries of the study area and the stations (only their first attribute):
plot(ValparaisoDEM5km, main="SRTM-v4", xlab="Longitude", ylab="Latitude", col=terrain.colors(255)) plot(ValparaisoSHP[1], add=TRUE, col="transparent") plot(stations[1], add=TRUE, pch = 16, col="black")
Note that for this example we want to produce a merged precipitation product only from January 1st to August 31th in 1983, i.e., 243 days; therefore, the precipitation products used as covariates have 243 layers each (one for each day in the time period used ofr the analysis), which is the same number of rows of the ValparaisoPPts
object.
nlayers(CHIRPS5km) ( nlayers(CHIRPS5km) == nlayers(PERSIANNcdr5km) ) ( nlayers(CHIRPS5km) == nrow(ValparaisoPPts) )
Now, we have to verify that the precipitation products and the DEM have the same spatial extent:
extent(CHIRPS5km) ( extent(CHIRPS5km) == extent(PERSIANNcdr5km) ) ( extent(CHIRPS5km) == extent(ValparaisoDEM5km) )
and the same spatial resolution:
res(CHIRPS5km) ( res(CHIRPS5km) == res(PERSIANNcdr5km) ) ( res(CHIRPS5km) == res(ValparaisoDEM5km) )
If you would like to test the use of Euclidean distances as covariates in RFmerge
you need to be sure that Euclidean distances will be correctly computed from your datasets.
Because ValparaisoPPgis
, CHIRPS5km
, and PERSIANNcdr5km
all use geographical coordinates, we need first to project their values into a projected coordinate reference system (CRS), i.e., one defined on a flat, two-dimensional surface.
First, we reproject the rainfall observations from geographic coordinates into WGS 84 / UTM zone 19S (EPSG:32719):
stations.utm <- sf::st_transform(stations, crs=32719) # for 'sf' objects
Second, we reproject the satellite products from geographic coordinates into WGS 84 / UTM zone 19S (EPSG:32719):
#utmz19s.p4s <- CRS("+init=epsg:32719") # WGS 84 / UTM zone 19S utmz19s.p4s <- sf::st_crs(stations.utm)$proj4string # WGS 84 / UTM zone 19S CHIRPS5km.utm <- projectRaster(from=CHIRPS5km, crs=utmz19s.p4s) PERSIANNcdr5km.utm <- projectRaster(from=PERSIANNcdr5km, crs=utmz19s.p4s) ValparaisoDEM5km.utm <- projectRaster(from=ValparaisoDEM5km, crs=utmz19s.p4s)
Third, we reproject the polygon with the boundaries used to define the study area from geographic coordinates into WGS 84 / UTM zone 19S (EPSG:32719):
ValparaisoSHP.utm <- sf::st_transform(ValparaisoSHP, crs=32719)
Fourth, we create a new data.frame with the expected metadata, i.e., at least, ID, lat, lon:
st.coords <- st_coordinates(stations.utm) lon <- st.coords[, "X"] lat <- st.coords[, "Y"] ValparaisoPPgis.utm <- data.frame(ID=stations.utm[["Code"]], lon=lon, lat=lat)
You might want to skip (at your own risk) this reprojection step when the study area is small enough to neglect the impact of using geographic coordinates in the computation of Euclidean distances.
RFmerge
Now, we can create the covariates object to be used in the RFmerge
function. For doing this, we will create a list object with the different covariates. Please note that the order and name of the covariates in the list is not important.
covariates.utm <- list(chirps=CHIRPS5km.utm, persianncdr=PERSIANNcdr5km.utm, dem=ValparaisoDEM5km.utm)
Finally, if you want the resulting merged files be written into disk you need to define the output directory (drty.out
) before running RFmerge
. Then, you can run the RFmerge
function as follows:
Without using parallelisation (default option):
drty.out <- file.path(tempdir(), "Test.nop") rfmep <- RFmerge(x=ValparaisoPPts, metadata=ValparaisoPPgis.utm, cov=covariates.utm, id="ID", lat="lat", lon="lon", mask=ValparaisoSHP.utm, training=0.8, write2disk=TRUE, drty.out=drty.out)
Detecting if your OS is Windows or GNU/Linux, and setting the 'parallel' argument accordingly:
onWin <- ( (R.version$os=="mingw32") | (R.version$os=="mingw64") ) ifelse(onWin, parallel <- "parallelWin", parallel <- "parallel")
Using parallelisation, with a maximum number of nodes/cores to be used equal to 2:
par.nnodes <- min(parallel::detectCores()-1, 2) drty.out <- file.path(tempdir(), "Test.par") rfmep <- RFmerge(x=ValparaisoPPts, metadata=ValparaisoPPgis.utm, cov=covariates.utm, id="ID", lat="lat", lon="lon", mask=ValparaisoSHP.utm, training=0.8, write2disk=TRUE, drty.out=drty.out, parallel=parallel, par.nnodes=par.nnodes)
If RFmerge
run without problems, and write2disk=TRUE
, the following output files will be stored in your user-defined drty.out
directory:
i) the rain gauge stations used as training and evaluation datasets; and
ii) the final merged product (individual GeoTiff files).
The aforementioned resulting objects will be stored within drty.out
as follows:
Ground_based_data/Training/
: In this directory, time series and metadata used to train RF-MEP will be stored as a zoo
file (Training_ts.txt
), and a text
file (Training_metadata.txt
), respectively.
Ground_based_data/Evaluation/
: This directory will store time series and metadata not used in training the Random Forest model, but that are available for evaluating the performance of the results in an independent evaluation dataset. The Evaluation_ts.txt
and Evaluation_metadata.tx
files are stored as zoo
and CSV
files, respectively)
RF-MEP/
: This directory will store the individual GeoTiff files produced by the RF-MEP algorithmn, using the same spatial resolution as the selected covariates.
After running RFmerge
we will use the evaluation dataset of rain gauge observations to evaluate the performance of RF-MEP at observation not used in the merging procedure. For this purpose, we will create a stack with the obtained merged product and will import the time series and metadata from the evaluation set:
ts.path <- paste0(drty.out, "/Ground_based_data/Evaluation/Evaluation_ts.txt") metadata.path <- paste0(drty.out, "/Ground_based_data/Evaluation/Evaluation_metadata.txt") eval.ts <- read.zoo(ts.path, header = TRUE) eval.gis <- read.csv(metadata.path) # promoting 'eval.gis' into a spatial object (in order to be plotted) ( eval.gis.utm <- st_as_sf(eval.gis, coords = c('lon', 'lat'), crs = 32719) )
Visualisation of one day of precipitation using RF-MEP, and overlying the boundaries of the study area (only its first attribute)::
plot(rfmep[[11]], main="RF-MEP precipitation for 1983-01-11", xlab="Longitude", ylab="Latitude") plot(ValparaisoSHP.utm[1], add=TRUE, col="transparent") plot(eval.gis.utm, add=TRUE, pch = 16, col="black")
The total amount of precipitation over Valparaiso for January to August 1983 according to RF-MEP, and overlying the boundaries of the study area (only its first attribute)::
rfmep.total <- sum(rfmep, na.rm= FALSE) plot(rfmep.total, main = "RF-MEP [Jan - Aug]", xlab = "Longitude", ylab = "Latitude") plot(ValparaisoSHP.utm[1], add=TRUE, col="transparent")
First, we will compare RF-MEP (and the two precipitation products used as covariates) with the rain gauge data from the evaluation set. To extract the RF-MEP precipitation values at the gauge locations, we will use the extract
function from the raster
package:
coordinates(eval.gis) <- ~ lon + lat rfmep.ts <- t(raster::extract(rfmep, eval.gis)) chirps.ts <- t(raster::extract(CHIRPS5km.utm, eval.gis)) persiann.ts <- t(raster::extract(PERSIANNcdr5km.utm, eval.gis))
To evaluate the performance of RF-MEP and the products used in its computation, the NAsh-Sutcliffe efficiency (NSE) will be used. The optimal value for the NSE is one.
# Defining a function to compute the Nash-Sutcliffe efficiency(NSE) NSE <- function(sim, obs) return( 1 - (sum((obs - sim)^2)/ (sum((obs - mean(obs))^2)) ) ) sres <- list(chirps.ts, persiann.ts, rfmep.ts) nsres <- length(sres) nstations <- ncol(eval.ts) tmp <- rep(NA, nstations) nse.table <- data.frame(ID=eval.gis[["ID"]], CHIRPS=tmp, PERSIANN_CDR=tmp, RF_MEP=tmp) # Computing the NSE between the observed rainfall measured in each one of the raingauges # of the training dataset and CHIRPSv2, PERSIANN-CDR, the merged product `rfmep`: for (i in 1:nsres) { ldates <- time(eval.ts) lsim <- zoo(sres[[i]], ldates) nse.table[, (i+1)] = NSE(sim= lsim, obs= eval.ts) } # FOR end
Finally, a boxplot comparing the performance, in terms of KGE', of the merged product in comparison to the original SREs used as covariates will be produced:
# Boxplot with a graphical comparison sres.cols <- c("powderblue", "palegoldenrod", "mediumseagreen") boxplot(nse.table[,2:4], main = "NSE evaluation for Jan - Aug 1983", xlab = "P products", ylab = "NSE'", ylim = c(0, 1), # horizontal=TRUE, col=sres.cols) legend("topleft", legend=c("CHIRPS", "PERSIANN-CDR", "RF-MEP"), col=sres.cols, pch=15, cex=1.5, bty="n") grid()
In order to reduce the package dependencies for CRAN, this vignette was built with reduced functionality. The full vignette can be found here.
This tutorial was built under:
sessionInfo()$platform sessionInfo()$R.version$version.string paste("RFmerge", sessionInfo()$otherPkgs$RFmerge$Version)
\newpage
Ashouri, H., Hsu, K.-L., Sorooshian, S., Braithwaite, D. K., Knapp, K. R., Cecil, L. D., Nelson, B. R., and Prat, O. P. (2015). PERSIANN-CDR: Daily Precipitation Climate Data Record from Multisatellite Observations for Hydrological and Climate Studies, Bulletin of the American Meteorological Society, 96, 69–83, doi:10.1175/BAMS-D-13-00068.1.
Baez-Villanueva, O. M.; Zambrano-Bigiarini, M.; Beck, H.; McNamara, I.; Ribbe, L.; Nauditt, A.; Birkel, C.; Verbist, K.; Giraldo-Osorio, J.D.; Thinh, N.X. (2020). RF-MEP: a novel Random Forest method for merging gridded precipitation products and ground-based measurements, Remote Sensing of Environment, 239, 111610. doi:10.1016/j.rse.2019.111606.
Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J., Harrison, L., Hoell, A., and Michaelsen, J. (2015) The climate hazards infrared precipitation with stations-a new environmental record for monitoring extremes, Sci Data, 2, 150 066, doi:10.1038/sdata.2015.66.
Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., & Gr\"{a}ler, B. (2018). Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ, 6, e5518.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.