knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
To work with the whole database at the sapwood or plant level, at least $16GB$ of RAM is recommended. Loading all the data objects already consumes $4GB$, and any operation such as aggregation or metric calculation needs additional memory on top of that:
library(sapfluxnetr)

# This will need at least 5GB of memory during the process
folder <- 'RData/plant'
sfn_metadata <- read_sfn_metadata(folder)

# The piped site codes are passed as the first argument of filter_sites_by_md(),
# so only the filter condition and the metadata need to be supplied
daily_results <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c("Temperate forest", 'Woodland/Shrubland'),
    metadata = sfn_metadata
  ) %>%
  read_sfn_data(folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

# Important to save, this way you will have access to the object in the future
save(daily_results, file = 'daily_results.RData')
To work around this on less powerful systems, we recommend working with small subsets of sites (25-30) and joining the tidy results afterwards:
library(sapfluxnetr)
library(dplyr) # needed for bind_rows()

folder <- 'RData/plant'
sfn_metadata <- read_sfn_metadata(folder)

# Site codes matching the biome filter
sites <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c("Temperate forest", 'Woodland/Shrubland'),
    metadata = sfn_metadata
  )

# Process the sites in batches, so only one batch of raw data is loaded at a time
daily_results_1 <- read_sfn_data(sites[1:30], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)
daily_results_2 <- read_sfn_data(sites[31:60], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)
daily_results_3 <- read_sfn_data(sites[61:90], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)
daily_results_4 <- read_sfn_data(sites[91:110], folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

# Join the tidy results and remove the intermediate objects
daily_results_steps <- bind_rows(
  daily_results_1, daily_results_2, daily_results_3, daily_results_4
)
rm(daily_results_1, daily_results_2, daily_results_3, daily_results_4)

save(daily_results_steps, file = 'daily_results_steps.RData')
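The hand-written batches above can also be generated programmatically. The following is only a minimal sketch, not something provided by sapfluxnetr: chunk_sites() and daily_metrics_in_batches() are hypothetical helper names, the batch size of 30 is just the upper end of the range suggested above, and the second helper has not been checked against the case where a batch ends up containing a single site.

```r
# Hypothetical helper: split a vector of site codes into batches of at most
# `size` elements (base R only).
chunk_sites <- function(sites, size = 30) {
  unname(split(sites, ceiling(seq_along(sites) / size)))
}

# Hypothetical helper: compute the daily metrics batch by batch and bind the
# tidy results, so only one batch of raw data is loaded at a time.
# Assumes sapfluxnetr and dplyr are installed; they are only needed when the
# function is actually called.
daily_metrics_in_batches <- function(sites, folder, metadata, size = 30) {
  batches <- chunk_sites(sites, size)
  results <- lapply(batches, function(batch) {
    res <- sapfluxnetr::daily_metrics(
      sapfluxnetr::read_sfn_data(batch, folder),
      tidy = TRUE, metadata = metadata
    )
    gc() # ask R to release the memory used by the raw batch data
    res
  })
  dplyr::bind_rows(results)
}

# Quick check of the batching logic with made-up site codes
example_sites <- paste0("SITE_", 1:110)
batch_sizes <- lengths(chunk_sites(example_sites, size = 30))
print(batch_sizes)
```

With the sites, folder and sfn_metadata objects from the block above, the call would look like daily_metrics_in_batches(sites, folder, sfn_metadata). Lowering size trades a longer run for a smaller peak memory footprint.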
sapfluxnetr includes the capability to parallelize the metrics calculation when it is performed on a sfn_data_multi object. This is possible thanks to the furrr package, which uses the future package behind the scenes. By default, the code runs in a sequential process, which is the usual way R code runs. Setting the future::plan to multicore (in Linux), multisession (in Windows) or multiprocess (which automatically chooses between the previous plans depending on the system) will run the code in parallel, dividing the sites between the available cores.
> Be advised, parallelization usually means more RAM used, so on systems with less than 16GB it may not be a good idea. Also, the time benefits only start to show when analysing 10 sites or more.
# loading future package
library(future)

# setting the plan; multisession works on all platforms, while 'multiprocess'
# is deprecated in recent versions of future
plan('multisession')

# metrics!! (folder and sfn_metadata as defined in the previous examples)
daily_results_parallel <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c("Temperate forest", 'Woodland/Shrubland'),
    metadata = sfn_metadata
  ) %>%
  read_sfn_data(folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

# Important to save, this way you will have access to the object in the future
save(daily_results_parallel, file = 'daily_results_parallel.RData')
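Since parallel plans tend to use more RAM, one way to balance speed against memory is to cap the number of workers instead of using every core. The following is a minimal sketch using only the future package; the cap of two workers is an arbitrary illustration, not a recommendation from the package authors.

```r
library(future)

# How many cores does future consider available on this machine?
n_cores <- availableCores()

# Use at most 2 workers (arbitrary illustrative cap) to limit memory use
n_workers <- min(2L, n_cores)

# plan() returns the previous plan invisibly, so it can be restored later
old_plan <- plan(multisession, workers = n_workers)

# ... run daily_metrics() or any other metrics function here ...

# Go back to the previous plan once the parallel work is done
plan(old_plan)
```

Restoring the previous plan at the end avoids leaving background R sessions running for the rest of the analysis.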
When using furrr, even in the sequential plan, the future package sets a limit of $500MB$ for each core. With sapfluxnet data this limit is easily exceeded, causing an error. To avoid this we may want to set the future.globals.maxSize limit to a higher value ($1GB$ for example, but the limit needed really depends on the plan and the number of sites):
# future library
library(future)

# plan sequential, not really needed, as it is the default, but for the sake of
# clarity
plan('sequential')

# up the limit to 1GB, this in bytes is 1024*1024^2
options('future.globals.maxSize' = 1024*1024^2)

# do the metrics (folder and sfn_metadata as defined in the previous examples)
daily_results_limit <- sfn_sites_in_folder(folder) %>%
  filter_sites_by_md(
    si_biome %in% c("Temperate forest", 'Woodland/Shrubland'),
    metadata = sfn_metadata
  ) %>%
  read_sfn_data(folder) %>%
  daily_metrics(tidy = TRUE, metadata = sfn_metadata)

# Important to save, this way you will have access to the object in the future
save(daily_results_limit, file = 'daily_results_limit.RData')
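Because future.globals.maxSize is expressed in bytes, it can help to compute the value from a more readable unit. This is a small sketch in base R; mib_to_bytes() is a hypothetical helper, not part of future, and it assumes that 1GB here means 1024 * 1024^2 = 1073741824 bytes.

```r
# Hypothetical helper: convert mebibytes (MiB) to bytes
mib_to_bytes <- function(mib) mib * 1024^2

default_limit <- mib_to_bytes(500)  # the 500MB default mentioned above
one_gb_limit <- mib_to_bytes(1024)  # 1GB expressed in bytes

# Raise the limit for the current R session and check the stored value
options(future.globals.maxSize = one_gb_limit)
print(getOption("future.globals.maxSize"))
```

The option only affects the current R session, so it has to be set again in every new session or script that runs the metrics.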