fill_akqenv_parallel: Fill an R Environment by Parallel Processing with...


Description

Compute Asquith–Knight discharge decay analyses for a succession of USGS streamgage identification numbers held in an R environment from fill_dvenv, and populate another R environment with the output from akqdecay. This function is intended as an alternative to fill_akqenv on unix-like platforms only. The mcparallelDo package is used, although it is not a required package for akqdecay; this function will therefore operate properly only if mcparallelDo is in fact installed. The process is set up to run no more than six sites in parallel, wait until all six jobs are finished, and then launch the next ensemble. This might not be ideal: if a long-record site is among four very short-record sites, then there are stretches of time when not all potential jobs are running. Nevertheless, the design of fill_akqenv_parallel has proven effective and robust, and it yields substantial acceleration when benchmarked against fill_akqenv.
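The batch-of-six design described above can be sketched with the base parallel package. This is a minimal illustration under stated assumptions, not the package's implementation (which uses mcparallelDo): run_in_batches() and analyze_site() are hypothetical names, analyze_site() merely stands in for akqdecay(), and the site numbers are made up. Each batch forks at most six children with parallel::mcparallel() and blocks in parallel::mccollect() until all of them return, which is why one long-record site holds up the launch of the next batch.

```r
library(parallel)  # base R; mcparallel()/mccollect() work on unix-like systems only

# Hypothetical stand-in for a per-site analysis such as akqdecay();
# it only returns the site number and its character count.
analyze_site <- function(site) {
  list(site = site, n = nchar(site))
}

# Hypothetical driver: run FUN on each site, forking at most batch_size
# children at a time and waiting for the whole batch before the next one.
run_in_batches <- function(sites, FUN, batch_size = 6L) {
  batches <- split(sites, ceiling(seq_along(sites) / batch_size))
  results <- list()
  for (batch in batches) {
    # One forked job per site; the job name is the site number.
    jobs <- lapply(batch, function(s) mcparallel(FUN(s), name = s))
    # Block until every job in this batch has returned its result.
    out <- mccollect(jobs, wait = TRUE)
    results <- c(results, out)  # results are named by job (site) name
  }
  results
}

sites <- sprintf("0816%04d", 1:8)  # eight made-up site numbers: a batch of 6, then 2
res <- run_in_batches(sites, analyze_site)
```

A scheme that forks a new child as soon as any child finishes (for example, parallel::mclapply() with mc.preschedule = FALSE, which forks one job per element with at most mc.cores running at once) would avoid the idle time noted above, at the cost of a different design from the one used here.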

Usage

fill_akqenv_parallel(sites=NULL, dvenv=NULL, envir=NULL, silent=FALSE, ...)

Arguments

sites

The vector of sites within dvenv to process. This option allows a massive environment of daily values to be retained in the user's workspace while the Asquith–Knight discharge decay analyses are restricted to a smaller subset of sites. If sites=NULL, then all of the sites within dvenv are processed internally, which is almost universally how this function is used;

dvenv

An R environment previously populated by fill_dvenv;

envir

An R environment, usually created by the user with new.env();

silent

A logical to suppress informative calls to message(); and

...

Additional arguments passed to akqdecay to control its behavior.

Value

This function is used for its side effects on the envir argument but does return a count of the sites processed by akqdecay.

Note

System resources can be taxed to their limits as thousands of zombie processes inflate the process table and consume resources. The mcparallelDo package (version 1.1.0) is used for parallel processing. However, the implementation within the fill_akqenv_parallel function, possibly because of an incomplete understanding of mcparallelDo features, leaves a zombie on the process table for each process fork. The zombies are harmless per se when the number of sites processed is relatively small, but they can consume much or even all of the physical memory when processing is repeated during testing or during massive real-world processing (thousands of sites, each with, say, many tens of thousands of daily-mean streamflows). The zombies remain until the parent R session is closed, and mitigation seems to require tracking (accounting) of the PIDs.
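To observe the zombies described above, one might list the zombie children of the current R session with the unix ps utility. This is a hypothetical diagnostic: list_child_zombies() is not part of akqdecay or mcparallelDo, and it assumes a POSIX-style ps that accepts the -A and -o options. It returns an empty integer vector when ps cannot be found.

```r
# Hypothetical diagnostic: PIDs of zombie ("Z" state) processes whose parent
# is the current R session, as reported by the unix ps utility.
list_child_zombies <- function() {
  if (!nzchar(Sys.which("ps"))) return(integer(0))  # ps not available
  # -A selects all processes; an empty header after "=" suppresses the header line.
  out <- system2("ps", c("-A", "-o", "pid=", "-o", "ppid=", "-o", "stat="),
                 stdout = TRUE)
  if (length(out) == 0L) return(integer(0))
  tab <- read.table(text = out, col.names = c("pid", "ppid", "stat"),
                    colClasses = c("integer", "integer", "character"))
  # Keep zombies that are children of this R session only.
  is_zombie <- startsWith(tab$stat, "Z") & tab$ppid == Sys.getpid()
  tab$pid[is_zombie]
}

zombies <- list_child_zombies()
length(zombies)  # number of zombie children of this session right now
```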

Exploration of the code and communication with the author indicate that mcparallelDo does not preserve the PID of forked R sessions through its use of a jobName; that is, mcparallelDo does not consult the PID issued by its own calls to the parallel package. The following change to the mcparallelDo() function in mcparallelDo/R/mcparallelDo.R stores the PID in the jobName. The change is shown in the output syntax of the unix-like diff system utility (compare files line by line):

  187,189c187,189
  <   jobName <- R.utils::tempvar(".mcparallelDoJob",
  <                               value = parallel::mcparallel({try(code)}),
  <                               envir = targetEnvironment)
  ---
  >   p <- parallel::mcparallel({try(code)})
  >   jobName <- R.utils::tempvar(paste0(".mcparallelDoJob","-",p$pid,"-"),
  >                               value = p, envir = targetEnvironment)

The modification alters lines 187–189 so that the prefix of the temporary variable name appends a hyphen, the PID, and another hyphen to the “.mcparallelDoJob” root. The function mcparallelDo::mcparallelDo() is run in its verbose mode so that it returns the temporary jobName. A killZombies() function is defined inside fill_akqenv_parallel. After all six spawned processes complete (technically after mcparallelDo has already deleted these temporary variables), killZombies() uses the stored job names: it extracts the PIDs from them and makes six R system() calls to the unix-like kill PID command to kill each of the zombies.
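The naming scheme from the diff and the PID extraction performed by killZombies() can be sketched as follows. The helpers make_job_prefix(), pids_from_job_names(), and kill_pids() are hypothetical illustrations, not the code inside fill_akqenv_parallel. The regular expression assumes job names of the form “.mcparallelDoJob-<PID>-<suffix>”, and the suffix “1” in the demonstration is a stand-in for whatever R.utils::tempvar() actually appends. The demonstration forks one real child with parallel::mcparallel(), whose returned object carries the child's PID in p$pid, which is the value the modified mcparallelDo() embeds in the name.

```r
# Hypothetical helper mirroring the modified prefix in the diff:
# ".mcparallelDoJob-<PID>-" (the %d format avoids output such as "1e+05").
make_job_prefix <- function(pid) {
  sprintf(".mcparallelDoJob-%d-", as.integer(pid))
}

# Hypothetical helper: recover PIDs from job names that follow the prefix
# scheme above; names that do not match the scheme are ignored.
pids_from_job_names <- function(job_names) {
  pattern <- "^\\.mcparallelDoJob-([0-9]+)-.*$"
  hits <- grepl(pattern, job_names)
  as.integer(sub(pattern, "\\1", job_names[hits]))
}

# Hypothetical helper: issue the unix kill command for each PID through
# system(), as the Note describes; it is not called in the demonstration.
kill_pids <- function(pids) {
  for (pid in pids) system(paste("kill", pid))
  invisible(pids)
}

# Demonstration with a real fork from the parallel package (unix only).
p <- parallel::mcparallel(Sys.getpid())          # the child returns its own PID
job_name <- paste0(make_job_prefix(p$pid), "1")  # "1" is a stand-in suffix
child <- parallel::mccollect(p)[[1]]
pids_from_job_names(job_name) == child           # TRUE
```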

Author(s)

W.H. Asquith

See Also

fill_akqenv

Examples

## Not run: 
# See fill_dvenv() Examples for the creation of wolf.env used here.
akqwolf.env1 <- new.env() # the standard declaration of an environment
akqwolf.env2 <- new.env() # the standard declaration of an environment
system.time(fill_akqenv(         dvenv=wolf.env, envir=akqwolf.env1))
system.time(fill_akqenv_parallel(dvenv=wolf.env, envir=akqwolf.env2))
## End(Not run)

wasquith-usgs/akqdecay documentation built on Nov. 9, 2020, 1:13 p.m.