parGLM-methods: fit GLM-like models with parallelized contributions to...


Description

This package fits GLM-like models in a scalable way. The data may be dispersed across chunks; the chunks are processed in parallel to create low-dimensional summaries, and the model fit is constructed from those summaries.
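The idea of building a fit from low-dimensional per-chunk summaries can be sketched in base R for the Gaussian case. This is an illustration only, not the package's internal code: the dataset (mtcars), the chunking scheme, and the helper name summarize_chunk are assumptions made for the example.

```r
# Sketch: recover an ordinary least-squares fit from per-chunk summaries.
# Not parGLM's implementation; mtcars and summarize_chunk are illustrative.
dat <- mtcars
chunks <- split(seq_len(nrow(dat)), rep(1:4, length.out = nrow(dat)))

# Each chunk contributes a p x p matrix and a p-vector, whatever its row count.
summarize_chunk <- function(idx) {
  d <- dat[idx, ]
  X <- model.matrix(mpg ~ wt + hp, data = d)
  list(xtx = crossprod(X), xty = crossprod(X, d$mpg))
}
parts <- lapply(chunks, summarize_chunk)  # this step could run in parallel

# Summaries are additive across chunks, so combining them is a simple sum.
xtx <- Reduce(`+`, lapply(parts, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(parts, `[[`, "xty"))
beta_chunked <- drop(solve(xtx, xty))

# Reference fit on the undivided data.
beta_full <- coef(lm(mpg ~ wt + hp, data = dat))
print(all.equal(unname(beta_chunked), unname(beta_full), tolerance = 1e-6))
```

Only the summed summaries need to be held in one place; the raw rows never leave their chunks.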

Methods

signature(formula = "formula", store = "Registry")

The model data are assumed to lie in the file.dir/jobs/* folders, with file.dir defined in the store, which is an instance of Registry.

Additional arguments must be supplied:

family

a function that serves as a family for stats::glm

binit

a vector of initial values for the regression parameters; it must conform to the expectations of formula (for example, one value per coefficient)

maxit

an integer giving the maximum number of iterations allowed

tol

a numeric giving the convergence tolerance criterion

Failure to specify these triggers a fatal error.
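To show how family, binit, maxit and tol typically interact in an iterative GLM fit, here is a generic Fisher-scoring loop in base R. It is a sketch under stated assumptions: the function name irls_sketch is hypothetical, and the stopping rule (largest relative change in the coefficients) is an assumption, not necessarily the criterion that parGLM reports.

```r
# Generic Fisher scoring (IRLS); NOT parGLM's code.
# The convergence criterion used here is an assumption for illustration.
irls_sketch <- function(X, y, family, binit, maxit, tol) {
  fam <- if (is.function(family)) family() else family
  beta <- binit
  converged <- FALSE
  for (it in seq_len(maxit)) {
    eta <- drop(X %*% beta)             # linear predictor
    mu <- fam$linkinv(eta)              # fitted mean
    dmu <- fam$mu.eta(eta)              # d mu / d eta
    w <- dmu^2 / fam$variance(mu)       # working weights
    z <- eta + (y - mu) / dmu           # working response
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    crit <- max(abs(beta_new - beta) / (abs(beta_new) + 1e-8))
    beta <- beta_new
    if (crit < tol) { converged <- TRUE; break }
  }
  list(coefficients = beta, iterations = it, converged = converged)
}

X <- model.matrix(am ~ wt, data = mtcars)
fit <- irls_sketch(X, mtcars$am, family = binomial,
                   binit = c(0, 0), maxit = 25, tol = 1e-8)
ref <- coef(glm(am ~ wt, data = mtcars, family = binomial))
print(cbind(sketch = fit$coefficients, glm = ref))
```

Note that binit has one entry per column of the model matrix, which is the sense in which it must conform to the formula.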

The Registry instance can be modified to include a list element 'extractor'. This must be a function with arguments store and i. The standard extraction function is

function(store, i) loadResult(store, i)

The extractor must return a data frame, conformant with the expectations of formula. Only limited checking is performed.
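Because only limited checking is performed, it can help to wrap the loader in an extractor that validates each chunk. The sketch below is hypothetical: make_checked_extractor, toy_store and toy_loader are names invented for illustration, and a plain list stands in for the registry so that the code runs without BatchJobs. With a real registry, the loader would be loadResult, and the wrapped function would be stored as the 'extractor' element described above.

```r
# Hypothetical helper: build an extractor(store, i) that checks each chunk
# for the variables the formula needs before returning it.
make_checked_extractor <- function(loader, formula) {
  needed <- all.vars(formula)
  function(store, i) {
    d <- loader(store, i)
    if (!is.data.frame(d)) stop("chunk ", i, " is not a data frame")
    absent <- setdiff(needed, names(d))
    if (length(absent) > 0)
      stop("chunk ", i, " lacks variable(s): ", paste(absent, collapse = ", "))
    d
  }
}

# Stand-in for a registry: a list of per-job data frames (assumption).
toy_store <- list(mtcars[1:4, ], mtcars[5:8, c("mpg", "wt")])
toy_loader <- function(store, i) store[[i]]

ex <- make_checked_extractor(toy_loader, mpg ~ wt + hp)
ok <- ex(toy_store, 1)                       # passes the checks
res <- tryCatch(ex(toy_store, 2),            # chunk 2 has no 'hp' column
                error = function(e) conditionMessage(e))
print(res)
```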

The predict method computes the linear predictor on data identified by jobid in a BatchJobs registry. Results are returned as output of foreach over the jobids specified in the predict call.
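The shape of that result can be illustrated without a registry. The base R sketch below computes the linear predictor chunk by chunk and returns one single-column matrix per chunk. It is an analogy, not the predict method itself: lapply replaces foreach, and the dataset, chunking and function name linpred_chunk are assumptions.

```r
# Sketch: per-chunk linear predictors, returned as a list with one
# single-column matrix per chunk. Illustrative only; lapply stands in
# for foreach and mtcars for registry-held data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
beta <- coef(fit)
chunks <- split(seq_len(nrow(mtcars)), rep(1:4, length.out = nrow(mtcars)))

linpred_chunk <- function(idx) {
  X <- model.matrix(~ wt + hp, data = mtcars[idx, ])
  X %*% beta                      # rows keep the record names of the chunk
}

preds <- lapply(chunks[2:3], linpred_chunk)  # analogous to jobids = 2:3
str(preds)
```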

Note that setting option parGLM.showiter to TRUE will provide a message tracing progress of the optimization.
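One way to switch the option on for a single fit and restore the previous setting afterwards is to rely on the value returned by options(). The wrapper name with_showiter below is hypothetical; only options(), getOption() and on.exit() are standard base R.

```r
# Hypothetical convenience wrapper: enable iteration tracing while 'expr'
# is evaluated, then restore whatever setting was in place before.
with_showiter <- function(expr) {
  old <- options(parGLM.showiter = TRUE)  # options() returns the old value(s)
  on.exit(options(old), add = TRUE)       # restore even if expr fails
  force(expr)                             # expr is evaluated here, lazily
}

before <- getOption("parGLM.showiter")
inside <- with_showiter(getOption("parGLM.showiter"))
after <- getOption("parGLM.showiter")
print(list(inside = inside, restored = identical(before, after)))
```

In practice the expression passed to the wrapper would be the parGLM call.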

Examples

library(parglms)
if (require(MASS) && require(BatchJobs)) {
  # here is the 'sharding' of a small dataset
  data(anorexia)  # N = 72
  # in .BatchJobs.R:
  # best setting for sharding a small dataset on a small machine:
  # cluster.functions = BatchJobs::makeClusterFunctionsInteractive()
  myr = makeRegistry("abc", file.dir=tempfile())
  chs = chunk(1:nrow(anorexia), n.chunks=18) # 4 recs/chunk
  f = function(x) {library(MASS); data(anorexia); anorexia[x,]}
  batchMap(myr, f, chs)
  submitJobs(myr) # simple dispersal
  waitForJobs(myr) # now loadResult(myr,1) gives back a data.frame
  # now myr is populated
  oldopt = options()$parGLM.showiter
  options(parGLM.showiter=TRUE) # trace the iterations
  # binit has one entry per coefficient: intercept, two Treat contrasts, Prewt
  pp = parGLM( Postwt ~ Treat + Prewt, myr,
    family=gaussian, binit = c(0,0,0,0), maxit=10, tol=.001 )
  # compare with the ordinary in-memory fit
  print(summary(theLM <- lm(Postwt~Treat+Prewt, data=anorexia)))
  print(pp$coefficients - coef(theLM))
  if (require(sandwich)) {
    hc0 <- vcovHC(theLM, type="HC0")
    print(pp$robust.variance - hc0)
  }
  # linear predictor for the records held by jobs 2 and 3
  print(predict(pp, store=myr, jobids=2:3))
  options(parGLM.showiter=oldopt) # restore the previous setting
}

Example output

Loading required package: MASS
Loading required package: BatchJobs
Loading required package: BBmisc
The development of BatchJobs and BatchExperiments is discontinued.
Consider switching to 'batchtools' for new features and improved stability
Sourced 1 configuration files: 
  1: /usr/lib/R/site-library/BatchJobs/etc/BatchJobs_global_config.R
BatchJobs configuration:
  cluster functions: Interactive
  mail.from: 
  mail.to: 
  mail.start: none
  mail.done: none
  mail.error: none
  default.resources: 
  debug: FALSE
  raise.warnings: FALSE
  staged.queries: TRUE
  max.concurrent.jobs: Inf
  fs.timeout: NA
  measure.mem: TRUE

Creating dir: /work/tmp/tmp/Rtmp0Tyelf/file37ab7965a2ab
Saving registry: /work/tmp/tmp/Rtmp0Tyelf/file37ab7965a2ab/registry.RData
Adding 18 jobs to DB.
Saving conf: /work/tmp/tmp/Rtmp0Tyelf/file37ab7965a2ab/conf.RData
Submitting 18 chunks / 18 jobs.
Cluster functions: Interactive.
Auto-mailer settings: start=none, done=none, error=none.

SubmitJobs |+++++++++++++++++++++++++++++++++++++++++++++++++| 100% (00:00:00)
Sending 18 submit messages...
Might take some time, do not interrupt this!
Syncing registry ...
Warning in result_fetch(res@ptr, n = n) :
  Don't need to call dbFetch() for statements, only for queries
Warning: executing %dopar% sequentially: no parallel backend registered
iteration 0: criterion value = 1
iteration 1: criterion value = 2.89064205883374e-14

Call:
lm(formula = Postwt ~ Treat + Prewt, data = anorexia)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.1083  -4.2773  -0.5484   5.4838  15.2922 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  49.7711    13.3910   3.717  0.00041 ***
TreatCont    -4.0971     1.8935  -2.164  0.03400 *  
TreatFT       4.5631     2.1333   2.139  0.03604 *  
Prewt         0.4345     0.1612   2.695  0.00885 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.978 on 68 degrees of freedom
Multiple R-squared:  0.2777,	Adjusted R-squared:  0.2458 
F-statistic: 8.713 on 3 and 68 DF,  p-value: 5.719e-05

                     [,1]
(Intercept) -2.842171e-14
TreatCont    2.664535e-15
TreatFT     -1.776357e-15
Prewt        4.996004e-16
Loading required package: sandwich
              (Intercept)     TreatCont       TreatFT         Prewt
(Intercept) -7.759127e-12 -3.379075e-12 -1.425526e-13  7.016610e-14
TreatCont   -3.383516e-12  9.592327e-14 -1.643130e-14  4.092213e-14
TreatFT     -1.025846e-13 -1.709743e-14  1.776357e-15  1.332268e-15
Prewt        8.304468e-14  4.135234e-14  2.595146e-15 -7.181755e-16
There were 40 warnings (use warnings() to see them)
[[1]]
      [,1]
5 79.60546
6 84.03696
7 83.60250
8 78.30208

[[2]]
       [,1]
9  80.69161
10 79.73580
11 79.38823
12 84.21075

parglms documentation built on Nov. 8, 2020, 5:51 p.m.