This package addresses the problem of fitting GLM-like models in a scalable way, recognizing that data may be dispersed, with chunks processed in parallel, to create low-dimensional summaries from which model fits may be constructed.
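The idea of low-dimensional summaries can be illustrated for the Gaussian case with base R alone. This is a minimal sketch, not the package's implementation: the use of mtcars, the four-way split, and the helper summarize_chunk are assumptions made for illustration. Each chunk contributes only a p x p cross-product matrix and a p x 1 vector, and the fit is built from their sums.

```r
# Split a small data frame into chunks, standing in for dispersed data.
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))

# Per-chunk summary (hypothetical helper): X'X and X'y for the model formula.
summarize_chunk <- function(df, formula) {
  X <- model.matrix(formula, df)
  y <- model.response(model.frame(formula, df))
  list(xtx = crossprod(X), xty = crossprod(X, y))
}

fml <- mpg ~ wt + hp
parts <- lapply(chunks, summarize_chunk, formula = fml)  # could run in parallel

# Combine the summaries by addition and solve the normal equations.
xtx <- Reduce(`+`, lapply(parts, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(parts, `[[`, "xty"))
beta <- solve(xtx, xty)

# Largest discrepancy from a fit on the undivided data.
full <- coef(lm(fml, data = mtcars))
max(abs(beta[, 1] - full))
```

With factor predictors, a chunk that lacks one of the factor levels would produce a design matrix with fewer columns, so a real implementation has to fix the column layout in advance.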
signature(formula = "formula", store = "Registry")
The model data are assumed to lie in the file.dir/jobs/* folders, with file.dir defined in the store, which is an instance of Registry.
Additional arguments must be supplied:
family: a function that serves as a family for stats::glm
binit: a vector of initial values for regression parameter estimation; it must conform to the expectations of formula
maxit: an integer giving the maximum number of iterations allowed
tol: a numeric giving the tolerance criterion
Failure to specify any of these triggers a fatal error.
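To show how a family, starting values, an iteration cap, and a tolerance typically interact in a chunked fit, here is a hedged sketch of iteratively reweighted least squares in base R. The function chunked_irls, the relative-change convergence criterion, and the mtcars example are assumptions for illustration; the package's own criterion and internals may differ.

```r
# Hypothetical chunked IRLS: each chunk returns X'WX and X'Wz only.
chunked_irls <- function(formula, chunks, family, binit, maxit, tol) {
  fam <- if (is.function(family)) family() else family
  beta <- binit
  for (iter in seq_len(maxit)) {
    parts <- lapply(chunks, function(df) {
      X <- model.matrix(formula, df)
      y <- model.response(model.frame(formula, df))
      eta <- drop(X %*% beta)
      mu <- fam$linkinv(eta)
      dmu <- fam$mu.eta(eta)
      w <- dmu^2 / fam$variance(mu)   # working weights
      z <- eta + (y - mu) / dmu       # working response
      list(xtwx = crossprod(X, w * X), xtwz = crossprod(X, w * z))
    })
    xtwx <- Reduce(`+`, lapply(parts, `[[`, "xtwx"))
    xtwz <- Reduce(`+`, lapply(parts, `[[`, "xtwz"))
    newbeta <- drop(solve(xtwx, xtwz))
    # Assumed criterion: largest relative change in any coefficient.
    crit <- max(abs(newbeta - beta) / (abs(newbeta) + 1e-8))
    beta <- newbeta
    if (crit < tol) break
  }
  beta
}

chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))
b <- chunked_irls(am ~ wt, chunks, family = binomial,
                  binit = c(0, 0), maxit = 25, tol = 1e-8)
ref <- coef(glm(am ~ wt, data = mtcars, family = binomial))
max(abs(b - ref))  # discrepancy from an ordinary glm fit
```

The length of binit has to match the number of columns of the design matrix implied by formula, which is why the starting vector here has two elements.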
The Registry instance can be modified to include a list element 'extractor'. This must be a function with arguments store and i. The standard extraction function is
function(store, i) loadResult(store, i)
It must return a data frame, conformant with the expectations of formula.
Limited checking is performed.
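A custom extractor might, for example, subset columns or drop incomplete rows before the chunk reaches the fitting code. The sketch below runs without BatchJobs by using a stand-in store; fake_store, fake_load, and my_extractor are hypothetical names, and the assignment form shown in the final comment is an assumption based on the description above.

```r
# Stand-in for a populated registry: a list holding one data frame per job.
# With BatchJobs, loadResult(store, i) would play the role of fake_load().
fake_store <- list(results = split(mtcars, rep(1:4, length.out = nrow(mtcars))))
fake_load <- function(store, i) store$results[[i]]

# Extractor with the required (store, i) arguments. It returns a data frame
# holding only the modeled columns, with incomplete rows dropped.
my_extractor <- function(store, i) {
  df <- fake_load(store, i)
  df <- df[, c("mpg", "wt", "hp")]
  df[complete.cases(df), , drop = FALSE]
}

chunk2 <- my_extractor(fake_store, 2)
str(chunk2)

# Assumed way of attaching it to a real registry object:
# myr$extractor <- my_extractor
```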
The predict method computes the linear predictor on data identified by jobid in a BatchJobs registry. Results are returned as output of foreach over the jobids specified in the predict call.
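The per-chunk linear predictor can be sketched in base R as the design matrix of each selected chunk multiplied by the coefficient vector. This is an illustration under assumptions: lapply stands in for foreach, list positions stand in for job ids, and the coefficients come from an ordinary lm fit on mtcars rather than from parGLM.

```r
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))
fml <- mpg ~ wt + hp
fit <- lm(fml, data = mtcars)
beta <- coef(fit)                      # stand-in for fitted coefficients

# Linear predictor X %*% beta for one chunk (hypothetical helper).
linpred_chunk <- function(df) model.matrix(fml, df) %*% beta

jobids <- 2:3
preds <- lapply(chunks[jobids], linpred_chunk)  # one n_i x 1 matrix per chunk
str(preds)
```

The result is a list with one single-column matrix per requested chunk, each row named after the record it came from.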
Note that setting option parGLM.showiter to TRUE will provide a message tracing progress of the optimization.
if (require(MASS) && require(BatchJobs)) {
# here is the 'sharding' of a small dataset
data(anorexia) # N = 72
# in .BatchJobs.R:
# best setting for sharding a small dataset on a small machine:
# cluster.functions = BatchJobs::makeClusterFunctionsInteractive()
myr = makeRegistry("abc", file.dir=tempfile())
chs = chunk(1:nrow(anorexia), n.chunks=18) # 4 recs/chunk
f = function(x) {library(MASS); data(anorexia); anorexia[x,]}
batchMap(myr, f, chs)
submitJobs(myr) # now getResult(myr,1) gives back a data.frame
waitForJobs(myr) # simple dispersal
# now myr is populated
oldopt = options()$parGLM.showiter
options(parGLM.showiter=TRUE)
pp = parGLM( Postwt ~ Treat + Prewt, myr,
family=gaussian, binit = c(0,0,0,0), maxit=10, tol=.001 )
print(summary(theLM <- lm(Postwt~Treat+Prewt, data=anorexia)))
print(pp$coefficients - coef(theLM))
if (require(sandwich)) {
hc0 <- vcovHC(theLM, type="HC0")
print(pp$robust.variance - hc0)
}
}
predict(pp, store=myr, jobids=2:3)
options(parGLM.showiter=oldopt)
Loading required package: MASS
Loading required package: BatchJobs
Loading required package: BBmisc
The development of BatchJobs and BatchExperiments is discontinued.
Consider switching to 'batchtools' for new features and improved stability
Sourced 1 configuration files:
1: /usr/lib/R/site-library/BatchJobs/etc/BatchJobs_global_config.R
BatchJobs configuration:
cluster functions: Interactive
mail.from:
mail.to:
mail.start: none
mail.done: none
mail.error: none
default.resources:
debug: FALSE
raise.warnings: FALSE
staged.queries: TRUE
max.concurrent.jobs: Inf
fs.timeout: NA
measure.mem: TRUE
Creating dir: /work/tmp/tmp/Rtmp0Tyelf/file37ab7965a2ab
Saving registry: /work/tmp/tmp/Rtmp0Tyelf/file37ab7965a2ab/registry.RData
Adding 18 jobs to DB.
Saving conf: /work/tmp/tmp/Rtmp0Tyelf/file37ab7965a2ab/conf.RData
Submitting 18 chunks / 18 jobs.
Cluster functions: Interactive.
Auto-mailer settings: start=none, done=none, error=none.
SubmitJobs |+++++++++++++++++++++++++++++++++++++++++++++++++| 100% (00:00:00)
Sending 18 submit messages...
Might take some time, do not interrupt this!
Syncing registry ...
Warning in result_fetch(res@ptr, n = n) :
Don't need to call dbFetch() for statements, only for queries
Warning: executing %dopar% sequentially: no parallel backend registered
iteration 0: criterion value = 1
iteration 1: criterion value = 2.89064205883374e-14
Call:
lm(formula = Postwt ~ Treat + Prewt, data = anorexia)
Residuals:
Min 1Q Median 3Q Max
-14.1083 -4.2773 -0.5484 5.4838 15.2922
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.7711 13.3910 3.717 0.00041 ***
TreatCont -4.0971 1.8935 -2.164 0.03400 *
TreatFT 4.5631 2.1333 2.139 0.03604 *
Prewt 0.4345 0.1612 2.695 0.00885 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.978 on 68 degrees of freedom
Multiple R-squared: 0.2777, Adjusted R-squared: 0.2458
F-statistic: 8.713 on 3 and 68 DF, p-value: 5.719e-05
[,1]
(Intercept) -2.842171e-14
TreatCont 2.664535e-15
TreatFT -1.776357e-15
Prewt 4.996004e-16
Loading required package: sandwich
(Intercept) TreatCont TreatFT Prewt
(Intercept) -7.759127e-12 -3.379075e-12 -1.425526e-13 7.016610e-14
TreatCont -3.383516e-12 9.592327e-14 -1.643130e-14 4.092213e-14
TreatFT -1.025846e-13 -1.709743e-14 1.776357e-15 1.332268e-15
Prewt 8.304468e-14 4.135234e-14 2.595146e-15 -7.181755e-16
There were 40 warnings (use warnings() to see them)
[[1]]
[,1]
5 79.60546
6 84.03696
7 83.60250
8 78.30208
[[2]]
[,1]
9 80.69161
10 79.73580
11 79.38823
12 84.21075