mrfrequentist is used to conduct frequentist linear regression on very large data sets using Merge and Reduce as described in Geppert et al. (2020).
formula 

fileMr 
( 
dataMr 
( 
obsPerBlock 

approach 

sep 
See documentation of 
dec 
See documentation of 
header 

naStrings 

colNames 

naAction 

Returns an object of class "mrfrequentist", which is a list containing the following components for both approaches "1" and "3":
approach 
The approach used for merging the models. Either "1" or "3". 
formula 
The model's 
level 
Number of levels of the final model in Merge and Reduce. This is equal to log2(ceiling(numberObs/obsPerBlock))+1 and corresponds to the number of buckets in Figure 1 of Geppert et al. (2020). 
numberObs 
The total number of observations. 
summaryStats 
Summary statistics reporting the estimated regression coefficients
and their unbiased standard errors. Estimates are based
on the merge technique as specified in the argument 
dataHead 
First six rows of the data in the first block. This serves
as a sanity check, especially when using the argument 
terms 
Terms object. 
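The relation between the level component and the number of blocks can be illustrated with a small simulation. This is only a sketch of the bucket idea, not the package's internal code: the sizes are hypothetical and chosen so that the number of blocks is a power of two, and the variable names (nBlocks, buckets, levelFormula, levelSimulated) are made up for illustration.

```r
# Hypothetical sizes; 800 / 100 gives 8 blocks, a power of two.
numberObs   <- 800
obsPerBlock <- 100
nBlocks     <- ceiling(numberObs / obsPerBlock)

# Closed form as stated for the level component.
levelFormula <- log2(nBlocks) + 1

# Simulate the buckets: buckets[l] is TRUE while level l holds a model.
buckets <- logical(0)
for (b in seq_len(nBlocks)) {
  lvl <- 1
  # A new block model enters at level 1. Whenever a level is already
  # occupied, the two models are merged and the result moves up one level.
  while (lvl <= length(buckets) && buckets[lvl]) {
    buckets[lvl] <- FALSE
    lvl <- lvl + 1
  }
  buckets[lvl] <- TRUE
}

# Highest occupied level after all blocks have been processed.
levelSimulated <- max(which(buckets))
```

For these sizes both levelFormula and levelSimulated describe the same top bucket. How the package handles block counts that are not a power of two is not covered by this sketch.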
Additionally for approach "3" only:
XTX 
The final model's 
yTX 
The final model's 
yTy 
The final model's 
In approach "3" the estimated regression coefficients and their unbiased standard errors are calculated via QR decompositions on X'X (as in speedlm with argument method = "qr"). Moreover, the merge step uses the same idea of blockwise addition for X'X, y'y and y'X as speedglm's updating procedure updateWithMoreData. Conceptually, though, Merge and Reduce is not an updating algorithm, as it merges models based on a comparable amount of data along a tree structure to obtain a final model.
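The blockwise addition described above can be sketched in base R. This is a minimal illustration under assumptions, not the implementation used by mrfrequentist or speedglm: the data are simulated, and the helpers blockStats and mergeStats are hypothetical names introduced here.

```r
set.seed(1)
n <- 200
X <- cbind(1, matrix(rnorm(n * 2), ncol = 2))  # design matrix with intercept
y <- drop(X %*% c(1, 2, -1) + rnorm(n))

# Per-block statistics: X'X, y'X, y'y and the block size (hypothetical helper).
blockStats <- function(Xb, yb) {
  list(XTX = crossprod(Xb),
       yTX = crossprod(yb, Xb),
       yTy = sum(yb^2),
       n   = nrow(Xb))
}

# Merge step: blockwise addition of the statistics (hypothetical helper).
mergeStats <- function(a, b) {
  list(XTX = a$XTX + b$XTX, yTX = a$yTX + b$yTX,
       yTy = a$yTy + b$yTy, n = a$n + b$n)
}

s1 <- blockStats(X[1:100, ], y[1:100])
s2 <- blockStats(X[101:200, ], y[101:200])
s  <- mergeStats(s1, s2)

# Solve the normal equations via a QR decomposition of the merged X'X.
qrXTX  <- qr(s$XTX)
beta   <- qr.coef(qrXTX, t(s$yTX))
p      <- ncol(s$XTX)
rss    <- s$yTy - drop(s$yTX %*% beta)         # y'y - y'X beta
sigma2 <- rss / (s$n - p)                      # unbiased variance estimate
se     <- sqrt(sigma2 * diag(qr.solve(qrXTX)))

# Reference fit on the full data for comparison.
fullFit <- lm(y ~ X - 1)
```

Because X'X, y'X and y'y are sums over observations, the merged statistics equal those of the full data, so beta and se should agree with coef(fullFit) and the standard errors in summary(fullFit) up to numerical error.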
Geppert, L.N., Ickstadt, K., Munteanu, A., & Sohler, C. (2020). Streaming statistical models via Merge & Reduce. International Journal of Data Science and Analytics, 1-17, doi: https://doi.org/10.1007/s41060-020-00226-0
library(mrregression)

## run mrfrequentist() with dataMr
data(exampleData)
fit1 = mrfrequentist(dataMr = exampleData, approach = "1", obsPerBlock = 300,
                     formula = V11 ~ .)

## run mrfrequentist() with fileMr
filepath = system.file("extdata", "exampleFile.txt", package = "mrregression")
fit2 = mrfrequentist(fileMr = filepath, approach = "3", header = TRUE,
                     obsPerBlock = 100, formula = y ~ .)

