mrfrequentist conducts frequentist linear regression on very large
data sets using Merge and Reduce, as described in Geppert et al. (2020).
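Arguments:
formula: The model formula, for example V11 ~ . (see Examples).
fileMr: Path to a file containing the data; an alternative to dataMr.
dataMr: Data frame containing the data; an alternative to fileMr.
obsPerBlock: Number of observations per block.
approach: The approach used for merging the models, either "1" or "3".
sep: See documentation of
dec: See documentation of
header
naStrings
colNames
naAction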
Returns an object of class "mrfrequentist", a list containing
the following components for both approaches "1" and "3":
approach: The approach used for merging the models, either "1" or "3".
formula: The model's formula.
level: Number of levels of the final model in Merge and Reduce. This is equal to log2(ceiling(numberObs/obsPerBlock)) + 1 and corresponds to the number of buckets in Figure 1 of Geppert et al. (2020).
numberObs: The total number of observations.
summaryStats: Summary statistics reporting the estimated regression coefficients
and their unbiased standard errors. The estimates are based on the merge
technique specified in the argument approach.
dataHead: The first six rows of the data in the first block. This serves
as a sanity check, especially when using the argument
terms: Terms object.
Additionally for approach "3" only:
XTX: The final model's X'X.
yTX: The final model's y'X.
yTy: The final model's y'y.
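For approach "3", X'X, y'X and y'y together with numberObs are sufficient statistics for ordinary least squares, so the coefficient estimates and their unbiased standard errors can be recovered from them alone. The following is a minimal base-R sketch, assuming yTX is stored as a 1 x p row matrix; the helper olsFromCrossprods is hypothetical and not part of mrregression, and the data are simulated for illustration only.

```r
# Hypothetical helper (not part of mrregression): recover OLS estimates
# and unbiased standard errors from the cross-products X'X, y'X and y'y.
olsFromCrossprods <- function(XTX, yTX, yTy, numberObs) {
  p <- ncol(XTX)
  qrXTX <- qr(XTX)                       # QR decomposition of X'X
  beta <- qr.coef(qrXTX, t(yTX))         # solves X'X beta = X'y
  rss <- drop(yTy - yTX %*% beta)        # RSS = y'y - y'X beta
  sigma2 <- rss / (numberObs - p)        # unbiased error variance
  XTXinv <- qr.solve(qrXTX)              # (X'X)^{-1} from the same QR
  se <- sqrt(sigma2 * diag(XTXinv))
  cbind(Estimate = drop(beta), `Std. Error` = se)
}

# Simulated data, for illustration only
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = runif(n))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

X <- model.matrix(y ~ x1 + x2, data = dat)
y <- dat$y
est <- olsFromCrossprods(XTX = crossprod(X),
                         yTX = crossprod(y, X),
                         yTy = drop(crossprod(y)),
                         numberObs = n)
est  # should agree with coef(summary(lm(y ~ x1 + x2, data = dat)))
```

A single factorization of X'X is reused both for the coefficients and for (X'X)^{-1}, which keeps the cost independent of the number of observations once the cross-products are known.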
In approach "3", the estimated regression coefficients and their unbiased standard errors
are calculated via QR decompositions of X'X (as in speedlm
with argument method = "qr"). Moreover, the merge step uses the same
idea of blockwise addition of X'X, y'y and y'X as speedglm's updating
procedure updateWithMoreData. Conceptually, however, Merge and Reduce
is not an updating algorithm: it merges models based on comparable
amounts of data along a tree structure to obtain a final model.
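The merge step of approach "3" can be illustrated with a short base-R sketch. It is not the package's implementation: each block of obsPerBlock rows is summarised by its cross-products, summaries on the same level are merged by blockwise addition and carried upward like a binary counter, and the remaining buckets are finally combined into one model. The functions blockSummary, mergeSummaries and mergeReduce are hypothetical names chosen for this example.

```r
# Hypothetical sketch of Merge and Reduce for approach "3" (not mrregression's code).
# A summary holds the cross-products of one block or of several merged blocks.
blockSummary <- function(X, y) {
  list(XTX = crossprod(X), yTX = crossprod(y, X),
       yTy = drop(crossprod(y)), n = nrow(X))
}

# Merging two summaries is blockwise addition of their components.
mergeSummaries <- function(a, b) {
  list(XTX = a$XTX + b$XTX, yTX = a$yTX + b$yTX,
       yTy = a$yTy + b$yTy, n = a$n + b$n)
}

mergeReduce <- function(X, y, obsPerBlock) {
  buckets <- list()  # buckets[[i]]: summary of 2^(i - 1) blocks, or NULL
  for (s in seq(1, nrow(X), by = obsPerBlock)) {
    idx <- s:min(s + obsPerBlock - 1, nrow(X))
    current <- blockSummary(X[idx, , drop = FALSE], y[idx])
    level <- 1
    # Merge with an occupied bucket on the same level and carry upward.
    while (level <= length(buckets) && !is.null(buckets[[level]])) {
      current <- mergeSummaries(buckets[[level]], current)
      buckets[level] <- list(NULL)
      level <- level + 1
    }
    buckets[[level]] <- current
  }
  # Reduce: combine all non-empty buckets into the final model.
  occupied <- Filter(Negate(is.null), buckets)
  list(final = Reduce(mergeSummaries, occupied), levels = length(buckets))
}

# Usage on simulated data: 800 rows in blocks of 100 give 8 blocks.
set.seed(2)
n <- 800
X <- cbind(1, matrix(rnorm(n * 3), ncol = 3))
y <- drop(X %*% c(1, 0.5, -1, 2)) + rnorm(n)
res <- mergeReduce(X, y, obsPerBlock = 100)
beta <- qr.coef(qr(res$final$XTX), t(res$final$yTX))  # QR on X'X
```

Since addition is associative, the merged cross-products equal those of the full data up to rounding; with 8 blocks this sketch ends with 4 bucket levels.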
Geppert, L.N., Ickstadt, K., Munteanu, A., & Sohler, C. (2020).
Streaming statistical models via Merge & Reduce. International Journal
of Data Science and Analytics, 1-17,
doi: https://doi.org/10.1007/s41060-020-00226-0
library(mrregression)

## run mrfrequentist() with dataMr
data(exampleData)
fit1 = mrfrequentist(dataMr = exampleData, approach = "1", obsPerBlock = 300,
                     formula = V11 ~ .)
fit1$summaryStats  # estimated coefficients and unbiased standard errors

## run mrfrequentist() with fileMr
filepath = system.file("extdata", "exampleFile.txt", package = "mrregression")
fit2 = mrfrequentist(fileMr = filepath, approach = "3", header = TRUE,
                     obsPerBlock = 100, formula = y ~ .)
fit2$dataHead      # first six rows of the first block, as a sanity check