biglm.big.matrix: Use Thomas Lumley's "biglm" package with a "big.matrix"
In biganalytics: A library of utilities for big.matrix objects of package bigmemory.


View source: R/biglm.big.matrix.R

A wrapper around Thomas Lumley's biglm package that allows it to be used with massive data stored in big.matrix objects.

biglm.big.matrix( formula, data, chunksize=NULL, ..., fc=NULL,
  getNextChunkFunc=NULL)
bigglm.big.matrix( formula, data, chunksize=NULL, ..., fc=NULL,
  getNextChunkFunc=NULL)

`formula`	a model `formula`.
`data`	a `big.matrix`.
`chunksize`	an integer maximum size of chunks of data to process iteratively.
`fc`	either column indices or names of variables that are factors.
`...`	options associated with the `biglm` or `bigglm` functions.
`getNextChunkFunc`	a function which retrieves chunk data.
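The `fc` argument matters because a `big.matrix` holds a single numeric type, so a categorical variable is stored as integer codes. The sketch below shows the difference factor treatment makes. It uses a plain base R matrix and `lm()` as stand-ins (an assumption made so it runs without bigmemory or biganalytics); it does not call the wrapper itself.

```r
# A plain numeric matrix stands in for a big.matrix here (assumption);
# Species is stored as the integer codes 1, 2, 3.
x <- data.matrix(iris)
d <- data.frame(x)

# Left as a number, Species contributes a single slope.
as.numeric.fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = d)

# Treated as a factor, each level beyond the first gets its own coefficient.
d$Species <- factor(d$Species)
as.factor.fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = d)

names(coef(as.numeric.fit))  # "(Intercept)" "Sepal.Width" "Species"
names(coef(as.factor.fit))   # "(Intercept)" "Sepal.Width" "Species2" "Species3"
```

The second set of coefficient names matches what the wrapper reports in the example on this page when called with `fc="Species"`.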

See biglm package for more information; chunksize defaults to
max(floor(nrow(data)/ncol(data)^2), 10000).
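To get a feel for the default, the formula above can be evaluated directly. This is a minimal sketch in base R; `default.chunksize` is a hypothetical helper written for illustration, not a function exported by biganalytics.

```r
# Hypothetical helper: evaluates the documented default for given dimensions.
default.chunksize <- function(n.rows, n.cols) {
  max(floor(n.rows / n.cols^2), 10000)
}

default.chunksize(150, 5)    # floor(150 / 25) = 6, so the 10000 floor applies
default.chunksize(1e7, 5)    # floor(1e7 / 25) = 400000
default.chunksize(1e6, 20)   # floor(1e6 / 400) = 2500, so 10000 again
```

The default only rises above 10000 when the row count is large relative to the square of the column count.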

an object of class biglm.

Algorithm AS274, Applied Statistics (1992), Vol. 41, No. 2.

Thomas Lumley (2005). biglm: bounded memory linear and generalized linear models. R package version 0.4.

biglm, big.matrix

# This example is quite silly, using the iris
# data.  But it shows that our wrapper to Lumley's biglm() function
# produces the same answer as the plain old lm() function.

## Not run: 
require(bigmemory)
library(biganalytics)  # provides biglm.big.matrix()
x <- matrix(unlist(iris), ncol=5)
colnames(x) <- names(iris)
x <- as.big.matrix(x)
head(x)

silly.biglm <- biglm.big.matrix(Sepal.Length ~ Sepal.Width + Species,
                                data=x, fc="Species")
summary(silly.biglm)

y <- data.frame(x[,])
y$Species <- as.factor(y$Species)
head(y)

silly.lm <- lm(Sepal.Length ~ Sepal.Width + Species, data=y)
summary(silly.lm)

## End(Not run)

Loading required package: bigmemory
Loading required package: foreach
Loading required package: biglm
Loading required package: DBI
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
[1,]          5.1         3.5          1.4         0.2       1
[2,]          4.9         3.0          1.4         0.2       1
[3,]          4.7         3.2          1.3         0.2       1
[4,]          4.6         3.1          1.5         0.2       1
[5,]          5.0         3.6          1.4         0.2       1
[6,]          5.4         3.9          1.7         0.4       1
Large data regression model: biglm(formula = formula, data = data, ...)
Sample size =  150 
              Coef   (95%    CI)     SE p
(Intercept) 2.2514 1.5119 2.9909 0.3698 0
Sepal.Width 0.8036 0.5909 1.0162 0.1063 0
Species2    1.4587 1.2345 1.6830 0.1121 0
Species3    1.9468 1.7468 2.1468 0.1000 0
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2       1
2          4.9         3.0          1.4         0.2       1
3          4.7         3.2          1.3         0.2       1
4          4.6         3.1          1.5         0.2       1
5          5.0         3.6          1.4         0.2       1
6          5.4         3.9          1.7         0.4       1

Call:
lm(formula = Sepal.Length ~ Sepal.Width + Species, data = y)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.30711 -0.25713 -0.05325  0.19542  1.41253 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2514     0.3698   6.089 9.57e-09 ***
Sepal.Width   0.8036     0.1063   7.557 4.19e-12 ***
Species2      1.4587     0.1121  13.012  < 2e-16 ***
Species3      1.9468     0.1000  19.465  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.438 on 146 degrees of freedom
Multiple R-squared:  0.7259,	Adjusted R-squared:  0.7203 
F-statistic: 128.9 on 3 and 146 DF,  p-value: < 2.2e-16
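The agreement between the two summaries reflects the fact that a least-squares fit can be built up one chunk of rows at a time. The sketch below illustrates that idea in base R only. It accumulates the normal equations (X'X and X'y) per chunk, which is a simpler technique swapped in for illustration and not the AS274 updating that biglm uses. The chunk size of 40 is an arbitrary value chosen to force several chunks, and a plain matrix stands in for the `big.matrix`.

```r
# Plain numeric matrix as a stand-in for a big.matrix (assumption);
# Species becomes the integer codes 1, 2, 3.
x <- data.matrix(iris)

chunksize <- 40                          # arbitrary, for illustration only
starts <- seq(1, nrow(x), by = chunksize)

xtx <- 0
xty <- 0
for (s in starts) {
  rows <- s:min(s + chunksize - 1, nrow(x))
  chunk <- data.frame(x[rows, , drop = FALSE])
  # Fix the levels so every chunk produces the same design-matrix columns,
  # even when a chunk contains only one species.
  chunk$Species <- factor(chunk$Species, levels = 1:3)
  X <- model.matrix(~ Sepal.Width + Species, data = chunk)
  xtx <- xtx + crossprod(X)                      # running X'X
  xty <- xty + crossprod(X, chunk$Sepal.Length)  # running X'y
}
chunked.coef <- drop(solve(xtx, xty))

# Reference fit on all rows at once.
y <- data.frame(x)
y$Species <- factor(y$Species)
full.fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = y)

# Compare the two sets of coefficients up to floating-point tolerance.
isTRUE(all.equal(unname(chunked.coef), unname(coef(full.fit))))
```

Only one chunk and two small accumulators are held at any time, which is the property that makes chunked fitting practical for data too large to load whole.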