biglm.big.matrix: Use Thomas Lumley's "biglm" package with a "big.matrix"


View source: R/biglm.big.matrix.R

Description

This is a wrapper to Thomas Lumley's biglm package, allowing it to be used with massive data stored in big.matrix objects.

Usage

biglm.big.matrix( formula, data, chunksize=NULL, ..., fc=NULL,
  getNextChunkFunc=NULL)
bigglm.big.matrix( formula, data, chunksize=NULL, ..., fc=NULL,
  getNextChunkFunc=NULL)

Arguments

formula

a model formula.

data

a big.matrix.

chunksize

an integer giving the maximum number of rows in each chunk of data; chunks are processed iteratively.

fc

either column indices or names of variables that are factors.

...

options associated with the biglm or bigglm functions.

getNextChunkFunc

a function which retrieves chunk data
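As a rough illustration of what fitting a model chunk by chunk means, the sketch below accumulates the normal equations over blocks of rows and compares the result with lm(). This is an assumption-laden stand-in written in base R: it does not call biglm or bigmemory, it is not the AS274 method listed under References, and the chunk size of 40 is a hypothetical value chosen only so that several chunks are needed.

```r
# Illustration only: chunked least squares via accumulated normal equations.
# This is NOT how biglm works internally; it only shows that a fit can be
# built up from pieces of the data without holding all rows at once.
d <- iris[, c("Sepal.Length", "Sepal.Width")]
chunksize <- 40                          # hypothetical, deliberately small
starts <- seq(1, nrow(d), by = chunksize)

XtX <- matrix(0, 2, 2)                   # running sum of X'X
Xty <- matrix(0, 2, 1)                   # running sum of X'y
for (s in starts) {
  rows <- s:min(s + chunksize - 1, nrow(d))
  X <- cbind(1, d$Sepal.Width[rows])     # design matrix for this chunk
  y <- d$Sepal.Length[rows]
  XtX <- XtX + crossprod(X)
  Xty <- Xty + crossprod(X, y)
}

beta_chunked <- drop(solve(XtX, Xty))
beta_full <- unname(coef(lm(Sepal.Length ~ Sepal.Width, data = d)))
print(all.equal(beta_chunked, beta_full))
```

Accumulating X'X is numerically weaker than an incremental QR update, so treat this as a teaching aid rather than a substitute for the wrapper.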

Details

See the biglm package for more information. When chunksize is NULL, it defaults to
max(floor(nrow(data)/ncol(data)^2), 10000).
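The default can be worked through by hand. The helper below is a hypothetical name introduced only for this sketch (it is not part of biganalytics); it evaluates the formula stated above for given row and column counts.

```r
# Hypothetical helper: evaluates the documented default chunksize formula.
default_chunksize <- function(n_rows, n_cols) {
  max(floor(n_rows / n_cols^2), 10000)
}

# iris-sized data (150 rows, 5 columns): floor(150 / 25) = 6, so the
# 10000 floor applies and all rows fit in a single chunk.
small <- default_chunksize(150, 5)

# 100 million rows, 5 columns: floor(1e8 / 25) = 4e6 rows per chunk.
large <- default_chunksize(1e8, 5)

print(c(small = small, large = large))
```

Because the formula takes the maximum of the two terms, 10000 acts as a lower bound on the default, not an upper bound.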

Value

an object of class biglm.

References

Algorithm AS274, Applied Statistics (1992), Vol. 41, No. 2.

Thomas Lumley (2005). biglm: bounded memory linear and generalized linear models. R package version 0.4.

See Also

biglm, big.matrix

Examples

# This example is quite silly, using the iris
# data.  But it shows that our wrapper to Lumley's biglm() function
# produces the same answer as the plain old lm() function.

## Not run: 
require(bigmemory)
require(biganalytics)  # provides biglm.big.matrix()
x <- matrix(unlist(iris), ncol=5)
colnames(x) <- names(iris)
x <- as.big.matrix(x)
head(x)

silly.biglm <- biglm.big.matrix(Sepal.Length ~ Sepal.Width + Species,
                                data=x, fc="Species")
summary(silly.biglm)

y <- data.frame(x[,])
y$Species <- as.factor(y$Species)
head(y)

silly.lm <- lm(Sepal.Length ~ Sepal.Width + Species, data=y)
summary(silly.lm)

## End(Not run)

Example output

Loading required package: bigmemory
Loading required package: foreach
Loading required package: biglm
Loading required package: DBI
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
[1,]          5.1         3.5          1.4         0.2       1
[2,]          4.9         3.0          1.4         0.2       1
[3,]          4.7         3.2          1.3         0.2       1
[4,]          4.6         3.1          1.5         0.2       1
[5,]          5.0         3.6          1.4         0.2       1
[6,]          5.4         3.9          1.7         0.4       1
Large data regression model: biglm(formula = formula, data = data, ...)
Sample size =  150 
              Coef   (95%    CI)     SE p
(Intercept) 2.2514 1.5119 2.9909 0.3698 0
Sepal.Width 0.8036 0.5909 1.0162 0.1063 0
Species2    1.4587 1.2345 1.6830 0.1121 0
Species3    1.9468 1.7468 2.1468 0.1000 0
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2       1
2          4.9         3.0          1.4         0.2       1
3          4.7         3.2          1.3         0.2       1
4          4.6         3.1          1.5         0.2       1
5          5.0         3.6          1.4         0.2       1
6          5.4         3.9          1.7         0.4       1

Call:
lm(formula = Sepal.Length ~ Sepal.Width + Species, data = y)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.30711 -0.25713 -0.05325  0.19542  1.41253 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2514     0.3698   6.089 9.57e-09 ***
Sepal.Width   0.8036     0.1063   7.557 4.19e-12 ***
Species2      1.4587     0.1121  13.012  < 2e-16 ***
Species3      1.9468     0.1000  19.465  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.438 on 146 degrees of freedom
Multiple R-squared:  0.7259,	Adjusted R-squared:  0.7203 
F-statistic: 128.9 on 3 and 146 DF,  p-value: < 2.2e-16

biganalytics documentation built on May 2, 2019, 4:45 p.m.