biglm.big.matrix: Use Thomas Lumley's "biglm" package with a "big.matrix"


View source: R/biglm.big.matrix.R

Description

These functions wrap Thomas Lumley's biglm package so that its model-fitting functions (biglm for linear models, bigglm for generalized linear models) can be used with massive data stored in big.matrix objects.
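To illustrate what fitting "in chunks" means, here is a minimal base-R sketch. It is not the biganalytics implementation. It reads rows in blocks of at most `chunksize`, accumulates the cross-products X'X and X'y, and solves the normal equations at the end, so only one block has to be in memory at a time. The function name `chunked_lm` is hypothetical. The normal-equations update is also a swapped-in simplification: biglm itself updates a QR decomposition incrementally (Miller's AS 274 algorithm), which is numerically more stable.

```r
# Hypothetical sketch of chunked least squares (not the biglm algorithm):
# accumulate X'X and X'y one block of rows at a time, then solve once.
chunked_lm <- function(formula, data, chunksize) {
  n <- nrow(data)
  xtx <- 0  # becomes a p x p matrix after the first chunk
  xty <- 0  # becomes a p x 1 matrix after the first chunk
  for (start in seq(1, n, by = chunksize)) {
    rows <- start:min(start + chunksize - 1, n)
    mf <- model.frame(formula, data = data[rows, , drop = FALSE])
    X <- model.matrix(attr(mf, "terms"), mf)  # design matrix for this chunk
    y <- model.response(mf)                   # response for this chunk
    xtx <- xtx + crossprod(X)
    xty <- xty + crossprod(X, y)
  }
  setNames(drop(solve(xtx, xty)), colnames(xtx))
}

fit <- chunked_lm(Sepal.Length ~ Sepal.Width + Petal.Length, iris, chunksize = 40)
all.equal(fit, coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)))
```

With `chunksize = 40`, the 150 rows of `iris` are processed in four blocks, and the coefficients agree with `lm()` up to rounding error.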

Usage

bigglm.big.matrix(
  formula,
  data,
  chunksize = NULL,
  ...,
  fc = NULL,
  getNextChunkFunc = NULL
)

biglm.big.matrix(
  formula,
  data,
  chunksize = NULL,
  ...,
  fc = NULL,
  getNextChunkFunc = NULL
)

Arguments

formula

a model formula.

data

a big.matrix.

chunksize

an integer giving the maximum chunk size, in rows, used when processing the data iteratively.

fc

either the column indices or the column names of variables that should be treated as factors.

...

further arguments passed on to biglm (for biglm.big.matrix) or bigglm (for bigglm.big.matrix).

getNextChunkFunc

a function that retrieves the next chunk of data.
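A big.matrix holds only numeric values, so a factor such as `Species` is stored as integer codes (1, 2 and 3 in the example below), and `fc` identifies the columns to convert back to factors. The base-R sketch below shows that conversion for a single chunk. The helper `chunk_to_df` is hypothetical and is not part of the biganalytics API. Fixing each factor's levels in advance from the full column is also an assumption about how the conversion should be done. It ensures that every chunk produces the same dummy columns, even when a level is absent from that chunk.

```r
# Hypothetical helper (not the biganalytics API): convert one numeric
# chunk, as it might be read from a big.matrix, into a data.frame and
# turn the columns named in `fc` into factors. Assumption: the level set
# of each factor is fixed in advance from the full column.
chunk_to_df <- function(chunk, fc, levels_list) {
  df <- as.data.frame(chunk)
  for (v in fc) {
    df[[v]] <- factor(df[[v]], levels = levels_list[[v]])
  }
  df
}

# Same encoding as the example: Species is stored as the codes 1, 2, 3
x <- matrix(unlist(iris), ncol = 5, dimnames = list(NULL, names(iris)))
lev <- list(Species = sort(unique(x[, "Species"])))

# The first 10 rows all have Species == 1
chunk <- chunk_to_df(x[1:10, ], fc = "Species", levels_list = lev)
colnames(model.matrix(Sepal.Length ~ Sepal.Width + Species, data = chunk))
```

Although this chunk contains only the first species, its model matrix still has the `Species2` and `Species3` columns that appear in the biglm summary of the example, because the unused levels are kept.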

Value

an object of class biglm

Examples

## Not run: 
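library(biganalytics)  # provides biglm.big.matrix; attaches bigmemory and biglm

# Store iris in a big.matrix; the Species factor becomes the integer codes 1-3
x <- matrix(unlist(iris), ncol=5)
colnames(x) <- names(iris)
x <- as.big.matrix(x)
head(x)

# Fit with biglm, treating the Species column as a factor via fc
silly.biglm <- biglm.big.matrix(Sepal.Length ~ Sepal.Width + Species,
                                data=x, fc="Species")
summary(silly.biglm)

# Reference fit: the same model with lm() on an in-memory data.frame
y <- data.frame(x[,])
y$Species <- as.factor(y$Species)
head(y)

silly.lm <- lm(Sepal.Length ~ Sepal.Width + Species, data=y)
summary(silly.lm)
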
## End(Not run)

Example output

Loading required package: bigmemory
Loading required package: foreach
Loading required package: biglm
Loading required package: DBI
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
[1,]          5.1         3.5          1.4         0.2       1
[2,]          4.9         3.0          1.4         0.2       1
[3,]          4.7         3.2          1.3         0.2       1
[4,]          4.6         3.1          1.5         0.2       1
[5,]          5.0         3.6          1.4         0.2       1
[6,]          5.4         3.9          1.7         0.4       1
Large data regression model: biglm(formula = formula, data = data, ...)
Sample size =  150 
              Coef   (95%    CI)     SE p
(Intercept) 2.2514 1.5119 2.9909 0.3698 0
Sepal.Width 0.8036 0.5909 1.0162 0.1063 0
Species2    1.4587 1.2345 1.6830 0.1121 0
Species3    1.9468 1.7468 2.1468 0.1000 0
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2       1
2          4.9         3.0          1.4         0.2       1
3          4.7         3.2          1.3         0.2       1
4          4.6         3.1          1.5         0.2       1
5          5.0         3.6          1.4         0.2       1
6          5.4         3.9          1.7         0.4       1

Call:
lm(formula = Sepal.Length ~ Sepal.Width + Species, data = y)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.30711 -0.25713 -0.05325  0.19542  1.41253 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2514     0.3698   6.089 9.57e-09 ***
Sepal.Width   0.8036     0.1063   7.557 4.19e-12 ***
Species2      1.4587     0.1121  13.012  < 2e-16 ***
Species3      1.9468     0.1000  19.465  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.438 on 146 degrees of freedom
Multiple R-squared:  0.7259,	Adjusted R-squared:  0.7203 
F-statistic: 128.9 on 3 and 146 DF,  p-value: < 2.2e-16

biganalytics documentation built on July 8, 2020, 5:07 p.m.