Description Usage Arguments Details Value Author(s) References See Also Examples
fastLm
estimates the linear model using the solve
function of Armadillo
linear algebra library.
1 2 3 4 5 6 7 |
y |
a vector containing the explained variable. |
X |
a model matrix. |
formula |
a symbolic description of the model to be fit. |
data |
an optional data frame containing the variables in the model. |
... |
not used |
Linear models should be estimated using the lm
function. In
some cases, lm.fit
may be appropriate.
The fastLmPure
function provides a reference use case of the Armadillo
library via the wrapper functions in the RcppArmadillo package.
The fastLm
function provides a more standard implementation of
a linear model fit, offering both a default and a formula interface as
well as print
, summary
and predict
methods.
Lastly, one must be be careful in timing comparisons of
lm
and friends versus this approach based on
Armadillo
. The reason that Armadillo
can do something
like lm.fit
faster than the functions in the stats
package is because Armadillo
uses the Lapack version of the QR
decomposition while the stats package uses a modified Linpack
version. Hence Armadillo
uses level-3 BLAS code whereas the
stats package uses level-1 BLAS. However, Armadillo
will
either fail or, worse, produce completely incorrect answers
on rank-deficient model matrices whereas the functions from the stats
package will handle them properly due to the modified Linpack code.
An example of the type of situation requiring extra care in checking for rank deficiency is a two-way layout with missing cells (see the examples section). These cases require a special pivoting scheme of “pivot only on (apparent) rank deficiency” which is not part of conventional linear algebra software.
fastLmPure
returns a list with three components:
coefficients |
a vector of coefficients |
stderr |
a vector of the (estimated) standard errors of the coefficient estimates |
df.residual |
a scalar denoting the degrees of freedom in the model |
fastLm
returns a richer object which also includes the
residuals, fitted values and call argument similar to the lm
or
rlm
functions..
Armadillo is written by Conrad Sanderson. RcppArmadillo is written by Romain Francois, Dirk Eddelbuettel and Douglas Bates.
Armadillo project: http://arma.sourceforge.net/
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | data(trees, package="datasets")
## bare-bones direct interface
flm <- fastLmPure( cbind(1, log(trees$Girth)), log(trees$Volume) )
print(flm)
## standard R interface for formula or data returning object of class fastLm
flmmod <- fastLm( log(Volume) ~ log(Girth), data=trees)
summary(flmmod)
## case where fastLm breaks down
dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
xtabs(~ f2 + f1, dd) # one missing cell
mm <- model.matrix(~ f1 * f2, dd)
kappa(mm) # large, indicating rank deficiency
set.seed(1)
dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
summary(lm(y ~ f1 * f2, dd)) # detects rank deficiency
summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients
|
$coefficients
[,1]
[1,] -2.353325
[2,] 2.199970
$stderr
[,1]
[1,] 0.23066284
[2,] 0.08983455
$df.residual
[1] 29
Call:
fastLm.formula(formula = log(Volume) ~ log(Girth), data = trees)
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.2060000 -0.0687020 0.0010105 0.0725850 0.2479600
Estimate StdErr t.value p.value
(Intercept) -2.353325 0.230663 -10.202 4.18e-11 ***
log(Girth) 2.199970 0.089835 24.489 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.115 on 29 degrees of freedom
Multiple R-squared: 0.9539, Adjusted R-squared: 0.9523
f1
f2 A B C D
a 2 0 2 2
b 2 2 2 2
c 2 2 2 2
[1] 3.295702e+16
Call:
lm(formula = y ~ f1 * f2, data = dd)
Residuals:
Min 1Q Median 3Q Max
-0.12155 -0.04702 0.00000 0.04702 0.12155
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.97786 0.05816 16.81 3.41e-09 ***
f1B 12.03807 0.08226 146.35 < 2e-16 ***
f1C 3.11722 0.08226 37.90 5.22e-13 ***
f1D 4.06852 0.08226 49.46 2.83e-14 ***
f2b 5.06012 0.08226 61.52 2.59e-15 ***
f2c 5.99759 0.08226 72.91 4.01e-16 ***
f1B:f2b -3.01476 0.11633 -25.92 3.27e-11 ***
f1C:f2b 7.70300 0.11633 66.22 1.16e-15 ***
f1D:f2b 8.96425 0.11633 77.06 < 2e-16 ***
f1B:f2c NA NA NA NA
f1C:f2c 10.96133 0.11633 94.23 < 2e-16 ***
f1D:f2c 12.04108 0.11633 103.51 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.08226 on 11 degrees of freedom
Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
F-statistic: 1.855e+04 on 10 and 11 DF, p-value: < 2.2e-16
Call:
fastLm.formula(formula = y ~ f1 * f2, data = dd)
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-0.1867900 -0.0607280 0.0011062 0.0446760 0.1185900
Estimate StdErr t.value p.value
(Intercept) 9.4654e-01 8.5073e-02 1.1126e+01 5.928e-07 ***
f1B 4.0720e+13 5.6715e-02 7.1796e+14 < 2.2e-16 ***
f1C 3.1680e+00 1.2031e-01 2.6331e+01 1.438e-10 ***
f1D 4.0918e+00 1.2031e-01 3.4010e+01 1.143e-11 ***
f2b 5.0977e+00 1.2031e-01 4.2370e+01 1.287e-12 ***
f2c 6.0391e+00 1.2031e-01 5.0195e+01 2.380e-13 ***
f1B:f2b -4.0720e+13 8.9675e-02 -4.5408e+14 < 2.2e-16 ***
f1C:f2b 7.7251e+00 1.7015e-01 4.5403e+01 6.466e-13 ***
f1D:f2b 8.9614e+00 1.7015e-01 5.2669e+01 1.474e-13 ***
f1B:f2c -4.0720e+13 8.9675e-02 -4.5408e+14 < 2.2e-16 ***
f1C:f2c 1.0840e+01 1.7015e-01 6.3712e+01 2.208e-14 ***
f1D:f2c 1.1984e+01 1.7015e-01 7.0433e+01 8.116e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1203 on 10 degrees of freedom
Multiple R-squared: 0.9999, Adjusted R-squared: 0.9998
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.