Linear models (lm) are one of the most basic statistical models available. The simplest regression model is $$ Y_i = \beta_0 + \beta_1 x_i + \epsilon_i $$ where $\epsilon_i \sim N(0, \sigma^2)$. This corresponds to fitting a straight line through some points. So $\beta_0$ is the $y$-intercept and $\beta_1$ is the gradient. The aim is to estimate $\beta_0$ and $\beta_1$.
ex = -1:10 ey = 2 * ex + 3 set.seed(1) #setnicepar();mypalette(1) par(mar = c(3, 3, 2, 1), mgp = c(2, 0.4, 0), tck = -.01, cex.axis = 0.9, las = 1) plot(ex, ey, type = "l", xlab = "", ylab = "", ylim = range(0, max(ey)), xlim = range(-1, max(ex)), axes = FALSE) abline(v = 0); abline(h = 0) trix = ex[7:10] triy = ey[7:10] lines(trix, rep(triy[1], length(trix)), lty = 2, col = 1) lines(rep(trix[length(trix)], length(triy)), triy, lty = 2, col = 1) text(0.4, 2, labels = expression(beta[0]), col = 1) text(ex[10] + 0.5, ey[7] + 1, labels = expression(beta[1]), col = 1) points(1:10, ey[-1:-2] + rnorm(10, sd = 1.5), pch = 21, bg = "steelblue")
In the more general multiple regression model, there are $p$ predictor variables $$ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i, $$ where $x_{ij}$ is the $i^\text{th}$ observation on the $j^\text{th}$ independent variable. The above equation can be written neatly in matrix notation as $$ \mathbf{Y} = X \mathbf{\beta} + \mathbf{\epsilon} $$ with dimensions [ [n\times 1]= [n\times (p+1)] ~[(p+1)\times 1] + [n \times 1 ]\;, ] where
The goal of regression is to estimate $\mathbf{\beta}$ with $\mathbf{\hat\beta}$. It can be shown that [ \mathbf{\hat\beta} = (X^T X)^{-1} X^T \mathbf{Y} \;. ] Our estimate of $\mathbf{\hat \beta}$ will exist provided that $(X^T X)^{-1}$ exists, i.e. no column of $X$ is a linear combination of other columns. For a least squares regression with a sample size of $n$ training examples and $p$ predictors, it takes:
Since $n >> p$, this means that the algorithm scales with order
$O(p^2 n)$. As well as taking a long time to calculate, the memory required also
increases. The R implementation of \cc{lm} requires $O(np + p^2)$ in memory. But
this can be reduced by constructing the model matrix in chunks. The \cc{biglm}'s
algorithm is based on algorithm AS
274. It
works by updating the Cholesky decomposition with new observations. So for a
model with $p$ variables, only the $p \times p$ (triangular) Cholesky factor and
a single row of data needs to be in the memory at any given time. The biglm
package does not do the chunking for you, but ffbase
provides a handy S3
wrapper, bigglm.ffdf
.
For an example of using \cc{biglm}, see the blog post at by Bnosac.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.