makeDesignMatrix: Make a Design Matrix

makeDesignMatrix R Documentation

Make a Design Matrix

Description

Create a sparse model matrix for new data that respects the basis functions of the original data from which the design was built. Primarily for internal use, but may be of some independent interest.

Usage

makeDesignMatrix(formula, origData, newData, test = TRUE, sparse = TRUE)

Arguments

formula

the formula describing the design matrix. Any responses will be deleted.

origData

the original dataset as a dataframe

newData

a dataframe containing any of the variables in the formula. This will provide the data in the returned model matrix.

test

when set to TRUE, runs a test that the matrix was constructed correctly; see Details for more.

sparse

if TRUE (the default), returns a sparse matrix using sparse.model.matrix from the Matrix package.

Details

This function is designed for settings where we need to make a prediction using a model matrix. The practical challenge is ensuring that the representation of the new data lines up with the original representation. This is difficult for functions that produce a different representation depending on their inputs. A simple conceptual example is factor variables: if we fit our original model using a factor with levels c("A","B","C"), then when we try to make predictions for data having only levels c("A","C") we need to adjust for the missing level. Base R functions like predict.lm in stats handle this gracefully, and this function is essentially a version of predict.lm that only constructs the model matrix.
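The factor problem described above can be reproduced with base R alone. The sketch below is not the code inside makeDesignMatrix; it only illustrates the general idea of reusing the terms and factor levels of the original data, in the way predict.lm does. The object names (orig, new, naive, aligned) are made up for illustration.

```r
# Original data uses a factor with three levels.
orig <- data.frame(g = factor(c("A", "B", "C", "A", "B", "C")))
# New data contains only two of those levels.
new <- data.frame(g = factor(c("A", "C")))

# Naive approach: build the matrix from the new data alone.
# The column for level "B" is missing.
naive <- model.matrix(~ g, data = new)
colnames(naive)

# Aligned approach: reuse the terms and factor levels of the original data.
mf_orig <- model.frame(~ g, data = orig)
tt <- terms(mf_orig)
xlev <- .getXlevels(tt, mf_orig)
mf_new <- model.frame(tt, data = new, xlev = xlev)
aligned <- model.matrix(tt, mf_new)
colnames(aligned)
```

The naive matrix has columns for the intercept and level "C" only, while the aligned matrix also carries a column for level "B" (all zeros here), so it lines up with coefficients estimated on the original data.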

Beyond factors, the key use case is basis functions such as splines. For a function like this to work, it must either depend only on the observation it is transforming (e.g. log) or it must have methods for predict and makepredictcall. The spline wrapper s has both and so should work.
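To illustrate why predict and makepredictcall methods matter, the following sketch uses bs from the splines package as a stand-in for the s wrapper (an assumption made so the example runs without stm). The terms object built from the original data records the knots, so the basis evaluated on new data matches what predict gives for the original basis object.

```r
library(splines)

set.seed(1)
orig <- data.frame(x = runif(50))
new <- data.frame(x = c(0.2, 0.5, 0.8))

# model.frame() calls makepredictcall() and stores the knots in the terms.
mf_orig <- model.frame(~ bs(x, df = 4), data = orig)
tt <- terms(mf_orig)

# Evaluate the spline basis at the new points using the original knots.
aligned <- model.matrix(tt, model.frame(tt, data = new))

# Reference: predict() applied to the basis built on the original data.
basis <- bs(orig$x, df = 4)
reference <- predict(basis, new$x)

# Largest absolute difference between the two (intercept column dropped).
max_diff <- max(abs(as.vector(aligned[, -1]) - as.vector(reference)))
max_diff
```

The difference should be zero up to floating point error, because both routes evaluate the same basis with the same knots.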

When a function lacks these methods, it will still produce a design matrix, but the values will be wrong. To catch these cases we run a quick test when test=TRUE, as it is by default. The test simply splits the original data in half and checks that building the matrix for each half separately produces the same values as building it for the complete original data.
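The split-half idea can be sketched in base R as follows. This is an illustration under assumptions, not the test implemented in stm: build_from, split_half_ok and center are hypothetical helpers written for this example.

```r
# Hypothetical helper: design matrix for `data` using terms learned from `orig`.
build_from <- function(formula, orig, data) {
  mf_orig <- model.frame(formula, data = orig)
  tt <- delete.response(terms(mf_orig))
  xlev <- .getXlevels(tt, mf_orig)
  model.matrix(tt, model.frame(tt, data = data, xlev = xlev))
}

# Hypothetical check: the two halves, built separately and stacked,
# should reproduce the matrix built from the full original data.
split_half_ok <- function(formula, orig) {
  full <- build_from(formula, orig, orig)
  first <- seq_len(floor(nrow(orig) / 2))
  halves <- rbind(build_from(formula, orig, orig[first, , drop = FALSE]),
                  build_from(formula, orig, orig[-first, , drop = FALSE]))
  isTRUE(all.equal(as.vector(full), as.vector(halves)))
}

# A transform that depends on the whole column and has no makepredictcall method.
center <- function(x) x - mean(x)

set.seed(2)
d <- data.frame(x = rnorm(20))
ok_log <- split_half_ok(~ log(abs(x) + 1), d)  # observation-wise transform
ok_center <- split_half_ok(~ center(x), d)     # data-dependent transform
c(ok_log, ok_center)
```

The observation-wise transform passes the check, while center fails it because each half is centered on its own mean rather than on the mean of the original data.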

See Also

fitNewDocuments

Examples

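library(stm)

# Original data: a three-level factor and a continuous predictor
foo <- data.frame(response=rnorm(30),
                  predictor=as.factor(rep(c("A","B","C"),10)),
                  predictor2=rnorm(30))
# New data containing only levels "A" and "C"
foo.new <- data.frame(predictor=as.factor(c("A","C","C")),
                      predictor2=foo$predictor2[1:3])
# Build the design matrix for foo.new using the representation from foo
makeDesignMatrix(~predictor + s(predictor2), foo, foo.new)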

stm documentation built on June 24, 2024, 5:18 p.m.