venus: Do a two-stage multivariate adaptive regression splines...

Description Usage Arguments Details Value Author(s) Examples

View source: R/venus.R

Description

When some parameters are nuisance parameters, it may be desirable to regress them out of the model first. This function does this in a combination of multiple regression and multivariate adaptive regression splines.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
venus(formula, nuisance = NULL, data = NULL,
      method = c("earth", "lm"),
      nuisanceArgs = NULL, mainArgs = NULL)
## S3 method for class 'venus'
plot(x, types = c("nuisance", "main"), ...)
## S3 method for class 'venus'
print(x, types = c("nuisance", "main"), ...)
## S3 method for class 'venus'
summary(object, ...)
## S3 method for class 'venus'
coef(object, type = "main", ...)
## S3 method for class 'venus'
effects(object, type = "main", ...)
## S3 method for class 'venus'
fitted(object, type = "main", ...)
## S3 method for class 'summary.venus'
print(x, types = c("nuisance", "main"), ...)

Arguments

formula

The main model containing the parameters of interest.

nuisance

The nuisance model; predictors here need to be regressed out.

data

A dataframe to fit.

method

A vector with two entries from c("earth", "lm") giving the methods to be used in the two stages.

nuisanceArgs

A list of additional arguments to pass to the nuisance fit.

mainArgs

A list of additional arguments to pass to the main fit.

x, object

Object for S3 methods.

type, types

Which part of the "venus" object should be used? type allows one of c("main", "nuisance", "predictor"), types allows more than one from the same list.

svd.thresh

Threshold on the amount of variance captured in the singular value decomposition of the control variable matrix.

...

Additional parameters to pass to component methods.

Details

We use the earth function from the package of the same name to do the MARS fits. This may be done at either the first or second stage (or both), depending on the choice of values in method; the default is MARS in the first stage, linear fit in the second stage.

The algorithm is as follows:

  1. The nuisance model is fit using method[1] (earth by default). This results in the choice of predictors after adaptation to the data.

  2. The chosen model is used to find residuals in the response and in the predictors in the main model.

  3. The residuals in the response are fit against the residuals in the main predictors using method[2] (lm by default).

  4. Both fitted models are returned.

Value

venus returns a list with S3 class "venus" containing three elements:

nuisanceFit

The first stage fit

mainFit

The second stage fit

predictorFit

The linear fit to partial the nuisance variables out of the predictors in the main fit

The common methods for linear models are defined for these objects; each has a type or types argument to select which of the three components is/are used in the call. For example, coef(venusObject, type = "nuisance") would extract the coefficients from the nuisance fit.

Author(s)

Duncan Murdoch and David Armstrong

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  if (!require(MASS))
    stop("This example requires MASS")

  sigma2.e.x <- 4.7*3
  ## sigma2.e is the residual variance for y. On average, the systematic
  ## variance in y owing to x and Z is around 7.3.  Initially, I make the
  ## residual variance half of that figure.
  sigma2.e <- 7.3/2
  ## Z are the control variables or nuisance variables
  Z <- mvrnorm(1000, c(0,0,0),
             matrix(c(1,.3,.3,.3,1,.3,.3,.3,1), ncol=3),
             empirical=TRUE)
  ## X is the variable of interest that has a non-linear relationship to
  ## one of the elements in Z (though this isn't strictly speaking
  ## necessary, it does highlight one of the nice features of the
  ## MARS approach, which is that you don't need to know the functional
  ## form of the relationship of controls to variables of interest
  ## (e.g., y and X)
  X <- Z[,1] + Z[,1]^2 + Z[,2] + rnorm(1000, 0, sqrt(sigma2.e.x))
  ## collect results in a data frame
  df <- data.frame(z1 = Z[,1], z2=Z[,2], z3 = Z[,3], x=scale(X))
  ## make y a linear function of x and the z variables,
  ## so the theoretical coefficient on x is 1
  df$y <- with(df, z1 + z2 + z3 + x) + rnorm(1000, 0, sqrt(sigma2.e))

  venus(y ~ x, nuisance = y ~ .-x, data = df)

  # Compare to the pure linear fit

  venus(y ~ x, nuisance = y ~ .-x, method = c("lm", "lm"), data=df)
  lm(y ~ ., data = df)

dmurdoch/venus documentation built on July 15, 2019, 4:30 a.m.