bigRR: Fitting big ridge regression



Description

Fits big ridge regression, with a particular computational advantage when the number of shrinkage parameters exceeds the number of observations. The shrinkage parameter, lambda, can be pre-specified or estimated along with the model. Any subset of the model parameters can be shrunk.

Usage

bigRR(formula = NULL, y = NULL, X = NULL, Z = NULL, data = NULL, 
      shrink = NULL, weight = NULL, family = gaussian(link = identity), 
      lambda = NULL, impute = FALSE, tol.err = 1e-6, tol.conv = 1e-8, 
      only.estimates = FALSE, GPU = FALSE, ...)
## Default S3 method:
bigRR(formula = NULL, y, X, Z, data = NULL, 
      shrink = NULL, weight = NULL, family = gaussian(link = identity), 
      lambda = NULL, impute = FALSE, tol.err = 1e-6, tol.conv = 1e-8, 
      only.estimates = FALSE, GPU = FALSE, ...)
## S3 method for class 'formula'
bigRR(formula = NULL, y = NULL, X = NULL, Z = NULL, data = NULL, 
      shrink = NULL, weight = NULL, family = gaussian(link = identity), 
      lambda = NULL, impute = FALSE, tol.err = 1e-6, tol.conv = 1e-8, 
      only.estimates = FALSE, GPU = FALSE, ...)

Arguments

formula

a two-sided model formula. Matrix input (y, X and Z) is nevertheless recommended, because it makes the role of each input explicit.

y

response vector; use either y or formula, not both.

X

design matrix related to the parameters not to be shrunk (i.e. fixed effects in the mixed model framework); not required if formula is already used.

Z

design matrix associated with shrinkage parameters (i.e. random effects in the mixed model framework); not required if model formula is used.

data

a data frame containing the model variables; mainly useful when the model is specified by a formula.

shrink

either a numeric or a character vector giving the positions or the names of the variables whose coefficients are to be shrunk.

weight

a vector of prior weights for each of the shrinkage parameters.

family

the distribution family of y, see help('family') for more details.

lambda

the shrinkage parameter, which determines the amount of shrinkage. The default, NULL, means that it is estimated along with the other model parameters.

impute

logical; specify whether missing values (genotypes) should be imputed (see Details).

tol.err

internal tolerance level for extremely small values; default value is 1e-6.

tol.conv

tolerance level in convergence; default value is 1e-8.

only.estimates

logical; TRUE if hat values are to be returned. Default is FALSE and the hat values are not returned.

GPU

logical; specifies whether the GPU should be used in the computation. This requires the gputools package and a CUDA-enabled graphics card; see, e.g., the NVIDIA website for more information.

...

unused arguments

Details

The function fits ridge regression (Shen et al. 2013) using the random effects model algorithm presented in Ronnegard et al. (2010). The computational intensity of the estimation depends mainly on the number of observations rather than on the number of shrunk parameters, which is what makes the method advantageous when the latter exceeds the former.
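The advantage mentioned in the Description, for the case where shrunk parameters outnumber observations, can be illustrated with a standard ridge-regression identity. The base-R sketch below is not the bigRR internals; it assumes a Gaussian response, a known penalty lam (a hypothetical value), and no unshrunk effects. It shows that the k-dimensional ridge solution can also be obtained by solving an n × n system built from Z Z'.

```r
set.seed(1)
n <- 20    # observations
k <- 500   # shrunk coefficients (k much larger than n)
Z <- matrix(rnorm(n * k), n, k)
y <- rnorm(n)
lam <- 2   # hypothetical ridge penalty, assumed known here

# Primal form: solve a k x k system, (Z'Z + lam I) u = Z'y
u_primal <- solve(crossprod(Z) + lam * diag(k), crossprod(Z, y))

# Dual form: solve an n x n system, then map back, u = Z'(ZZ' + lam I)^{-1} y
alpha  <- solve(tcrossprod(Z) + lam * diag(n), y)
u_dual <- crossprod(Z, alpha)

# Both forms give the same coefficients up to rounding error
all.equal(c(u_primal), c(u_dual))
```

Only the dual form avoids factorizing a k × k matrix, so its cost grows much more slowly with k.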

The model can be specified either with a formula or with design matrices; if both are given, the formula interface is used. The shrink argument specifies the subset of parameters to be shrunk. If the model is specified by a formula with a dot (.) on the right-hand side, shrink refers to variables in the data frame; otherwise it refers to the respective variables in the model formula. It is ignored if the model is specified by design matrices.
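To make the correspondence between the two interfaces concrete, the sketch below splits a formula into a response y, an unshrunk design X and a shrunk design Z. The helper split_design() is hypothetical and is not how bigRR parses formulas internally; for simplicity it accepts shrink as column names only.

```r
# Hypothetical helper, not part of bigRR: split a formula into y, X and Z
split_design <- function(formula, data, shrink) {
  stopifnot(is.character(shrink))            # names only in this sketch
  mf <- model.frame(formula, data)           # expands a dot (.) using data
  y  <- model.response(mf)
  M  <- model.matrix(attr(mf, "terms"), mf)  # full design, with intercept
  idx <- match(shrink, colnames(M))
  if (anyNA(idx)) stop("unknown variable in 'shrink'")
  list(y = y,
       X = M[, -idx, drop = FALSE],          # not shrunk (fixed effects)
       Z = M[,  idx, drop = FALSE])          # shrunk (random effects)
}

set.seed(4)
d <- data.frame(y = rnorm(8), a = rnorm(8), m1 = rnorm(8), m2 = rnorm(8))
parts <- split_design(y ~ ., d, shrink = c("m1", "m2"))
colnames(parts$X)   # "(Intercept)" "a"
colnames(parts$Z)   # "m1" "m2"
```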

When impute = TRUE, a simple (naive) imputation is applied to the missing values in the Z matrix: the missing values in each column are filled in by sampling from the distribution determined by the non-missing values of that column. Observations with missing values in the response vector y are simply removed from the analysis.
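The imputation rule can be sketched in a few lines of base R. This is an illustration, not the package's code; it assumes that "the distribution determined by the non-missing values" means the empirical distribution of each column's observed entries. The helper impute_naive() is hypothetical.

```r
# Hypothetical helper, not bigRR's own code: column-wise naive imputation
impute_naive <- function(Z) {
  for (j in seq_len(ncol(Z))) {
    miss <- is.na(Z[, j])
    obs  <- Z[!miss, j]
    if (any(miss) && length(obs) > 0) {
      # draw replacements, with replacement, from the observed values
      Z[miss, j] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
    }
  }
  Z
}

set.seed(42)
Z <- matrix(sample(c(-1, 0, 1), 60, replace = TRUE), 10, 6)  # simulated genotypes
Z[sample(length(Z), 8)] <- NA                                # knock out 8 cells
Zi <- impute_naive(Z)
anyNA(Zi)   # FALSE
```

Indexing with sample.int() avoids the behaviour of sample() when a column has a single observed value.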

Value

Returns an object of class bigRR, a list containing the following components (see Examples for how to use the estimates for prediction):

phi

estimated residual variance (non-genetic variance component).

lambda

estimated random-effect variance (genetic variance component). In the usual ridge-regression parameterization the penalty equals phi divided by this variance, so a larger value corresponds to less shrinkage.

beta

fixed-effect estimates: the subset of model parameters that are not shrunk, i.e. those associated with the X matrix.

u

random-effect estimates (genetic effects of each marker): the subset of model parameters that are shrunk, i.e. those associated with the Z matrix.

leverage

hat values for the random effects.

hglm

the internal fitted hglm object for the linear mixed model.

Call

the matched call to bigRR.
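The link between the phi and lambda components and ridge shrinkage can be checked numerically. The sketch below assumes a Gaussian response and treats both variances as known (bigRR estimates them through hglm); it solves Henderson's mixed model equations with penalty phi / s2u on u only. The helper fit_mme() and its argument s2u are hypothetical names.

```r
set.seed(2)
n <- 30; k <- 50
X <- matrix(1, n, 1)                  # intercept only, not shrunk
Z <- matrix(rnorm(n * k), n, k)       # shrunk columns
y <- 3 + Z[, 1:3] %*% c(1, -1, 0.5) + rnorm(n)

# Hypothetical helper: mixed model equations for known phi and s2u
fit_mme <- function(y, X, Z, phi, s2u) {
  lam <- phi / s2u                    # ridge penalty implied by the variances
  C <- rbind(cbind(crossprod(X),    crossprod(X, Z)),
             cbind(crossprod(Z, X), crossprod(Z) + lam * diag(ncol(Z))))
  sol <- solve(C, c(crossprod(X, y), crossprod(Z, y)))
  list(beta = sol[seq_len(ncol(X))], u = sol[-seq_len(ncol(X))])
}

small <- fit_mme(y, X, Z, phi = 1, s2u = 0.01)  # penalty 100: heavy shrinkage
large <- fit_mme(y, X, Z, phi = 1, s2u = 1)     # penalty 1: light shrinkage
sum(small$u^2) < sum(large$u^2)                  # TRUE
```

Only u is penalized, matching the split between the beta and u components described above.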

Author(s)

Xia Shen, Moudud Alam, Lars Ronnegard

References

Shen X, Alam M, Fikse F and Ronnegard L (2013). A novel generalized ridge regression method for quantitative genetics. Genetics, 193, 1255-1268.

Ronnegard L, Shen X and Alam M (2010). hglm: A Package for Fitting Hierarchical Generalized Linear Models. The R Journal, 2(2), 20-28.

See Also

lm.ridge in the MASS package.

Examples

# --------------------------------------------- #  
#              Arabidopsis example              #
# --------------------------------------------- #  
## Not run: 
library(bigRR)
data(Arabidopsis)               # provides the response y and the marker matrix Z
X <- matrix(1, length(y), 1)    # intercept-only design for the unshrunk part

# fitting SNP-BLUP, i.e. a ridge regression on all the markers across the genome
#
SNP.BLUP.result <- bigRR(y = y, X = X, Z = scale(Z), 
                         family = binomial(link = 'logit'))

# fitting HEM, i.e. a generalized ridge regression with marker-specific shrinkage
#
HEM.result <- bigRR_update(SNP.BLUP.result, scale(Z), 
                           family = binomial(link = 'logit'))

# plot and compare the estimated effects from both methods
#
split.screen(c(1, 2))
split.screen(c(2, 1), screen = 1)
screen(3); plot(abs(SNP.BLUP.result$u), cex = .6, col = 'slateblue')
screen(4); plot(abs(HEM.result$u), cex = .6, col = 'olivedrab')
screen(2); plot(abs(SNP.BLUP.result$u), abs(HEM.result$u), cex = .6, pch = 19, 
                col = 'darkmagenta')

# create random new genotypes for 10 individuals with the same number of markers
# and predict the outcome using the fitted HEM
#
Z.new <- matrix(sample(c(-1, 1), 10*ncol(Z), TRUE), 10)
y.predict <- as.numeric(HEM.result$beta + Z.new %*% HEM.result$u)
#
# NOTE: The prediction above may be poor because Z was scaled before the HEM
#       fit while Z.new is not. Either remove the scaling above, or scale
#       Z.new by row-binding it with the original Z matrix.

## End(Not run)
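As an alternative to row-binding, the scaling issue in the NOTE can be handled by reusing the training column means and standard deviations, which scale() stores as attributes. The sketch uses simulated ±1 genotypes and does not call bigRR; it assumes no column of Z is constant, since a zero standard deviation would yield NaN.

```r
set.seed(3)
Z     <- matrix(sample(c(-1, 1), 40 * 15, replace = TRUE), 40)  # simulated training genotypes
Z.new <- matrix(sample(c(-1, 1), 10 * 15, replace = TRUE), 10)  # simulated new genotypes

Zs <- scale(Z)   # the scaled matrix that would be passed to bigRR()

# Apply the training centering and scaling to the new individuals
Z.new.s <- scale(Z.new,
                 center = attr(Zs, "scaled:center"),
                 scale  = attr(Zs, "scaled:scale"))
dim(Z.new.s)     # 10 15
```

Z.new.s can then take the place of Z.new in the prediction line of the example. For the binomial fit that prediction is on the logit scale, and plogis() converts it to a probability.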

bigRR documentation built on July 25, 2020, 3 a.m.