compute_r2: Compute R squared


Description

Compute the R squared value for a given cluster or group of variables.

Usage

compute_r2(x, y, res.test.hierarchy, clvar = NULL,
  family = c("gaussian", "binomial"), colnames.cluster = NULL)

Arguments

x

a matrix or list of matrices for multiple data sets. The matrix or matrices have to be of type numeric and are required to have column names / variable names. The rows and the columns represent the observations and the variables, respectively.

y

a vector, a matrix with one column, or a list of such objects for multiple data sets. Each vector or matrix has to be of type numeric.

res.test.hierarchy

the output of one of the functions test_hierarchy, test_only_hierarchy, or multisplit.

clvar

a matrix or list of matrices of control variables.

family

a character string naming a family of the error distribution; either "gaussian" or "binomial".

colnames.cluster

The column names / variable names of the cluster of interest. If not supplied, the R squared value of the full model is computed.

Details

The R squared value is computed based on the output of the multi-sample splitting step. For each split, the intersection of the cluster / group (specified in colnames.cluster) and the selected variables is taken and R squared values are computed based on the second halves of observations. Finally, the R squared values are averaged over the B splits and over the different data sets if multiple data sets are supplied.
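The averaging over splits can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the hierinf implementation: the toy data, the correlation-based screening rule that stands in for the variable selection step, and the choice to return 0 when the intersection is empty are all hypothetical.

```r
# Illustrative sketch only; names and the selection rule are assumptions.
set.seed(1)
n <- 100   # observations
p <- 10    # variables
B <- 5     # number of sample splits
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("Var", 1:p)))
y <- x[, "Var1"] + x[, "Var2"] + rnorm(n)
cluster <- c("Var1", "Var2", "Var3")  # plays the role of colnames.cluster

r2_per_split <- vapply(seq_len(B), function(b) {
  # Split the observations into two halves.
  first_half <- sample(n, n / 2)
  second_half <- setdiff(seq_len(n), first_half)

  # Stand-in selection step (assumption): keep the 4 variables with the
  # largest absolute marginal correlation on the first half.
  score <- abs(cor(x[first_half, ], y[first_half]))[, 1]
  selected <- names(sort(score, decreasing = TRUE))[1:4]

  # Intersect the cluster with the selected variables.
  vars <- intersect(cluster, selected)
  if (length(vars) == 0) return(0)  # assumption for the empty case

  # Fit on the second half and take the adjusted R squared.
  dat <- data.frame(y = y[second_half], x[second_half, vars, drop = FALSE])
  summary(lm(y ~ ., data = dat))$adj.r.squared
}, numeric(1))

# Average over the B splits.
r2 <- mean(r2_per_split)
r2
```

With several data sets, the same per-split values would additionally be averaged across data sets.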

For a continuous response, the adjusted R squared value is calculated for a given cluster or group of variables. For a binary response, Nagelkerke's R squared value is computed using the function NagelkerkeR2.
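The two measures can be written out by hand in base R. The sketch below uses simulated toy data and the standard textbook formulas; it is meant to show what the quantities are, not how the package computes them internally, and it does not call NagelkerkeR2.

```r
# Illustrative sketch only; data and variable names are assumptions.
set.seed(2)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)

# Continuous response: adjusted R squared.
y_cont <- x1 + 0.5 * x2 + rnorm(n)
fit_lm <- lm(y_cont ~ x1 + x2)
k <- 2                                  # number of predictors
r2 <- summary(fit_lm)$r.squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Binary response: Nagelkerke's R squared, i.e. the Cox-Snell R squared
# divided by its maximum attainable value.
y_bin <- rbinom(n, 1, plogis(x1 + 0.5 * x2))
fit_glm <- glm(y_bin ~ x1 + x2, family = binomial)
cox_snell <- 1 - exp((fit_glm$deviance - fit_glm$null.deviance) / n)
nagelkerke_r2 <- cox_snell / (1 - exp(-fit_glm$null.deviance / n))

c(adjusted = adj_r2, nagelkerke = nagelkerke_r2)
```

The hand-computed adjusted value agrees with summary(fit_lm)$adj.r.squared.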

If colnames.cluster is not supplied, the R squared value of the full model is computed.

Value

The returned value is the R squared value.

References

Renaux, C. et al. (2018), Hierarchical inference for genome-wide association studies: a view on methodology with software. (arXiv:1805.02988)

Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78:691–692.

See Also

test_hierarchy.

Examples

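library(MASS)     # mvrnorm
library(hierinf)  # cluster_var, test_hierarchy, compute_r2

# Simulate n observations of p independent standard normal variables.
n <- 200
p <- 500
set.seed(3)
x <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
colnames(x) <- paste0("Var", 1:p)

# Only Var5, Var20, and Var46 have a nonzero effect on the response.
beta <- rep(0, p)
beta[c(5, 20, 46)] <- 1
y <- x %*% beta + rnorm(n)

# Build the hierarchical tree and test the clusters.
dendr <- cluster_var(x = x)
set.seed(47)
sign.clusters <- test_hierarchy(x = x, y = y, dendr = dendr,
                                family = "gaussian")

# R squared value for the cluster consisting of Var1, Var5, and Var8.
compute_r2(x = x, y = y, res.test.hierarchy = sign.clusters,
           family = "gaussian",
           colnames.cluster = c("Var1", "Var5", "Var8"))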

crbasel/hierinf documentation built on May 24, 2019, 7:14 a.m.