groupedSurv: Compute the Efficient Score Statistic for the Grouped...

View source: R/groupedSurv.R

groupedSurvR Documentation

Compute the Efficient Score Statistic for the Grouped Survival Model

Description

A method to compute the efficient score statistic for a grouped survival model. The function supports zero, single, or multiple covariates as dictated by the input data. The function can take either a gwaa.data object or a standard data.frame or matrix as input. If more than one variable of interest is provided, the efficient score will be computed for each one separately. For each variable tested, the function returns the unadjusted marginal asymptotic p-value, as well as the family-wise error rate (FWER, Bonferroni correction) adjusted p-value and local false-discovery rates (FDR, Storey's Q-values).

Usage

  groupedSurv(x, Z=NULL, GenABEL.data=NULL, alpha, theta=NULL, 
    gtime, delta, beta=0, nCores=1, reScore=FALSE)

Arguments

x

Vector, data.frame, or matrix of numeric variables of interest for each sample. Alternatively, if a gwaa.data object is given for GenABEL.data, this argument may be a vector of strings corresponding to column names in GenABEL.data@gtdata to use as variables of interest, set NULL will use all the variables in GenABEL.data@gtdata as variables of interest.

Z

Optional data.frame or matrix of numeric covariate values for each sample. Alternatively, if a gwaa.data object is given for GenABEL.data, this argument may be a vector of strings corresponding to column names in GenABEL.data@phdata to use as covariates.

GenABEL.data

Optional gwaa.data object.

alpha

Vector of baseline survival rates for each time interval.

theta

Optional vector of estimated nuisance parameters for the covariates.

gtime

Vector of observed survival times for each sample.

delta

Vector of event indicators for each sample: 1 indicates observed event, 0 indicates censored.

beta

Scalar for the parameter of interest. Default is 0.

nCores

Integer representing the number of cores to be used for multi-threaded computation. Default is 1.

reScore

Boolean indicating whether to return the full efficient scores matrix. Default is FALSE

Value

If reScore=FALSE, the function returns a data.frame with one row for each variables of interest and four columns: stat, the efficient score statistic, pvalue, unadjusted p-value, FWER, the family-wise error rate adjusted p-value (Bonferroni correction), and FDR, the local false-discovery rate (Storey's Q-value). If reScore=TRUE, a list is returned. The first element is the data.frame of efficient score statistics and p-values, and the second is a matrix of contributions of each sample to the efficient score statistics of each variable of interest. Rows correspond to samples, and columns correspond to the variables of interest.

References

Bonferroni, C.E. (1935). Il calcolo delle assicurazioni su gruppi di teste. Studi in Onore del Professore Salvatore Ortu Carbon, 13-60.
Storey, J.D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics, 31:6, 2013-2035.
Storey, J.D., Taylor, J.E., and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B-Statistical Methodology, 66, 187-205

Examples

# Generate dummy data
set.seed(111)
n <- 1000

# effect size
beta <- 0.4

# covariate parameters
theta <- c(0.2, 0.2)

# variable of interest associated with outcome
MAF <- 0.05
x <- matrix(rbinom(n, 2, MAF), ncol = 1)

# covariate data (centered at 0)
z1 <- rnorm(n)
z2 <- rbinom(n, 1, 0.5) - 0.5
Z  <- matrix(cbind(z1, z2), ncol = 2)

# continuous survival time
lam0 <- 1
cmax <- 3
lami <- lam0 * exp(x*beta + Z[,1]*theta[1]+Z[,2]*theta[2])
stime <- rexp(n, lami)
ctime <- runif(n, 0, cmax)
delta <- stime < ctime
otime <- pmin(stime, ctime)

# additional variables of interest
xMore <- matrix(rbinom(n*100, 2, MAF), ncol = 100)
xMore <- cbind(x, xMore)

# number of observation times
ntps <- 5

# number of intervals
r <- ntps + 1

# last observation time
maxbreakq <- 0.85
maxbreak <- qexp(maxbreakq, lam0)

# grouped failure times
breaks <- (1:ntps) * (maxbreak/ntps)
gtime <- findInterval(otime, breaks) + 1
delta[gtime == r] <- FALSE
dctime <- findInterval(ctime, breaks) + 1
delta[gtime == dctime] <- FALSE
delta <- as.numeric(delta)
gtime[which(gtime == r)] <- Inf

# estimate nuisance parameters
#thetaest <- thetaEst(Z, gtime, delta)

# compute efficient score statistics
# eff <- groupedSurv(x=xMore, Z=Z, alpha=thetaest$alpha, theta=thetaest$theta,
#		gtime=gtime, delta=delta, beta=0, nCores=1)
# head(eff)

groupedSurv documentation built on Sept. 29, 2023, 1:06 a.m.