tcgsa_seq: Time-course Gene Set Analysis

Description Usage Arguments Value References See Also Examples

View source: R/tcgsa_seq.R

Description

Wrapper function for performing gene set analysis of (potentially longitudinal) RNA-seq data

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
tcgsa_seq(
  y,
  x,
  phi,
  weights_phi_condi = TRUE,
  genesets,
  indiv = NULL,
  Sigma_xi = diag(ncol(phi)),
  which_test = c("permutation", "asymptotic"),
  which_weights = c("loclin", "voom", "none"),
  n_perm = 1000,
  progressbar = TRUE,
  parallel_comp = TRUE,
  nb_cores = parallel::detectCores() - 1,
  preprocessed = FALSE,
  doPlot = TRUE,
  gene_based_weights = TRUE,
  bw = "nrd",
  kernel = c("gaussian", "epanechnikov", "rectangular", "triangular", "biweight",
    "tricube", "cosine", "optcosine"),
  exact = FALSE,
  transform = TRUE,
  padjust_methods = c("BH", "BY", "holm", "hochberg", "hommel", "bonferroni"),
  lowess_span = 0.5,
  R = NULL,
  homogen_traj = FALSE,
  na.rm_tcgsaseq = TRUE,
  verbose = TRUE
)

Arguments

y

a numeric matrix of size G x n containing the raw RNA-seq counts or preprocessed expressions from n samples for G genes.

x

a numeric matrix of size n x p containing the model covariates from n samples (design matrix). Usually, its first column is the intercept (full of 1s).

phi

a numeric design matrix of size n x K containing the K variables to be tested

weights_phi_condi

a logical flag indicating whether heteroscedasticity weights computation should be conditional on both the variable(s) to be tested phi and on covariate(s) x, or on x alone. #'Default is TRUE in which case conditional means are estimated conditionally on both x and phi.

genesets

either a vector of index or subscripts that defines which rows of y constitute the investigated gene set (when only 1 gene set is being tested). Can also be a list of index (or rownames of y) when several gene sets are tested at once, such as the first element of a gmt object. If NULL, then gene-wise p-values are returned.

indiv

a vector of length n containing the information for attributing each sample to one of the studied individuals. Coerced to be a factor. Default is NULL in which case each sample is considered as coming from independent subjects.

Sigma_xi

a matrix of size K x K containing the covariance matrix of the K random effects. Only used if homogen_traj is FALSE. Default assume diagonal correlation matrix, i.e. independence of random effects.

which_test

a character string indicating which method to use to approximate the variance component score test, either "permutation" or "asymptotic". Default is "permutation".

which_weights

a character string indicating which method to use to estimate the mean-variance relationship weights. Possibilities are "loclin", "voom" or "none" (in which case no weighting is performed). Default is "loclin". See sp_weights and voom_weights for details.

n_perm

the number of perturbations. Default is 1000.

progressbar

logical indicating whether a progress bar should be displayed when computing permutations (only in interactive mode).

parallel_comp

a logical flag indicating whether parallel computation should be enabled. Only Linux and MacOS are supported, this is ignored on Windows. Default is TRUE.

nb_cores

an integer indicating the number of cores to be used when parallel_comp is TRUE. Only Linux and MacOS are supported, this is ignored on Windows. Default is parallel::detectCores() - 1.

preprocessed

a logical flag indicating whether the expression data have already been preprocessed (e.g. log2 transformed). Default is FALSE, in which case y is assumed to contain raw counts and is normalized into log(counts) per million.

doPlot

a logical flag indicating whether the mean-variance plot should be drawn. Default is FALSE.

gene_based_weights

a logical flag used for "loclin" weights, indicating whether to estimate weights at the gene-level, or rather at the observation-level. Default is TRUE, and weights are then estimated at the gene-level.

bw

a character string indicating the smoothing bandwidth selection method to use. See bandwidth for details. Possible values are "ucv", "SJ", "bcv", "nrd" or "nrd0"

kernel

a character string indicating which kernel should be used. Possibilities are "gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "tricube", "cosine", "optcosine". Default is "gaussian" (NB: "tricube" kernel corresponds to the loess method).

exact

a logical flag indicating whether the non-parametric weights accounting for the mean-variance relationship should be computed exactly or extrapolated from the interpolation of local regression of the mean against the variance. Default is FALSE, which uses interpolation (faster computation).

transform

a logical flag used for "loclin" weights, indicating whether values should be transformed to uniform for the purpose of local linear smoothing. This may be helpful if tail observations are sparse and the specified bandwidth gives suboptimal performance there. Default is TRUE.

padjust_methods

multiple testing correction method used if genesets is a list. Default is "BH", i.e. Benjamini-Hochberg procedure for controlling the FDR. Other possibilities are: "holm", "hochberg", "hommel", "bonferroni" or "BY" (for Benjamini-Yekutieli procedure).

lowess_span

smoother span for the lowess function, between 0 and 1. This gives the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness. Only used if which_weights is "voom". Default is 0.5.

R

library.size (optional, important to provide if preprocessed = TRUE). Default is NULL

homogen_traj

a logical flag indicating whether trajectories should be considered homogeneous. Default is FALSE in which case trajectories are not only tested for trend, but also for heterogeneity.

na.rm_tcgsaseq

logical: should missing values in y (including NA and NaN) be omitted from the calculations? Default is TRUE.

verbose

logical: should informative messages be printed during the computation? Default is TRUE.

Value

A list with the following elements:

References

Agniel D & Hejblum BP (2017). Variance component score test for time-course gene set analysis of longitudinal RNA-seq data, Biostatistics, 18(4):589-604. 10.1093/biostatistics/kxx005. arXiv:1605.02351.

Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2), R29.

See Also

sp_weights vc_test_perm vc_test_asym p.adjust

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#rm(list=ls())
nsims <- 2 #100
res_quant <- list()
for(i in 1:2){
n <- 2000#0
nr <- 3
r <- nr*20#4*nr#100*nr
t <- matrix(rep(1:nr), r/nr, ncol=1, nrow=r)
sigma <- 0.4
b0 <- 1

#under the null:
b1 <- 0

y.tilde <- b0 + b1*t + rnorm(r, sd = sigma)
y <- t(matrix(rnorm(n*r, sd = sqrt(sigma*abs(y.tilde))), ncol=n, nrow=r) +
      matrix(rep(y.tilde, n), ncol=n, nrow=r))
x <- matrix(1, ncol=1, nrow=r)

#run test
res <- tcgsa_seq(y, x, phi=t, genesets=lapply(0:9, function(x){x*10+(1:10)}),
                        Sigma_xi=matrix(1), indiv=rep(1:(r/nr), each=nr), which_test="asymptotic",
                        which_weights="none", preprocessed=TRUE)
res_genes <- tcgsa_seq(y, x, phi=cbind(t),#, rnorm(r)), #t^2
                      genesets=NULL,
                      Sigma_xi=diag(1), indiv=rep(1:(r/nr), each=nr), which_test="asymptotic",
                      which_weights="none", preprocessed=TRUE)
length(res_genes$pvals[, "rawPval"])
quantile(res_genes$pvals[, "rawPval"])
res_quant[[i]] <- res_genes$pvals[, "rawPval"]
}
#round(rowMeans(sapply(res_quant, quantile)), 3)
#plot(density(unlist(res_quant)))
#mean(unlist(res_quant)<0.05)

## Not run: 
res_genes <- tcgsa_seq(y, x, phi=t, genesets=NULL,
                      Sigma_xi=matrix(1), indiv=rep(1:(r/nr), each=nr), which_test="permutation",
                      which_weights="none", preprocessed=TRUE, n_perm=1000, parallel_comp = FALSE)

mean(res_genes$pvals$rawPval < 0.05)
summary(res_genes$pvals$FDR)

## End(Not run)

tcgsaseq documentation built on Sept. 13, 2020, 5:13 p.m.