impulse_DE: Differential expression analysis using impulse models


View source: R/Impulse_DE_fin.R

Description

Fits an impulse model to time course data and uses this model as a basis to detect differentially expressed (DE) genes. If a single time course data set is given, DE genes are detected over time; if an additional control time course data set is present, DE genes are detected between the two data sets.

Usage

impulse_DE(expression_table = NULL, annotation_table = NULL,
  colname_time = NULL, colname_condition = NULL,
  control_timecourse = FALSE, control_name = NULL, case_name = NULL,
  expr_type = "Array", plot_clusters = TRUE, n_iter = 100,
  n_randoms = 50000, n_process = 4, Q_value = 0.01, new_device = TRUE)

Arguments

expression_table

numeric matrix of expression values; genes should be in rows, samples in columns. Data should be properly normalized and log2-transformed as well as filtered for present or variable genes.

annotation_table

table providing co-variables for the samples, including condition and time points. Time points must be numeric.

colname_time

character string specifying the column name of the co-variable "Time" in annotation_table

colname_condition

character string specifying the column name of the co-variable "Condition" in annotation_table

control_timecourse

logical indicating whether a control time course is part of the data set (TRUE) or not (FALSE). Default is FALSE.

control_name

character string specifying the name of the control condition in annotation_table.

case_name

character string specifying the name of the case condition in annotation_table. Should be set if more than two conditions are present in annotation_table.

expr_type

character string with allowed values "Array" or "Seq". Default is "Array".

plot_clusters

logical indicating whether to plot the clusters (TRUE) or not (FALSE). Default is TRUE.

n_iter

numeric value specifying the number of iterations performed to fit the impulse model to the clusters. Default is 100.

n_randoms

numeric value specifying the number of randomized background iterations generated for the differential expression analysis. Default is 50000; this value should not be decreased.

n_process

numeric value indicating the number of processes that may be used on the machine to run calculations in parallel. Default is 4. To avoid overloading the machine, the specified value is internally changed to min(detectCores() - 1, n_process), using the detectCores function from the package parallel.

Q_value

numeric value specifying the cutoff to call genes significantly differentially expressed after FDR correction (adjusted p-value). Default is 0.01.

new_device

logical indicating whether each plot should be plotted into a new device (TRUE) or not (FALSE). Default is TRUE.
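
The n_process and Q_value arguments described above can be illustrated with a short sketch. It only mirrors the documented rules: the core count is capped with min(detectCores() - 1, n_process), and adjusted p-values are compared against Q_value. The p-values are made up, and both the use of p.adjust with method "BH" and the "<=" comparison are assumptions made for illustration, not a statement about how the package implements these steps internally.

```r
library(parallel)

# Cap the requested number of processes as described for n_process.
n_process <- 4
cores <- detectCores()               # may be NA on some platforms
n_used <- min(cores - 1, n_process)  # never more than requested

# Benjamini-Hochberg adjustment of made-up raw p-values (illustration only).
Q_value <- 0.01
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.6)
p_adj <- p.adjust(p_raw, method = "BH")

# Assumed decision rule: call a gene DE if its adjusted p-value is <= Q_value.
is_de <- p_adj <= Q_value
print(data.frame(p_raw, p_adj, is_de))
```

With these values only the first gene passes the default cutoff of 0.01, which shows how strict the default is compared with a cutoff on the raw p-values.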

Details

ImpulseDE is based on the impulse model proposed by Chechik and Koller, which reflects a two-step behavior of genes within a cell responding to environmental changes (Chechik and Koller, 2009). To detect differentially expressed genes, a five-step workflow is followed:

  1. The genes are clustered into a limited number of groups using k-means clustering. If plot_clusters = TRUE, the clusters are plotted.

  2. The impulse model is fitted to the mean expression profiles of the clusters. The best parameter sets are then used for the next step.

  3. The impulse model is fitted to each gene separately using the parameter sets from step 2 as optimal start point guesses.

  4. The impulse model is fitted to a randomized dataset (bootstrap), which is essential to detect significantly differentially expressed genes (Storey et al., 2005).

  5. Differentially expressed genes are detected using the fits to the real and randomized data sets. FDR correction is performed to obtain adjusted p-values (Benjamini and Hochberg, 1995).
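
To make the fitting steps above more concrete, the following is a minimal sketch of an impulse function in the form commonly attributed to Chechik and Koller (2009): a product of two sigmoids that moves from an initial level h0 to a peak level h1 around time t1, and then to a final level h2 around time t2. The function name impulse_sketch, the parameter order in theta, and the use of optim with a sum-of-squares objective are assumptions made for this illustration; the package's own model function is calc_impulse (see See Also), and its fitting procedure may differ.

```r
# theta = c(b, h0, h1, h2, t1, t2); this ordering is an assumption of the sketch.
impulse_sketch <- function(theta, timepoints) {
  b  <- theta[1]                     # slope of both transitions
  h0 <- theta[2]; h1 <- theta[3]; h2 <- theta[4]
  t1 <- theta[5]; t2 <- theta[6]
  rise <- 1 / (1 + exp(-b * (timepoints - t1)))  # switches on around t1
  fall <- 1 / (1 + exp( b * (timepoints - t2)))  # switches off around t2
  (1 / h1) * (h0 + (h1 - h0) * rise) * (h2 + (h1 - h2) * fall)
}

# Noise-free toy profile generated from known parameters.
timepoints <- c(0, 1, 2, 3, 4, 6, 8, 10, 12)
true_theta <- c(5, 1, 5, 2, 2, 8)
observed <- impulse_sketch(true_theta, timepoints)

# Least-squares fit from a start guess, standing in for steps 2 and 3.
sse <- function(theta) sum((impulse_sketch(theta, timepoints) - observed)^2)
start_theta <- c(3, 1.5, 4, 2.5, 3, 7)
fit <- optim(start_theta, sse)       # Nelder-Mead by default
print(fit$par)
```

A good start guess matters because the objective is non-convex, which is presumably why the workflow first fits cluster means and reuses those parameter sets as starting points for the individual genes.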

Value

List containing the following elements:

Author(s)

Jil Sander

References

Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol., 57, 289-300.

Storey, J.D. et al. (2005) Significance analysis of time course microarray experiments. Proc. Natl. Acad. Sci. USA, 102, 12837-12841.

Rangel, C., Angus, J., Ghahramani, Z., Lioumi, M., Sotheran, E., Gaiba, A., Wild, D.L., Falciani, F. (2004) Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics, 20(9), 1361-72.

Chechik, G. and Koller, D. (2009) Timing of Gene Expression Responses to Environmental Changes. J. Comput. Biol., 16, 279-290.

Yosef, N. et al. (2013) Dynamic regulatory network controlling TH17 cell differentiation. Nature, 496, 461-468.

See Also

plot_impulse, calc_impulse.

Examples

# Load the longitudinal package (install it first if necessary)
library(longitudinal)
# Attach the T-cell data sets
data(tcell)
# Check the dimensions of the data matrix of interest
dim(tcell.10)
# Generate a proper annotation table
annot <- as.data.frame(cbind("Time" =
   sort(rep(get.time.repeats(tcell.10)$time,10)),
   "Condition" = "activated"), stringsAsFactors = FALSE)
# The time column must be numeric
annot$Time <- as.numeric(annot$Time)
# Row names of the annotation table must appear in the data table
rownames(annot) <- rownames(tcell.10)
# Apply ImpulseDE in single time course mode.
# Since genes must be in rows, transpose the data matrix using t().
# For the example, reduce iterations to 10, randomizations to 50, number of
# genes to 20 and number of used processes to 1:
impulse_results <- impulse_DE(t(tcell.10)[1:20,], annot, "Time", "Condition",
   n_iter = 10, n_randoms = 50, n_process = 1)
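
The example above covers the single time course mode. For the control time course mode, the inputs could be laid out along the following lines. This is only a sketch on simulated data: the object names (annot2, expr2), the sample and gene labels, the condition names "case" and "control", and the run_fit switch are all hypothetical. The call itself uses only arguments listed under Usage and is disabled by default, since fitting random data is not meaningful.

```r
set.seed(1)

# Hypothetical design: six time points measured in two conditions.
timepoints <- c(0, 2, 4, 8, 12, 24)
annot2 <- expand.grid(Time = timepoints,
                      Condition = c("case", "control"),
                      stringsAsFactors = FALSE)
rownames(annot2) <- paste0("sample_", seq_len(nrow(annot2)))

# Simulated log2-scale expression matrix: genes in rows, samples in columns.
expr2 <- matrix(rnorm(30 * nrow(annot2), mean = 8), nrow = 30,
                dimnames = list(paste0("gene_", 1:30), rownames(annot2)))

# Sanity checks matching the requirements stated under Arguments.
stopifnot(is.numeric(annot2$Time),
          identical(colnames(expr2), rownames(annot2)))

run_fit <- FALSE  # set to TRUE to actually run the analysis
if (run_fit && requireNamespace("ImpulseDE", quietly = TRUE)) {
  results2 <- ImpulseDE::impulse_DE(expr2, annot2, "Time", "Condition",
     control_timecourse = TRUE, control_name = "control", case_name = "case",
     n_iter = 10, n_randoms = 50, n_process = 1)
}
```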

ImpulseDE documentation built on April 28, 2020, 9:05 p.m.