Time course data are often used to study the dynamics of a biological process after a perturbation at a certain time. Inferring the perturbation time under different scenarios allows us to identify these critical moments and focus on the activity that follows them, which is of critical importance in understanding likely causal relationships. In the DEtime package, we propose a Bayesian method to infer the perturbation time from a control and a perturbed system. Non-parametric Gaussian process regression is used to derive the posterior distribution of the perturbation point. This vignette explains how to use the package. For further information on the algorithm, please refer to our paper:
Jing Yang, Christopher A. Penfold, Murray R. Grant and Magnus Rattray, Inferring the perturbation time from biological time course data, Bioinformatics, 32(19): pp 2956-2964, 2016
This package implements the Gaussian process regression framework for perturbation time point inference in the two-sample case. The package contains two main functions: DEtime_infer, which finds the perturbation point of genes, and DEtime_rank, which filters out silent genes before perturbation point inference is carried out by DEtime_infer.
The package works on time course data from a wild-type and a perturbed system. Given a set of predefined candidate perturbation times, the package goes over the candidates and derives the likelihood of each. By Bayes' theorem, under a uniform prior, the posterior distribution of the tested perturbation times is derived from their corresponding likelihoods. The maximum a posteriori (MAP) estimate, mean or median of the posterior distribution can be taken as the estimated perturbation time point.
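The step from likelihoods to a posterior described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the candidate times and log-likelihood values are made up, the variable names are hypothetical, and the code is not taken from the DEtime implementation.

```r
# Hypothetical log-likelihoods for five candidate perturbation times
# (illustrative numbers, not output from DEtime).
test_times <- c(0, 2, 4, 6, 8)
log_lik    <- c(-12.0, -10.5, -9.0, -10.0, -13.0)

# Under a uniform prior the posterior is proportional to the likelihood.
# Subtracting the maximum before exponentiating avoids numerical underflow.
w         <- exp(log_lik - max(log_lik))
posterior <- w / sum(w)

# Two possible point estimates of the perturbation time.
map_time  <- test_times[which.max(posterior)]  # maximum a posteriori
mean_time <- sum(test_times * posterior)       # posterior mean
```

Because the prior is uniform, the normalising constant is simply the sum of the likelihoods over the candidate times, so the MAP estimate coincides with the maximum-likelihood candidate.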
Package: DEtime
Type: Package
Version: 1.1
Date: 2017-01-14
License: GPL-3
Author(s)
Jing Yang
Maintainer
Jing Yang ynnjing@gmail.com
Description
DEtime_infer is the main function in the DEtime package. It applies a mixedGP kernel to time course data under control and perturbed conditions and returns the posterior distribution of the predefined perturbation time candidates, together with relevant statistical estimates of the inferred perturbation time point.
Usage
DEtime_infer(ControlTimes, ControlData, PerturbedTimes, PerturbedData, TestTimes=NULL, gene_ID=NULL, bound.lengthscale=NULL)
Arguments
ControlTimes: experimental time points at which the time course data for the control condition are measured. They can either be ordered by time, for instance t1,t1,t2,t2,... or ordered by replicated time, for instance t1,t2,...,t1,t2,...
ControlData: The measured time course data under the control condition. The data is a matrix in which each row holds the time course data for one particular gene. The measurements have to match the time points in ControlTimes.
PerturbedTimes: experimental time points at which the time course data for the perturbed condition are measured. They can either be ordered by time, for instance t1,t1,t2,t2,... or ordered by replicate, for instance t1,t2,...,t1,t2,... The number of replicates does not have to be the same at every time point. ControlTimes and PerturbedTimes can differ from each other.
PerturbedData: The measured time course data under the perturbed condition. The data is a matrix in which each row holds the time course data for one particular gene. The measurements have to match the time points in PerturbedTimes.
TestTimes: perturbation time points to be evaluated by the DEtime_infer function. TestTimes has to lie within the range of the measured times and be evenly spaced. If this input is missing, TestTimes is set to 50 time points evenly spaced between the minimum of ControlTimes and PerturbedTimes and the maximum of ControlTimes and PerturbedTimes.
gene_ID: The IDs of genes investigated in the algorithm. If this value is missing, '1', '2', '3', ... will be used instead.
bound.lengthscale: the bounds used for the lengthscale parameter in the RBF kernel used in the model. We recommend not changing this parameter unless necessary.
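The input layout described in the arguments above can be illustrated with base R. The design (three time points, two replicates, two genes) and the helper variable names are assumptions made for this sketch, and the TestTimes grid reflects one reading of the default described above rather than the package's internal code.

```r
# Hypothetical design: three time points, two replicates each.
time_points <- c(0, 2, 4)
n_rep       <- 2

# Ordered by time: t1,t1,t2,t2,...
times_by_time <- rep(time_points, each = n_rep)
# Ordered by replicate: t1,t2,...,t1,t2,...
times_by_rep  <- rep(time_points, times = n_rep)

# One row per gene, one column per measurement, with columns in the
# same order as the corresponding time vector (random placeholder data).
set.seed(1)
ControlTimes   <- times_by_time
ControlData    <- matrix(rnorm(2 * length(ControlTimes)), nrow = 2)
PerturbedTimes <- times_by_rep
PerturbedData  <- matrix(rnorm(2 * length(PerturbedTimes)), nrow = 2)

# An evenly spaced grid of 50 candidate perturbation times spanning
# the measured range of both conditions.
TestTimes <- seq(min(ControlTimes, PerturbedTimes),
                 max(ControlTimes, PerturbedTimes),
                 length.out = 50)
```

Whichever ordering is used for the times, the columns of the data matrix need to follow the same ordering.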
Returns
The function will return a DEtimeOutput object which contains:
result: statistical estimations for the inferred perturbation time, which includes:
$posterior: posterior distribution of the tested perturbation time points
Details
Control and perturbed data can be measured at different time points with different numbers of replicates. However, it is advisable to have control and perturbed data measured over roughly the same time range to facilitate the estimation of the perturbation point.
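A quick way to check how well the two sampling schemes cover the same time range is sketched below. The time vectors and the overlap_fraction measure are hypothetical and are not part of DEtime; they only illustrate the advice above.

```r
# Hypothetical sampling times for the two conditions.
ControlTimes   <- c(0, 0, 2, 2, 4, 4, 8, 8)
PerturbedTimes <- c(1, 3, 5, 7, 9)

# Time window covered by both conditions.
shared <- c(max(min(ControlTimes), min(PerturbedTimes)),
            min(max(ControlTimes), max(PerturbedTimes)))

# Fraction of the overall time span that both conditions cover;
# a value close to 1 means the two designs largely overlap.
overall          <- range(c(ControlTimes, PerturbedTimes))
overlap_fraction <- max(0, diff(shared)) / diff(overall)
```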
Examples
## read simulated example data
library("DEtime")
data(SimulatedData)
res <- DEtime_infer(ControlTimes = ControlTimes, ControlData = ControlData, PerturbedTimes = PerturbedTimes, PerturbedData = PerturbedData)
Description
DEtime_rank is the function used for filtering silent genes in the DEtime package. In this function, an independent GP and an integrated GP are applied to model the time course data under control and perturbed conditions, respectively. The log-likelihood ratio of the GP modeling results is used as an indication of the differential expression of the studied gene. A higher log-likelihood ratio generally indicates stronger evidence of differential expression.
Usage
DEtime_rank(ControlTimes, ControlData, PerturbedTimes, PerturbedData, gene_ID = NULL, bound.lengthscale = NULL, savefile = TRUE)
Arguments
ControlTimes: experimental time points at which the time course data for the control condition are measured. They can either be ordered by time, for instance t1,t1,t2,t2,... or ordered by replicated time, for instance t1,t2,...,t1,t2,...
ControlData: The measured time course data under the control condition. The data is a matrix in which each row holds the time course data for one particular gene. The measurements have to match the time points in ControlTimes.
PerturbedTimes: experimental time points at which the time course data for the perturbed condition are measured. They can either be ordered by time, for instance t1,t1,t2,t2,... or ordered by replicate, for instance t1,t2,...,t1,t2,... The number of replicates does not have to be the same at every time point, and ControlTimes and PerturbedTimes do not have to be exactly the same.
PerturbedData: The measured time course data under the perturbed condition. The data is a matrix in which each row holds the time course data for one particular gene. The measurements have to match the time points in PerturbedTimes.
gene_ID: The IDs of genes investigated in the algorithm. If this value is missing, '1', '2', '3', ... will be used as the gene IDs instead.
bound.lengthscale: the bounds used for the lengthscale parameter in the RBF kernel used in the model. We recommend not changing this parameter unless necessary.
savefile: A BOOLEAN parameter used to indicate whether the ranking list will be saved in a file. If set to TRUE, the result will be saved in DEtime_rank.txt.
Returns
The function will return a table which contains the gene_IDs as the first column and the associated log-likelihood ratio as the second column.
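A table of this shape can be sorted and filtered with base R before inference. The sketch below builds a made-up table instead of calling DEtime_rank; the column name llr, the values and the threshold of 5 are assumptions for illustration only.

```r
# Hypothetical ranking table: gene IDs in the first column,
# log-likelihood ratios in the second (made-up values).
rank_table <- data.frame(
  gene_ID = c("1", "2", "3", "4"),
  llr     = c(0.8, 12.3, 5.6, 2.1),
  stringsAsFactors = FALSE
)

# Sort genes from the largest to the smallest log-likelihood ratio.
ranked <- rank_table[order(rank_table[, 2], decreasing = TRUE), ]

# Keep the IDs of genes whose ratio exceeds an arbitrary threshold.
threshold    <- 5
active_genes <- ranked[ranked[, 2] > threshold, 1]
```

Selecting from the first column gives gene IDs directly, which keeps working when custom gene_IDs are supplied instead of the default '1', '2', '3', ...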
Details
Control and perturbed data can be measured at different time points with different numbers of replicates. However, it is advisable to have control and perturbed data measured over roughly the same time range to facilitate the estimation of the perturbation point.
Examples
## read simulated example data
library("DEtime")
data(SimulatedData)
res <- DEtime_rank(ControlTimes = ControlTimes, ControlData = ControlData, PerturbedTimes = PerturbedTimes, PerturbedData = PerturbedData, savefile = TRUE)
Description
print_DEtime prints the results returned from the DEtime_infer function. For each gene_ID it shows the MAP, mean, median, ptl5 (5th percentile) and ptl95 (95th percentile) of the posterior distribution of the inferred perturbation time points.
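For a posterior defined on a discrete grid of candidate times, percentiles such as these can be read off the cumulative distribution. The following is a minimal sketch with a made-up posterior; posterior_quantile is a hypothetical helper, and DEtime may compute its percentiles differently.

```r
# Hypothetical discrete posterior over evenly spaced candidate times.
test_times <- seq(0, 10, by = 1)
posterior  <- dnorm(test_times, mean = 4, sd = 1.5)
posterior  <- posterior / sum(posterior)

# Smallest candidate time whose cumulative posterior mass reaches p.
posterior_quantile <- function(times, post, p) {
  times[which(cumsum(post) >= p)[1]]
}

ptl5        <- posterior_quantile(test_times, posterior, 0.05)
median_time <- posterior_quantile(test_times, posterior, 0.50)
ptl95       <- posterior_quantile(test_times, posterior, 0.95)
```

The interval from ptl5 to ptl95 then gives a rough indication of how sharply the perturbation time is determined by the data.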
Usage
print_DEtime(DEtimeOutput)
Argument
DEtimeOutput: the result from DEtime_infer function
Example
library("DEtime")
## read simulated example data
data(SimulatedData)
res <- DEtime_infer(ControlTimes = ControlTimes, ControlData = ControlData, PerturbedTimes = PerturbedTimes, PerturbedData = PerturbedData)
print_DEtime(res)
Description
plot_DEtime plots the results returned from the DEtime_infer function. The figures show the posterior distribution of the inferred perturbation time points in the upper panel and the Gaussian process regression of the original data in the lower panel. Please note that by default the MAP solution is taken as the estimate of the perturbation point, and the Gaussian process regression is derived from this estimated perturbation point.
Usage
plot_DEtime(DEtimeOutput, BestPerturbPos=NULL, plot_gene_ID=NULL)
Argument
DEtimeOutput: the result from DEtime_infer function
BestPerturbPos: selects which summary of the posterior distribution of the perturbation points is used as the final estimate of the perturbation point. You can set this parameter to "mean", "median" or "MAP", and the corresponding summary of the posterior distribution will be used in the Gaussian process regression plot. If not given, the MAP solution will be used.
plot_gene_ID: the gene_IDs of those genes whose GP regression and posterior distribution of the perturbation time points will be plotted. If not supplied, all the genes will be plotted.
Example
library("DEtime")
## read simulated example data
data(SimulatedData)
res <- DEtime_infer(ControlTimes = ControlTimes, ControlData = ControlData, PerturbedTimes = PerturbedTimes, PerturbedData = PerturbedData)
plot_DEtime(res)
Descriptions of the real data
In this experiment, the aim is to study the transcriptional change occurring in Arabidopsis following inoculation with P. syringae pv. tomato DC3000 (PtoDC3000) versus the disarmed strain Pto DC3000hrpA.
The data contain two different time series:
In this example, the perturbation time between perturbed condition 1 and perturbed condition 2 is inferred.
library("DEtime")
## import data
data(RealData)
## calculate the loglikelihood ratio for each gene
res_rank <- DEtime_rank(ControlTimes = ControlTimes, ControlData = ControlData, PerturbedTimes = PerturbedTimes, PerturbedData = PerturbedData)
## inferring the perturbation point by DEtime_infer
res <- DEtime_infer(ControlTimes = ControlTimes, ControlData = ControlData, PerturbedTimes = PerturbedTimes, PerturbedData = PerturbedData)
## Print a summary of the results
print_DEtime(res)
## plot the gene with loglikelihood ratio > 5
plot_DEtime(res, plot_gene_ID = as.character(which(res_rank[,2] > 5)))