The function calculates the empirical FDR based on Fourier scores derived by fourierscore for the observed expression data, compared with scores derived for a background model generated by backgroundData.
fdrfourier(eset,T,times,background.model="rr",N=100,progress=FALSE)
eset | object of the class “ExpressionSet”
T | cycle period
times | time of measurements
background.model | model for generation of background data: “rr” - permutation within rows, “gauss” - Gaussian background, “ar1” - AR(1) models
N | number of generated data sets for the background distribution
progress | if set to TRUE, the progress of the calculations is reported
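To make the background.model choices more concrete, the following base R sketch builds two toy background data sets. It is illustrative only: the package generates its background data with backgroundData, and the row-wise matching of mean and standard deviation in the Gaussian case is an assumption made here, not something stated on this page. The object names (x, bg.rr, bg.gauss) are hypothetical.

```r
# Illustrative sketch only; not the implementation used by backgroundData.
set.seed(1)
x <- matrix(rnorm(5 * 8), nrow = 5)  # toy data: 5 genes (rows), 8 time points (columns)

# "rr"-style background: shuffle the time points independently within each row,
# which keeps the values of each gene but destroys their temporal order.
bg.rr <- t(apply(x, 1, sample))

# "gauss"-style background (assumption: mean and sd are matched per row).
bg.gauss <- t(apply(x, 1, function(r) rnorm(length(r), mean(r), sd(r))))
```

Row permutation preserves the exact set of values for each gene, whereas the Gaussian variant only preserves their first two moments under the assumption stated above.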
To assess the significance of the Fourier score obtained for the original gene expression time series, one has to calculate how often such a score would be observed by chance under the chosen background distribution. The statistical significance is given by the calculated false discovery rate (FDR). It is defined here as the expected proportion of false positives among all genes detected as periodically expressed. Mathematical details can be found in the given reference.
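The definition above can be sketched in a few lines of base R. This is a generic empirical FDR estimate under stated assumptions, not the package's code: empirical_fdr is a hypothetical helper, the toy scores are random numbers rather than Fourier scores, and the exact estimator used by fdrfourier is described in the reference.

```r
# Hypothetical helper, not part of the cycle package.
# F.obs: numeric vector of observed scores, one per gene.
# F.bg:  matrix of background scores, one column per background data set.
empirical_fdr <- function(F.obs, F.bg) {
  n.bg <- ncol(F.bg)
  bg <- as.vector(F.bg)
  vapply(F.obs, function(s) {
    # Expected number of false positives at threshold s:
    # background scores >= s, averaged over the background data sets.
    expected.false <- sum(bg >= s) / n.bg
    # Number of genes detected at threshold s (always at least 1).
    detected <- sum(F.obs >= s)
    min(1, expected.false / detected)
  }, numeric(1))
}

set.seed(1)
F.obs <- c(rnorm(95), rnorm(5, mean = 6))    # toy scores: 95 null genes, 5 strong ones
F.bg <- matrix(rnorm(100 * 20), nrow = 100)  # 20 toy background data sets
fdr <- empirical_fdr(F.obs, F.bg)
sum(fdr < 0.1)                               # genes called significant at FDR < 0.1
```

Increasing the number of background data sets (the role played by N in fdrfourier) reduces the noise in the estimated number of false positives, at the cost of longer run time.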
List with the FDR for the features of the eset object (fdr), the Fourier scores for the ExpressionSet object (F) and the Fourier scores for the background data (F.b).
This is the main function of the cycle package. Note that the calculation of FDR employing empirical background distributions can require considerable time (up to several days for large gene expression data sets).
Importantly, this function evaluates solely the exprs matrix, and no information is used from the phenoData. In particular, the ordering of samples (arrays) is taken to be the same as the ordering of the columns in the exprs matrix. Also, replicated arrays in the exprs matrix are treated as independent, i.e. they should be averaged prior to analysis or placed into distinct “ExpressionSet” objects.
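As a minimal sketch of the averaging step mentioned above, the following base R code collapses replicated columns that share a time point. The matrix m, the vector rep.times and the result avg are hypothetical toy objects; with real data the matrix would come from exprs(eset), and the averaged matrix would be wrapped in a new “ExpressionSet” before calling fdrfourier.

```r
# Toy expression matrix: 2 genes (rows), 6 arrays (columns),
# measured in duplicate at three time points.
m <- matrix(1:12, nrow = 2)
rep.times <- c(0, 0, 10, 10, 20, 20)

# Group the column indices by time point and average each group row-wise.
groups <- split(seq_along(rep.times), rep.times)
avg <- do.call(cbind, lapply(groups, function(idx) {
  rowMeans(m[, idx, drop = FALSE])
}))

avg  # one column per distinct time point, named by the time value
```

The time vector passed as times should then contain one entry per averaged column, in the same order as the columns.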
Matthias E. Futschik (http://www.cbme.ualg.pt/mfutschik_cbme.html)
Matthias E. Futschik and Hanspeter Herzel (2008) Are we overestimating the number of cell-cycling genes? The impact of background models on time series analysis, Bioinformatics, 24(8):1063-1069
library(cycle) # provides fdrfourier
library(Mfuzz) # provides the yeast data set and the preprocessing helpers
if (interactive()){
set.seed(1)
data(yeast) # loading the reduced CDC28 yeast set (from the Mfuzz package)
# Data preprocessing
yeast <- filter.NA(yeast) # filters genes with more than 25% of the expression values missing
yeast <- fill.NA(yeast) # for illustration only; rather use knn method for
yeast <- standardise(yeast)
#
T.yeast <- 85 # cell cycle period (t=85min)
times.yeast <- pData(yeast)$time # time of measurements
#
yeast.test <- yeast[1:600,] # To speed up the example
#
NN <- 50 # number of generated background models
# Here, a small number was chosen for demonstration purpose.
# For the actual analysis, rather set N = 1000
# Calculation of FDRs
# i) based on random permutation as background model
fdr.rr <- fdrfourier(eset=yeast.test,T=T.yeast,
times=times.yeast,background.model="rr",N=NN,progress=TRUE)
# ii) based on Gaussian distribution
fdr.g <- fdrfourier(eset=yeast.test,T=T.yeast,
times=times.yeast,background.model="gauss",N=NN,progress=TRUE)
# iii) based on AR(1) models as background
fdr.ar1 <- fdrfourier(eset=yeast.test,T=T.yeast,
times=times.yeast,background.model="ar1",N=NN,progress=TRUE)
# Number of significant genes based on diff. background models
sum(fdr.rr$fdr < 0.1)
sum(fdr.g$fdr < 0.1)
sum(fdr.ar1$fdr < 0.1)
# Plot top scoring gene
plot(times.yeast,exprs(yeast.test)[order(fdr.ar1$fdr)[1],],type="o",
xlab="Time",ylab="Expression",
main=paste(featureNames(yeast.test)[order(fdr.ar1$fdr)[1]],"-- FDR:",
fdr.ar1$fdr[order(fdr.ar1$fdr)[1]]))
# List significant genes
fdr.ar1$fdr[which(fdr.ar1$fdr < 0.1)]
}