ROKU is a method for detecting tissue-specific (or tissue-selective) patterns from gene expression data for many tissues (or samples). ROKU (i) ranks genes according to their overall tissue specificity using Shannon entropy after data processing and (ii) detects the tissues specific to each gene, if any exist, using an Akaike's information criterion (AIC) procedure.
1 |
data |
numeric matrix or data frame containing microarray data (on log2 scale), where each row indicates the gene or probeset ID, each column indicates the tissue, and each cell indicates a (log2-transformed) expression value of the gene in the tissue. A numeric vector holding the expression values of a single gene is also accepted. |
upper.limit |
numeric value (between 0 and 1) specifying the maximum fraction of tissues (or samples) that can be treated as outliers for each gene. |
sort |
logical. If |
As shown in Figure 1 of the original ROKU study (Kadota et al., 2006), the Shannon entropy H of a gene expression vector (x_{1}, x_{2}, ..., x_{N}) for N tissues ranges from zero to log_{2}N, with the value 0 for genes expressed in a single tissue and log_{2}N for genes expressed uniformly in all tissues. A low entropy score therefore indicates a tissue-specific pattern. However, entropy calculated directly from a raw gene expression vector detects only patterns in which the gene is over-expressed in a small number of tissues and unexpressed or slightly expressed in the others. The H scores of other tissue-specific patterns, such as (8,8,2,8,8,8,8,8,8,8) for down-regulation specific to the 3rd tissue (see Figure 1e), are close to the maximum value (log_{2}N = 3.32 when N = 10), so such patterns cannot be identified as tissue-specific. To detect various kinds of tissue-specific patterns by a low entropy score, ROKU processes the original gene expression vector into a new vector (x_{1'}, x_{2'}, ..., x_{N'}) by subtracting the one-step Tukey biweight and taking the absolute value. For the above example, ROKU calculates the H score from the processed vector (0,0,6,0,0,0,0,0,0,0), giving a very low score (from H = 3.26 before processing to H' = 0 after processing). A major characteristic of ROKU is therefore its ability to rank various tissue-specific patterns by their modified entropy scores.
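The entropy calculation described above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper names `shannon_entropy` and `one_step_tukey_biweight` are hypothetical, and the tuning constants `c = 5` and `epsilon = 1e-4` are assumed values for the one-step Tukey biweight rather than settings confirmed by this page.

```r
# Shannon entropy (base 2) of a non-negative vector.
# Zero entries are dropped, i.e. 0 * log2(0) is treated as 0.
# (An all-zero vector is not handled in this sketch.)
shannon_entropy <- function(x) {
  p <- x / sum(x)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# One-step Tukey biweight: a weighted mean in which values far from the
# median (relative to the median absolute deviation) get zero weight.
# c and epsilon are ASSUMED tuning constants, not taken from this page.
one_step_tukey_biweight <- function(x, c = 5, epsilon = 1e-4) {
  m <- median(x)
  s <- median(abs(x - m))
  u <- (x - m) / (c * s + epsilon)
  w <- ifelse(abs(u) <= 1, (1 - u^2)^2, 0)
  sum(w * x) / sum(w)
}

# Pattern with down-regulation specific to the 3rd tissue.
x <- c(8, 8, 2, 8, 8, 8, 8, 8, 8, 8)

h_raw <- shannon_entropy(x)                         # entropy of the raw vector
x_processed <- abs(x - one_step_tukey_biweight(x))  # processed vector
h_mod <- shannon_entropy(x_processed)               # modified entropy

print(round(h_raw, 2))
print(x_processed)
print(h_mod)
```

Under these assumptions the raw entropy rounds to 3.26, the processed vector is (0,0,6,0,0,0,0,0,0,0), and the modified entropy is 0, in line with the values quoted in the text.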
Note that the modified entropy does not indicate which tissue a gene is
specific to; it only measures the degree of overall tissue specificity of the gene.
To identify the specific tissues, ROKU employs an AIC-based outlier detection method (Ueda, 1996).
Consider, for example, a hypothetical mixed-type tissue-selective expression
pattern (1.2, 5.1, 5.2, 5.4, 5.7, 5.9, 6.0, 6.3, 8.5, 8.8) in which a total
of three tissues are specific (down-regulated in tissue 1;
up-regulated in tissues 9 and 10). The method first normalizes the expression
values by subtracting the mean and dividing by the standard deviation
(i.e., z-score transformation), then sorts them in increasing order,
giving
(-2.221, -0.342, -0.294, -0.198, -0.053, 0.043, 0.092, 0.236, 1.296,
1.441). The method evaluates various combinations of outlier candidates
starting from both ends of the sorted values: model1 for no outliers,
model2 for one outlier on the high side, model3 for two outliers on the high side,
..., modelx for one outlier on the low side, ..., modely for two outliers on
both the high and low sides, and so on. It then calculates an AIC-like statistic
(called U) for each model and searches for the model
that achieves the lowest U value, termed the minimum AIC estimate
(MAICE). Since the upper.limit value determines the maximum number of
outlier candidates, it also determines the number of combinations evaluated. The AIC-based
method outputs a vector (1 for up-regulated outliers, -1 for down-regulated
outliers, and 0 for non-outliers) that corresponds to the input vector.
For example, the method outputs the vector (-1, 0, 0, 0, 0, 0, 0, 0, 1, 1)
when using upper.limit = 0.5
and (-1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
when using upper.limit = 0.25
(the default).
See Kadota et al. (2007) for a detailed discussion of the effect of
different parameter settings.
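The procedure above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function name `aic_outliers` is hypothetical; the statistic is assumed to take the form U = n log(sigma) + sqrt(2) * s * log(n!) / n, where n is the number of non-outliers, s the number of outlier candidates, and sigma the standard deviation (divisor n) of the non-outliers; and the total number of outlier candidates is assumed to be capped at floor(N * upper.limit).

```r
# Hypothetical sketch of an AIC-based outlier search.
# ASSUMPTIONS (not confirmed by this page): the form of U below and the
# cap floor(N * upper.limit) on the total number of outlier candidates.
aic_outliers <- function(x, upper.limit = 0.25) {
  N <- length(x)
  if (sd(x) == 0) return(integer(N))    # constant vector: no outliers

  z <- (x - mean(x)) / sd(x)            # z-score transformation
  ord <- order(z)                       # positions in increasing order
  z_sorted <- z[ord]
  max_out <- floor(N * upper.limit)

  best <- list(U = Inf, low = 0, high = 0)
  for (low in 0:max_out) {              # outliers taken from the low end
    for (high in 0:(max_out - low)) {   # outliers taken from the high end
      n <- N - low - high               # number of non-outliers
      s <- low + high                   # number of outlier candidates
      kept <- z_sorted[(low + 1):(N - high)]
      sigma <- sqrt(mean((kept - mean(kept))^2))
      U <- n * log(sigma) + sqrt(2) * s * lfactorial(n) / n
      if (U < best$U) best <- list(U = U, low = low, high = high)
    }
  }

  # Map the selected model back to the original tissue order.
  flags <- integer(N)
  if (best$low > 0) flags[ord[seq_len(best$low)]] <- -1L
  if (best$high > 0) flags[ord[(N - best$high + 1):N]] <- 1L
  flags
}

x <- c(1.2, 5.1, 5.2, 5.4, 5.7, 5.9, 6.0, 6.3, 8.5, 8.8)
print(round(sort((x - mean(x)) / sd(x)), 3))  # sorted z-scores
print(aic_outliers(x, upper.limit = 0.5))
print(aic_outliers(x, upper.limit = 0.25))
```

With these assumptions the sketch flags tissue 1 as down-regulated and tissues 9 and 10 as up-regulated for upper.limit = 0.5, and only tissue 1 for upper.limit = 0.25, which agrees with the two vectors quoted in the text. Degenerate cases, such as all remaining non-outliers being identical, are not handled.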
A list containing the following fields:
outlier |
A numeric matrix when the input |
H |
A numeric vector when the input |
modH |
A numeric vector when the input |
rank |
A numeric vector or scalar consisting of the rank(s) of
|
Tbw |
a numeric vector or scalar consisting of one-step Tukey's
biweight as an iteratively reweighted measure of central tendency.
This value is in general similar to median value and the same as the
output of |
Kadota K, Konishi T, Shimizu K: Evaluation of two outlier-detection-based methods for detecting tissue-selective genes from microarray data. Gene Regulation and Systems Biology 2007, 1: 9-15.
Kadota K, Ye J, Nakai Y, Terada T, Shimizu K: ROKU: a novel method for identification of tissue-specific genes. BMC Bioinformatics 2006, 7: 294.
Kadota K, Nishimura SI, Bono H, Nakamura S, Hayashizaki Y, Okazaki Y, Takahashi K: Detection of genes with tissue-specific expression patterns using Akaike's Information Criterion (AIC) procedure. Physiol Genomics 2003, 12: 251-259.
Ueda T: Simple method for the detection of outliers. Japanese J Appl Stat 1996, 25: 17-26.
library(TCC)
data(hypoData_ts)
result <- ROKU(hypoData_ts)
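Once the example has run, the returned list can be inspected through the fields listed above. The following is a small sketch that assumes the TCC package (which provides `ROKU` and `hypoData_ts`) is installed; the variable name `genes_by_specificity` is introduced here for illustration only.

```r
# Runs only if the TCC package is available (assumption: it is installed
# from Bioconductor; nothing is executed otherwise).
if (requireNamespace("TCC", quietly = TRUE)) {
  library(TCC)
  data(hypoData_ts)
  result <- ROKU(hypoData_ts)

  # Top-level structure of the returned list (outlier, H, modH, rank, Tbw).
  str(result, max.level = 1)

  # Order genes by modified entropy: lower values mean higher overall
  # tissue specificity.
  genes_by_specificity <- order(result$modH)
  print(head(result$modH[genes_by_specificity]))

  # Outlier flags (1, -1, 0) for the same genes, one column per tissue.
  print(head(result$outlier[genes_by_specificity, , drop = FALSE]))
}
```

Ordering by `modH` gives the overall ranking, while the corresponding rows of `outlier` show which tissues drive the specificity of each gene.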