Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/geneset-roast.R

Rotation gene set testing for linear models.

1 2 3 4 5 6 7 8 9 10 11 12 | ```
## Default S3 method:
roast(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL,
set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL,
nrot = 1999, approx.zscore = TRUE, legacy = FALSE, ...)
## Default S3 method:
mroast(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL,
set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL,
nrot = 1999, approx.zscore = TRUE, legacy = FALSE, adjust.method = "BH",
midp = TRUE, sort = "directional", ...)
## Default S3 method:
fry(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL,
gene.weights = NULL, standardize = "posterior.sd", sort = "directional", ...)
``` |

`y` |
numeric matrix giving log-expression or log-ratio values for a series of microarrays, or any object that can coerced to a matrix including |

`index` |
index vector specifying which rows (probes) of |

`design` |
design matrix |

`contrast` |
contrast for which the test is required.
Can be an integer specifying a column of |

`geneid` |
gene identifiers corresponding to the rows of |

`set.statistic` |
summary set statistic. Possibilities are |

`gene.weights` |
numeric vector of directional (positive or negative) contribution weights specifying the size and direction of the contribution of each probe to the gene set statistics.
For |

`var.prior` |
prior value for residual variances. If not provided, this is estimated from all the data using |

`df.prior` |
prior degrees of freedom for residual variances. If not provided, this is estimated using |

`nrot` |
number of rotations used to compute the p-values. Low values like 999 are suitable for testing but higher values such as 9999 or more are recommended for publication purposes. |

`approx.zscore` |
logical, if |

`legacy` |
logical. See Note below for usage. |

`adjust.method` |
method used to adjust the p-values for multiple testing. See |

`midp` |
logical, should mid-p-values be used in instead of ordinary p-values when adjusting for multiple testing? |

`sort` |
character, whether to sort output table by directional p-value ( |

`standardize` |
how to standardize for unequal probewise variances. Possibilities are |

`...` |
any argument that would be suitable for |

These functions implement rotation gene set tests proposed by Wu et al (2010).
They perform *self-contained* gene set tests in the sense defined by Goeman and Buhlmann (2007).
For *competitive* gene set tests, see `camera`

.
For a gene set enrichment analysis (GSEA) style analysis using a database of gene sets, see `romer`

.

`roast`

and `mroast`

test whether any of the genes in the set are differentially expressed.
They can be used for any microarray experiment that can be represented by a linear model.
The design matrix for the experiment is specified as for the `lmFit`

function, and the contrast of interest is specified as for the `contrasts.fit`

function.
This allows users to focus on differential expression for any coefficient or contrast in a linear model.
If `contrast`

is not specified, then the last coefficient in the linear model will be tested.

The argument `index`

is often made using ids2indices but does not have to be.
Each set to be tested is represented by a vector of row numbers or a vector of gene IDs.
Gene IDs should correspond to either the rownames of `y`

or the entries of `geneid`

.

All three functions support directional contribution gene weights, which can be entered either through the `gene.weights`

argument or via `index`

.
Directional gene weights allow each gene to be flagged as to its direction and magnitude of change based on prior experimentation.
A typical use is to make the `gene.weights`

`1`

or `-1`

depending on whether the gene is up or down-regulated in the pathway under consideration.
Probes with directional weights of opposite signs are expected to have expression changes in opposite directions.
Gene with larger gene weights in absolute size will have more weight in the set statistic calculation.

Gene weights can be either genome-wide or set-specific.
Genome-wide weights can be entered via the `gene.weights`

argument.
Set specific weights can be input by including the gene weights as part of the set's entry in `index`

.
If any of the components of `index`

are data.frames, then the second column will be assumed to be gene contribution weights for that set.
All three functions (`roast`

, `mroast`

and `fry`

) support set-specific gene contribution weights as part of an `index`

data.frame.

Set-specific directional gene weights are used to represent *expression signatures* assembled from previous experiments, from gene annotation or from prior hypotheses.
In the output from `roast`

, `mroast`

or `fry`

, a significant `"Up"`

p-value means that the differential expression results found in `y`

are positively correlated with the expression signature coded by the gene weights.
Conversely, a significant `"Down"`

p-value means that the differential expression log-fold-changes are negatively correlated with the expression signature.

Note that the contribution weights set by `gene.weights`

are different in nature and purpose to the precision weights set by the `weights`

argument of `lmFit`

.
`gene.weights`

control the contribution of each gene to the formation of the gene set statistics and are directional, i.e., can be positive or negative.
`weights`

indicate the precision of the expression measurements and should be positive.
The `weights`

are used to construct genewise test statistics whereas `gene.weights`

are used to combine the genewise test statistics.

The arguments `df.prior`

and `var.prior`

have the same meaning as in the output of the `eBayes`

function.
If these arguments are not supplied, then they are estimated exactly as is done by `eBayes`

.

The gene set statistics `"mean"`

, `"floormean"`

, `"mean50"`

and `msq`

are defined by Wu et al (2010).
The different gene set statistics have different sensitivities when only some of the genes in a set are differentially expressed.
If `set.statistic="mean"`

then the set will be statistically significantly only when the majority of the genes are differentially expressed.
`"floormean"`

and `"mean50"`

will detect as few as 25% differentially expressed in a set.
`"msq"`

is sensitive to even smaller proportions of differentially expressed genes, if the effects are reasonably large.
Overall, the `"msq"`

statistic gives the best power for rejecting the null hypothesis of no differentially expressed genes, but the significance can be driven by a small number of genes.
In many genomic applications it is appropriate to limit results to gene sets for which most of the genes response in a concordance direction, so the relatively conservative `"mean"`

statistic is the default choice.

The output gives p-values three possible alternative hypotheses,
`"Up"`

to test whether the genes in the set tend to be up-regulated, with positive t-statistics,
`"Down"`

to test whether the genes in the set tend to be down-regulated, with negative t-statistics,
and `"Mixed"`

to test whether the genes in the set tend to be differentially expressed, without regard for direction.

`roast`

estimates p-values by simulation, specifically by random rotations of the orthogonalized residuals (Langsrud, 2005), so p-values will vary slightly from run to run.
The p-value is computed as `(b+1)/(nrot+1)`

where `b`

is the number of rotations giving a more extreme statistic than that observed (Phipson and Smyth, 2010).
This means that the smallest possible mixed or two-sided p-values are `1/(nrot+1)`

.
The function uses a symmetry argument to double the effective number of rotations for the one-sided tests, so the smallest possible `"Up"`

or `"Down"`

p-value is `1/(2*nrot+1)`

.

The number of rotations `nrot`

can (and should) be increased tTo get more precise p-values from `roast`

or `mroast`

,
The default `nrot`

is set fairly low to facilitate quick testing and experimentation but the smallest possible two-sided p-value is `1/(nrot+1)`

.
To get definitive p-values for publication, at least `nrot=9999`

or higher is recommended.

`mroast`

does roast tests for multiple sets, including adjustment for multiple testing.
By default, `mroast`

reports ordinary p-values but uses mid-p-values (Routledge, 1994) at the multiple testing stage.
Mid-p-values are probably a good choice when using false discovery rates (`adjust.method="BH"`

) but not when controlling the family-wise type I error rate (`adjust.method="holm"`

).

To improve the performance of the gene set statistics, `roast`

and `mroast`

transform the genewise moderated t-statistics to normality using `zscoreT`

.
By default, an approximate closed-form transformation is used (`approx.zscore=TRUE`

), which is very much faster than the exact transformation and performs just as well.
In Bioconductor 2.10, the transformation used has been changed from Hill's (1970) approximation to Bailey's (1980) formula because the latter is faster and gives more even accuracy; see `zscoreT`

for more details.

`fry`

is a fast approximation to `mroast`

.
In the special case that `df.prior`

is large and `set.statistic="mean"`

, `fry`

gives the same result that`mroast`

would give if an infinite number of rotations were performed.
In other circumstances, when genes have different variances, `fry`

uses a standardization strategy to approximate the `mroast`

results.
Using `fry`

is recommended when performing tests for a large number of sets because it is fast and because it returns higher resolution p-values that are not limited by the number of rotations performed.

`roast`

produces an object of class `"Roast"`

.
This consists of a list with the following components:

`p.value` |
data.frame with columns |

`var.prior` |
prior value for residual variances. |

`df.prior` |
prior degrees of freedom for residual variances. |

`mroast`

produces a data.frame with a row for each set and the following columns:

`NGenes` |
number of genes in set |

`PropDown` |
proportion of genes in set with |

`PropUp` |
proportion of genes in set with |

`Direction` |
direction of change, |

`PValue` |
two-sided directional p-value |

`FDR` |
two-sided directional false discovery rate |

`PValue.Mixed` |
non-directional p-value |

`FDR.Mixed` |
non-directional false discovery rate |

`fry`

produces the same output format as `mroast`

but without the columns `PropDown`

and `ProbUp`

.

For Bioconductor 3.10, `roast`

and `mroast`

have been revised to use much less memory by conducting the rotations in chunks and to be about twice as fast by updating the normalizing transformation used when `approx.zscore=TRUE`

.
For a limited time, users wishing to reproduce Bioconductor 3.9 results exactly can set `legacy=TRUE`

to turn these revisions off.

`approx.score=TRUE`

become the default in Bioconductor 3.0 (October 2014).

The default set statistic was changed from `"msq"`

to `"mean"`

in Bioconductor 2.7 (October 2010).

Gordon Smyth and Di Wu

Goeman, JJ, and Buhlmann, P (2007).
Analyzing gene expression data in terms of gene sets: methodological issues.
*Bioinformatics* 23, 980-987.

Langsrud, O (2005).
Rotation tests.
*Statistics and Computing* 15, 53-60.

Phipson B, and Smyth GK (2010).
Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn.
*Statistical Applications in Genetics and Molecular Biology*, Volume 9, Article 39.
http://www.statsci.org/smyth/pubs/PermPValuesPreprint.pdf

Routledge, RD (1994).
Practicing safe statistics with the mid-p.
*Canadian Journal of Statistics* 22, 103-110.

Wu, D, Lim, E, Francois Vaillant, F, Asselin-Labat, M-L, Visvader, JE, and Smyth, GK (2010). ROAST: rotation gene set tests for complex microarray experiments.
*Bioinformatics* 26, 2176-2182.
http://bioinformatics.oxfordjournals.org/content/26/17/2176

See 10.GeneSetTests for a description of other functions used for gene set testing.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | ```
y <- matrix(rnorm(100*4,sd=0.3),100,4)
design <- cbind(Intercept=1,Group=c(0,0,1,1))
# First set of 5 genes are all up-regulated
index1 <- 1:5
y[index1,3:4] <- y[index1,3:4]+3
roast(y,index1,design,contrast=2)
# Second set of 5 genes contains none that are DE
index2 <- 6:10
mroast(y,list(set1=index1,set2=index2),design,contrast=2)
fry(y,list(set1=index1,set2=index2),design,contrast=2)
# Third set of 6 genes contains three down-regulated genes and three up-regulated genes
index3 <- 11:16
y[index3[1:3],3:4] <- y[index3[1:3],3:4]-3
y[index3[4:6],3:4] <- y[index3[4:6],3:4]+3
# Without gene weights
# Mixed p-value is significant for set3 but not the directional p-values
mroast(y,list(set1=index1,set2=index2,set3=index3),design,contrast=2)
fry(y,list(set1=index1,set2=index2,set3=index3),design,contrast=2)
# With gene weights
# Set3 is significantly up (i.e., positively correlated with the weights)
index3 <- data.frame(Gene=11:16,Weight=c(-1,-1,-1,1,1,1))
mroast(y,list(set1=index1,set2=index2,set3=index3),design,contrast=2)
fry(y,list(set1=index1,set2=index2,set3=index3),design,contrast=2)
``` |

```
Active.Prop P.Value
Down 0 0.99899950
Up 1 0.00150075
UpOrDown 1 0.00300000
Mixed 1 0.00300000
NGenes PropDown PropUp Direction PValue FDR PValue.Mixed FDR.Mixed
set1 5 0 1 Up 0.006 0.011 0.006 0.011
set2 5 0 0 Up 0.643 0.643 0.409 0.409
NGenes Direction PValue FDR PValue.Mixed FDR.Mixed
set1 5 Up 0.00273705 0.0054741 0.002588059 0.005176118
set2 5 Up 0.61980031 0.6198003 0.449640065 0.449640065
NGenes PropDown PropUp Direction PValue FDR PValue.Mixed FDR.Mixed
set1 5 0.0 1.0 Up 0.003 0.0075 0.003 0.00750
set2 5 0.0 0.0 Up 0.634 0.7995 0.449 0.44900
set3 6 0.5 0.5 Up 0.800 0.8000 0.007 0.00975
NGenes Direction PValue FDR PValue.Mixed FDR.Mixed
set1 5 Up 0.00273705 0.00821115 0.002588059 0.005877805
set2 5 Up 0.61980031 0.78662464 0.449640065 0.449640065
set3 6 Up 0.78662464 0.78662464 0.003918537 0.005877805
NGenes PropDown PropUp Direction PValue FDR PValue.Mixed FDR.Mixed
set1 5 0 1 Up 0.001 0.00150 0.001 0.00150
set3 6 0 1 Up 0.006 0.00825 0.006 0.00825
set2 5 0 0 Up 0.628 0.62800 0.445 0.44500
NGenes Direction PValue FDR PValue.Mixed FDR.Mixed
set1 5 Up 0.002737050 0.006404688 0.002588059 0.005877805
set3 6 Up 0.004269792 0.006404688 0.003918537 0.005877805
set2 5 Up 0.619800306 0.619800306 0.449640065 0.449640065
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.