presto makes it fast and easy to run Wilcoxon rank sum test and auROC analysis on large datasets. The tutorial below shows how to install presto, walks through the 3 major ways you can use presto with your data, and finally explores more advanced use cases.
To install the current stable release from CRAN:
install.packages('presto')
For the cutting edge version of presto:
library(devtools) install_github('immunogenomics/presto')
options(digits=2) library(presto)
The main function in this vignette is wilcoxauc
. presto currently supports 3
interfaces to wilcoxauc
, with a matrix, Seurat object, or
SingleCellExperiment object. The output of wilcoxauc
is described in the
next section.
The most general use of presto is with a matrix of features and observations (exprs) paired with a vector of group labels (y).
data(exprs) head(exprs)[, 1:10] data(y) head(y) head(wilcoxauc(exprs, y))
We support interfacing with Seurat Version 3 objects. In the most basic use case, specify the Seurat object and the meta data variable that defines the group labels.
data(object_seurat) # ensure object structure matches with the installed Seurat version object_seurat <- Seurat::UpdateSeuratObject(object_seurat) head(wilcoxauc(object_seurat, 'cell_type'))
Seurat objects can store multiple assays. The assay used by wilcoxauc
can be
specified with the seurat_assay
argument.
head(wilcoxauc(object_seurat, 'cell_type', seurat_assay = 'RNA'))
Seurat supports multiple feature expression matrices, such as raw counts,
library normalized data, and scaled data. These can be accessed with assay
.
head(wilcoxauc(object_seurat, 'cell_type', assay = 'counts')) head(wilcoxauc(object_seurat, 'cell_type', assay = 'data')) head(wilcoxauc(object_seurat, 'cell_type', assay = 'scale.data'))
presto supports the Bioconductor data structure SingleCellExperiment. Again, the most simple use case takes a SingleCellExperiment object and the metadata field with group labels.
data(object_sce) head(wilcoxauc(object_sce, 'cell_type'))
SingleCellExperiment can have several data slots, such as counts and logcounts.
These can be accessed with assay
.
head(wilcoxauc(object_sce, 'cell_type', assay = 'counts')) head(wilcoxauc(object_sce, 'cell_type', assay = 'logcounts'))
All inputs for wilcoxauc
give the same table of results.
parameter | description --------- | ----------- feature | name of feature. group | name of group label. avgExpr | mean value of feature in group. logFC | log fold change between observations in group vs out. statistic | Wilcoxon rank sum U statistic. auc | area under the receiver operator curve. pval | nominal p value, from two-tailed Gaussian approximation of U statistic. padj | Benjamini-Hochberg adjusted p value. pct_in | Percent of observations in the group with non-zero feature value. pct_out | Percent of observations out of the group with non-zero feature value.
head(wilcoxauc(exprs, y))
We often find it helpful to summarize what the most distinguishing features are in each group.
res <- wilcoxauc(exprs, y) top_markers(res, n = 10)
We can also filter for some criteria. For instance, the top features must be in at least 70% of all observations within the group. Note that not all groups have 10 markers that meet these criteria.
res <- wilcoxauc(exprs, y) top_markers(res, n = 10, auc_min = .5, pct_in_min = 70)
presto is optimized for dense and sparse matrix inputs. When possible, use sparse inputs. In our toy dataset, almost 39% of elements are zeros. Thus, it makes sense to cast it as a sparse dgCMatrix and run wilcoxauc on that.
sum(exprs == 0) / prod(dim(exprs)) exprs_sparse <- as(exprs, 'dgCMatrix') head(wilcoxauc(exprs_sparse, y))
Sometimes, you don't want to test all groups in the dataset against all other groups. For instance, I want to compare only observations in group 'A' to those in group 'B'. This is achieved with the groups_use argument.
res_AB <- wilcoxauc(exprs, y, groups_use = c('A', 'B')) head(res_AB) top_markers(res_AB)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.