Introduction

presto makes it fast and easy to run Wilcoxon rank sum test and auROC analysis on large datasets. The tutorial below shows how to install presto, walks through the 3 major ways you can use presto with your data, and finally explores more advanced use cases.

Installation

To install the current stable release from CRAN:

install.packages('presto')

For the cutting edge version of presto:

library(devtools)
install_github('immunogenomics/presto')
options(digits=2)
library(presto)

Input Types

The main function in this vignette is wilcoxauc. presto currently supports 3 interfaces to wilcoxauc, with a matrix, Seurat object, or SingleCellExperiment object. The output of wilcoxauc is described in the next section.

Matrix input

The most general use of presto is with a matrix of features and observations (exprs) paired with a vector of group labels (y).

data(exprs)
head(exprs)[, 1:10]
data(y)
head(y)
head(wilcoxauc(exprs, y))

Seurat

We support interfacing with Seurat Version 3 objects. In the most basic use case, specify the Seurat object and the meta data variable that defines the group labels.

data(object_seurat)
# ensure object structure matches with the installed Seurat version
object_seurat <- Seurat::UpdateSeuratObject(object_seurat)
head(wilcoxauc(object_seurat, 'cell_type'))

Seurat objects can store multiple assays. The assay used by wilcoxauc can be specified with the seurat_assay argument.

head(wilcoxauc(object_seurat, 'cell_type', seurat_assay = 'RNA'))

Seurat supports multiple feature expression matrices, such as raw counts, library normalized data, and scaled data. These can be accessed with assay.

head(wilcoxauc(object_seurat, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'data'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'scale.data'))

SingleCellExperiment

presto supports the Bioconductor data structure SingleCellExperiment. Again, the most simple use case takes a SingleCellExperiment object and the metadata field with group labels.

data(object_sce)
head(wilcoxauc(object_sce, 'cell_type'))

SingleCellExperiment can have several data slots, such as counts and logcounts. These can be accessed with assay.

head(wilcoxauc(object_sce, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_sce, 'cell_type', assay = 'logcounts'))

Description of outputs

Results table

All inputs for wilcoxauc give the same table of results.

parameter | description --------- | ----------- feature | name of feature. group | name of group label. avgExpr | mean value of feature in group. logFC | log fold change between observations in group vs out. statistic | Wilcoxon rank sum U statistic. auc | area under the receiver operator curve. pval | nominal p value, from two-tailed Gaussian approximation of U statistic. padj | Benjamini-Hochberg adjusted p value. pct_in | Percent of observations in the group with non-zero feature value. pct_out | Percent of observations out of the group with non-zero feature value.

head(wilcoxauc(exprs, y))

Top markers

We often find it helpful to summarize what the most distinguishing features are in each group.

res <- wilcoxauc(exprs, y)
top_markers(res, n = 10)

We can also filter for some criteria. For instance, the top features must be in at least 70% of all observations within the group. Note that not all groups have 10 markers that meet these criteria.

res <- wilcoxauc(exprs, y)
top_markers(res, n = 10, auc_min = .5, pct_in_min = 70)

Options

Dense vs sparse

presto is optimized for dense and sparse matrix inputs. When possible, use sparse inputs. In our toy dataset, almost 39% of elements are zeros. Thus, it makes sense to cast it as a sparse dgCMatrix and run wilcoxauc on that.

sum(exprs == 0) / prod(dim(exprs))
exprs_sparse <- as(exprs, 'dgCMatrix')
head(wilcoxauc(exprs_sparse, y))

groups_use

Sometimes, you don't want to test all groups in the dataset against all other groups. For instance, I want to compare only observations in group 'A' to those in group 'B'. This is achieved with the groups_use argument.

res_AB <- wilcoxauc(exprs, y, groups_use = c('A', 'B'))
head(res_AB)
top_markers(res_AB)


immunogenomics/presto documentation built on March 26, 2024, 8:18 a.m.