In immunogenomics/presto: Fast Functions for Differential Expression using Wilcox and AUC

Introduction

presto makes it fast and easy to run Wilcoxon rank sum test and auROC analysis on large datasets. The tutorial below shows how to install presto, walks through the 3 major ways you can use presto with your data, and finally explores more advanced use cases.

Installation

To install the current stable release from CRAN:

install.packages('presto')

For the cutting edge version of presto:

library(devtools)
install_github('immunogenomics/presto')

options(digits=2)
library(presto)

Input Types

The main function in this vignette is wilcoxauc. presto currently supports 3 interfaces to wilcoxauc, with a matrix, Seurat object, or SingleCellExperiment object. The output of wilcoxauc is described in the next section.

Matrix input

The most general use of presto is with a matrix of features and observations (exprs) paired with a vector of group labels (y).

data(exprs)
head(exprs)[, 1:10]
data(y)
head(y)
head(wilcoxauc(exprs, y))

Seurat

We support interfacing with Seurat Version 3 objects. In the most basic use case, specify the Seurat object and the meta data variable that defines the group labels.

data(object_seurat)
# ensure object structure matches with the installed Seurat version
object_seurat <- Seurat::UpdateSeuratObject(object_seurat)
head(wilcoxauc(object_seurat, 'cell_type'))

Seurat objects can store multiple assays. The assay used by wilcoxauc can be specified with the seurat_assay argument.

head(wilcoxauc(object_seurat, 'cell_type', seurat_assay = 'RNA'))

Seurat supports multiple feature expression matrices, such as raw counts, library normalized data, and scaled data. These can be accessed with assay.

head(wilcoxauc(object_seurat, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'data'))
head(wilcoxauc(object_seurat, 'cell_type', assay = 'scale.data'))

SingleCellExperiment

presto supports the Bioconductor data structure SingleCellExperiment. Again, the most simple use case takes a SingleCellExperiment object and the metadata field with group labels.

data(object_sce)
head(wilcoxauc(object_sce, 'cell_type'))

SingleCellExperiment can have several data slots, such as counts and logcounts. These can be accessed with assay.

head(wilcoxauc(object_sce, 'cell_type', assay = 'counts'))
head(wilcoxauc(object_sce, 'cell_type', assay = 'logcounts'))

Description of outputs

Results table

All inputs for wilcoxauc give the same table of results.

parameter | description --------- | ----------- feature | name of feature. group | name of group label. avgExpr | mean value of feature in group. logFC | log fold change between observations in group vs out. statistic | Wilcoxon rank sum U statistic. auc | area under the receiver operator curve. pval | nominal p value, from two-tailed Gaussian approximation of U statistic. padj | Benjamini-Hochberg adjusted p value. pct_in | Percent of observations in the group with non-zero feature value. pct_out | Percent of observations out of the group with non-zero feature value.

head(wilcoxauc(exprs, y))

Top markers

We often find it helpful to summarize what the most distinguishing features are in each group.

res <- wilcoxauc(exprs, y)
top_markers(res, n = 10)

We can also filter for some criteria. For instance, the top features must be in at least 70% of all observations within the group. Note that not all groups have 10 markers that meet these criteria.

res <- wilcoxauc(exprs, y)
top_markers(res, n = 10, auc_min = .5, pct_in_min = 70)

Options

Dense vs sparse

presto is optimized for dense and sparse matrix inputs. When possible, use sparse inputs. In our toy dataset, almost 39% of elements are zeros. Thus, it makes sense to cast it as a sparse dgCMatrix and run wilcoxauc on that.

sum(exprs == 0) / prod(dim(exprs))
exprs_sparse <- as(exprs, 'dgCMatrix')
head(wilcoxauc(exprs_sparse, y))

groups_use

Sometimes, you don't want to test all groups in the dataset against all other groups. For instance, I want to compare only observations in group 'A' to those in group 'B'. This is achieved with the groups_use argument.

res_AB <- wilcoxauc(exprs, y, groups_use = c('A', 'B'))
head(res_AB)
top_markers(res_AB)

immunogenomics/presto documentation built on March 26, 2024, 8:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

immunogenomics/presto
Fast Functions for Differential Expression using Wilcox and AUC

In immunogenomics/presto: Fast Functions for Differential Expression using Wilcox and AUC

Introduction

Installation

Input Types

Matrix input

Seurat

SingleCellExperiment

Description of outputs

Results table

Top markers

Options

Dense vs sparse

groups_use

R Package Documentation

Browse R Packages

We want your feedback!

immunogenomics/presto Fast Functions for Differential Expression using Wilcox and AUC

In immunogenomics/presto: Fast Functions for Differential Expression using Wilcox and AUC

Introduction

Installation

Input Types

Matrix input

Seurat

SingleCellExperiment

Description of outputs

Results table

Top markers

Options

Dense vs sparse

groups_use

R Package Documentation

Browse R Packages

We want your feedback!

immunogenomics/presto
Fast Functions for Differential Expression using Wilcox and AUC