TransferData: Transfer data

View source: R/integration.R

Description

Transfer categorical or continuous data across single-cell datasets. For transferring categorical information, pass a vector from the reference dataset (e.g. refdata = reference$celltype). For transferring continuous information, pass a matrix from the reference dataset (e.g. refdata = GetAssayData(reference[['RNA']])).

Usage

TransferData(
  anchorset,
  refdata,
  reference = NULL,
  query = NULL,
  weight.reduction = "pcaproject",
  l2.norm = FALSE,
  dims = NULL,
  k.weight = 50,
  sd.weight = 1,
  eps = 0,
  n.trees = 50,
  verbose = TRUE,
  slot = "data",
  prediction.assay = FALSE,
  store.weights = TRUE
)

Arguments

anchorset

An AnchorSet object generated by FindTransferAnchors

refdata

Data to transfer. This can be specified in one of two ways:

  • The reference data itself as either a vector where the names correspond to the reference cells, or a matrix, where the column names correspond to the reference cells.

  • The name of the metadata field or assay from the reference object provided. This requires the reference parameter to be specified. If pulling assay data in this manner, it will pull the data from the data slot. To transfer data from other slots, please pull the data explicitly with GetAssayData and provide that matrix here.

reference

Reference object from which to pull data to transfer

query

Query object into which the data will be transferred.

weight.reduction

Dimensional reduction to use for weighting anchors. Options are:

  • pcaproject: Use the projected PCA used for anchor building

  • pca: Use an internal PCA on the query only

  • cca: Use the CCA used for anchor building

  • custom DimReduc: User-provided DimReduc object computed on the query cells

l2.norm

Perform L2 normalization on the cell embeddings after dimensional reduction

dims

Set of dimensions to use in the anchor weighting procedure

k.weight

Number of neighbors to consider when weighting anchors

sd.weight

Controls the bandwidth of the Gaussian kernel for weighting

eps

Error bound on the neighbor finding algorithm (from RANN)

n.trees

Number of trees used by the annoy approximate nearest neighbor search. More trees give higher precision.

verbose

Print progress bars and output

slot

Slot to store the imputed data. Must be either "data" (default) or "counts"

prediction.assay

Return an Assay object with the prediction scores for each class stored in the data slot.

store.weights

Optionally store the weights matrix used for predictions in the returned query object.

Details

The main steps of this procedure are outlined below. For a more detailed description of the methodology, see Stuart, Butler, et al., Cell 2019 (doi: 10.1016/j.cell.2019.05.031; preprint doi: 10.1101/460147).

For both label transfer and feature imputation, we first compute the weights matrix, which defines the association between each query cell and each anchor.
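
As a rough illustration of the weighting step, the base R sketch below follows the formulation in Stuart, Butler, et al. (2019) as we understand it. For each query cell, distances to its k.weight nearest anchors are rescaled relative to the k.weight-th anchor, multiplied by the anchor score, passed through a Gaussian kernel whose bandwidth is set by sd.weight, and normalized. The toy embeddings, the anchor scores, and all variable names are assumptions made for this example; this is not Seurat's internal code.

```r
# Toy sketch of anchor weighting (assumed formulation, NOT Seurat's implementation)
set.seed(1)
n.query   <- 4   # number of query cells (toy value)
n.anchor  <- 6   # number of anchors (toy value)
k.weight  <- 3   # neighbors considered per query cell
sd.weight <- 1   # bandwidth control for the Gaussian kernel

# Hypothetical 2-D embeddings for the query cells and for the anchors
query.emb    <- matrix(rnorm(n.query * 2), ncol = 2)
anchor.emb   <- matrix(rnorm(n.anchor * 2), ncol = 2)
anchor.score <- runif(n.anchor, min = 0.2, max = 1)  # hypothetical anchor scores

# weights: anchors (rows) x query cells (columns)
weights <- matrix(0, nrow = n.anchor, ncol = n.query)
for (cell in seq_len(n.query)) {
  # Euclidean distance from this query cell to every anchor
  d <- sqrt(colSums((t(anchor.emb) - query.emb[cell, ])^2))
  # indices of the k.weight nearest anchors, closest first
  nn <- order(d)[seq_len(k.weight)]
  # 1 - distance relative to the k.weight-th anchor, scaled by the anchor score
  D <- (1 - d[nn] / d[nn[k.weight]]) * anchor.score[nn]
  # Gaussian kernel with bandwidth defined by sd.weight
  D.kernel <- 1 - exp(-D / (2 / sd.weight)^2)
  # normalize across the k.weight anchors so each query cell's weights sum to 1
  weights[nn, cell] <- D.kernel / sum(D.kernel)
}

colSums(weights)  # each query cell's weights sum to 1
```

In this sketch the k.weight-th anchor always receives weight zero, so each query cell is effectively influenced by at most k.weight - 1 anchors.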

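The main difference between label transfer (classification) and feature imputation is what gets multiplied by the weights matrix. For label transfer, we create a binary classification matrix whose rows correspond to the possible classes and whose columns correspond to the anchors; an entry is 1 if the reference cell of that anchor pair belongs to the class and 0 otherwise. We then multiply this classification matrix by the transpose of the weights matrix to obtain a prediction score for each class for each query cell.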
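
The label transfer step can be sketched in base R as follows. This is a minimal illustration on hand-written toy values, not Seurat's internal code: the weights matrix, the anchor labels, its anchors-by-query orientation, and the output column names are all assumptions chosen for the example.

```r
# Toy weights: 4 anchors (rows) x 3 query cells (columns); each column sums to 1
weights <- matrix(c(0.7, 0.3, 0.0, 0.0,
                    0.0, 0.4, 0.6, 0.0,
                    0.0, 0.0, 0.2, 0.8),
                  nrow = 4,
                  dimnames = list(paste0("anchor", 1:4), paste0("query", 1:3)))

# Hypothetical class label of the reference cell in each anchor pair
anchor.labels <- c("B", "B", "T", "NK")
classes <- sort(unique(anchor.labels))

# Binary classification matrix: classes (rows) x anchors (columns)
class.mat <- 1 * outer(classes, anchor.labels, "==")
dimnames(class.mat) <- list(classes, rownames(weights))

# Prediction scores: query cells (rows) x classes (columns)
scores <- t(weights) %*% t(class.mat)

# Class with the highest score for each query cell, plus that score
predictions <- data.frame(
  predicted.id         = colnames(scores)[apply(scores, 1, which.max)],
  prediction.score.max = apply(scores, 1, max),
  row.names            = rownames(scores)
)
predictions
```

Because each query cell's weights sum to 1, its class scores also sum to 1 and can be read as the share of anchor weight supporting each class.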

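For feature imputation, we multiply the expression matrix for the reference anchor cells by the weights matrix. This returns a value for each feature for each cell in the query dataset.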
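
A minimal base R sketch of the imputation step is shown below. The expression values, gene names, and weights are invented for illustration, and the anchors-by-query orientation of the weights matrix is an assumption of this example rather than a statement about Seurat's internals.

```r
# Toy weights: 4 anchors (rows) x 3 query cells (columns); each column sums to 1
weights <- matrix(c(0.7, 0.3, 0.0, 0.0,
                    0.0, 0.4, 0.6, 0.0,
                    0.0, 0.0, 0.2, 0.8),
                  nrow = 4,
                  dimnames = list(paste0("anchor", 1:4), paste0("query", 1:3)))

# Hypothetical expression of two features in the reference cell of each anchor:
# features (rows) x anchors (columns)
ref.expr <- matrix(c(2, 0,
                     4, 0,
                     0, 6,
                     0, 2),
                   nrow = 2,
                   dimnames = list(c("geneA", "geneB"), paste0("anchor", 1:4)))

# Imputed values: features (rows) x query cells (columns).
# Each query cell receives a weighted average of the anchor expression values.
imputed <- ref.expr %*% weights
imputed
```

Here the only change from label transfer is the left-hand matrix: continuous expression values take the place of the binary class indicators.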

Value

If query is not provided and refdata is categorical, returns a data.frame with label predictions. If refdata is a matrix, returns an Assay object with the imputed data stored in the provided slot.

If query is provided, a modified query object is returned. For categorical data in refdata, prediction scores are stored as Assays (prediction.score.NAME), and two metadata fields are added: predicted.NAME and predicted.NAME.score, which contain the class prediction and the score for that predicted class. For continuous data, an Assay called NAME is returned. Here NAME is the name of the corresponding element in the refdata list.

References

Stuart T, Butler A, et al. Comprehensive Integration of Single-Cell Data. Cell. 2019;177:1888-1902. doi: 10.1016/j.cell.2019.05.031

Examples

## Not run: 
# to install the SeuratData package see https://github.com/satijalab/seurat-data
library(Seurat)
library(SeuratData)
data("pbmc3k")

# for demonstration, split the object into reference and query
pbmc.reference <- pbmc3k[, 1:1350]
pbmc.query <- pbmc3k[, 1351:2700]

# perform standard preprocessing on each object
pbmc.reference <- NormalizeData(pbmc.reference)
pbmc.reference <- FindVariableFeatures(pbmc.reference)
pbmc.reference <- ScaleData(pbmc.reference)

pbmc.query <- NormalizeData(pbmc.query)
pbmc.query <- FindVariableFeatures(pbmc.query)
pbmc.query <- ScaleData(pbmc.query)

# find anchors
anchors <- FindTransferAnchors(reference = pbmc.reference, query = pbmc.query)

# transfer labels
predictions <- TransferData(anchorset = anchors, refdata = pbmc.reference$seurat_annotations)
pbmc.query <- AddMetaData(object = pbmc.query, metadata = predictions)

## End(Not run)

ibseq/scs-analysis documentation built on Feb. 27, 2021, 12:35 a.m.