Performing scClassify using pretrained model

knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "#>"
)

Introduction

A common application of single-cell RNA sequencing (scRNA-seq) data is to identify discrete cell types. To take advantage of the large collection of well-annotated scRNA-seq datasets, the scClassify package implements a set of methods to perform accurate cell type classification based on ensemble learning and sample size calculation.

This vignette provides an example showing how users can use a pretrained scClassify model to predict cell types. A pretrained model is a scClassifyTrainModel object returned by train_scClassify(). A list of pretrained models can be found at https://sydneybiox.github.io/scClassify/index.html.

First, install BiocManager (if it is not already installed), then use BiocManager::install to install the scClassify package.

# installation of scClassify
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("scClassify")

Setting up the data

We assume that you have log-transformed (size-factor normalized) matrices as query datasets, where each row refers to a gene and each column a cell. For demonstration purposes, we will take a subset of single-cell pancreas datasets from one independent study (Wang et al.).

library(scClassify)
data("scClassify_example")
wang_cellTypes <- scClassify_example$wang_cellTypes
exprsMat_wang_subset <- scClassify_example$exprsMat_wang_subset
exprsMat_wang_subset <- as(exprsMat_wang_subset, "dgCMatrix")
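
If you are starting from raw counts rather than a normalized matrix, the expected input format can be sketched as follows. This is only a minimal illustration on a hypothetical toy count matrix, using a simple library-size scaling followed by a log2(x + 1) transform; it is an assumption about one reasonable way to prepare the data, not necessarily the normalization used for the example datasets.

```r
library(Matrix)

set.seed(1)
# Hypothetical toy counts: 5 genes (rows) x 4 cells (columns)
counts <- matrix(rpois(20, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

# Library-size factors, scaled so that they average to 1
size_factors <- colSums(counts) / mean(colSums(counts))

# Divide each cell (column) by its size factor, then log-transform
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)

# Store as a sparse matrix, with genes as rows and cells as columns
logcounts_sparse <- Matrix(logcounts, sparse = TRUE)
dim(logcounts_sparse)
```

The row names (genes) and column names (cells) are kept through each step, which matters later because the genes in the query matrix are matched against those stored in the pretrained model.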

Here, we load our pretrained model, which was trained using a subset of the Xin et al. human pancreas dataset as the reference data.

First, let us check basic information relating to our pretrained model.

data("trainClassExample_xin")
trainClassExample_xin

In this pretrained model, the genes were selected based on differential expression using limma. To check the genes that are available in the pretrained model:

features(trainClassExample_xin)
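
Before predicting, it can be useful to check how many of the model's genes are actually present in the query matrix. The sketch below assumes that features() returns a character vector of gene names and that the query matrix uses the same gene identifiers as its row names; the variable names model_genes and shared_genes are introduced here for illustration only.

```r
library(scClassify)
data("scClassify_example")
data("trainClassExample_xin")
exprsMat_wang_subset <- scClassify_example$exprsMat_wang_subset

# Genes stored in the pretrained model (assumed to be a character vector)
model_genes <- features(trainClassExample_xin)

# Genes shared between the pretrained model and the query data
shared_genes <- intersect(model_genes, rownames(exprsMat_wang_subset))

length(model_genes)
length(shared_genes)
```

A small overlap would suggest a mismatch in gene identifiers (for example, gene symbols versus Ensembl IDs) between the reference and the query.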

We can also visualise the cell type tree of the reference data.

plotCellTypeTree(cellTypeTree(trainClassExample_xin))

Running scClassify

Next, we run predict_scClassify with our pretrained model (trainRes = trainClassExample_xin) to predict the cell types of our query data matrix exprsMat_wang_subset. Here, we use pearson and spearman as similarity metrics.

pred_res <- predict_scClassify(exprsMat_test = exprsMat_wang_subset,
                               trainRes = trainClassExample_xin,
                               cellTypes_test = wang_cellTypes,
                               algorithm = "WKNN",
                               features = c("limma"),
                               similarity = c("pearson", "spearman"),
                               prob_threshold = 0.7,
                               verbose = TRUE)

Note that cellTypes_test is not a required input. For datasets with unknown labels, users can simply leave it as cellTypes_test = NULL.
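
As an illustration of the unlabelled case, the call below repeats the prediction with the query labels left out and with a single similarity metric. It is a sketch that reuses the example data from this vignette; the object name pred_unlabelled is hypothetical, and the result is assumed to be named pearson_WKNN_limma in the same way as in the labelled run.

```r
library(scClassify)
data("scClassify_example")
data("trainClassExample_xin")
exprsMat_wang_subset <- as(scClassify_example$exprsMat_wang_subset, "dgCMatrix")

# Predict without supplying the true labels of the query cells
pred_unlabelled <- predict_scClassify(exprsMat_test = exprsMat_wang_subset,
                                      trainRes = trainClassExample_xin,
                                      cellTypes_test = NULL,
                                      algorithm = "WKNN",
                                      features = "limma",
                                      similarity = "pearson",
                                      prob_threshold = 0.7,
                                      verbose = FALSE)

# Number of query cells assigned to each predicted label
table(pred_unlabelled$pearson_WKNN_limma$predRes)
```

Without true labels there is nothing to cross-tabulate against, so the predicted labels are simply tabulated on their own.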

Prediction results for pearson as the similarity metric:

table(pred_res$pearson_WKNN_limma$predRes, wang_cellTypes)

Prediction results for spearman as the similarity metric:

table(pred_res$spearman_WKNN_limma$predRes, wang_cellTypes)

Session Info

sessionInfo()


scClassify documentation built on Nov. 8, 2020, 8:08 p.m.