Predicting cell cycle phase using peco

knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

Installation

To install and load the package, run:

install.packages("devtools")
library(devtools)
install_github("jhsiao999/peco")

peco uses SingleCellExperiment class objects.

library(peco)
library(SingleCellExperiment)
library(doParallel)
library(foreach)

Overview

peco is a supervised approach for PrEdicting cell cycle phase in a COntinuum using single-cell RNA sequencing data. The R package provides functions to build training dataset and also functions to use existing training data to predict cell cycle on a continuum.

Our work demonstrated that peco is able to predict continuous cell cylce phase using a small set of cylcic genes: CDK1, UBE2C, TOP2A, HISTH1E, and HISTH1C (identified as cell cycle marker genes in studies of yeast (Spellman et al., 1998) and HeLa cells (Whitfield et al., 2002)).

Below we provide two use cases. Vignette 1 shows how to use the built-training dataset to predict continuous cell cycle. Vignette 2 shows how to make a training datast and build a predictor using training data.

Users can also view the vigenettes via browseVignettes("peco").

About the training dataset

training_human stores built-in training data of 101 significant cyclic genes. Below are the slots contained in training_human:

data("training_human")

Predict cell cycle phase using gene expression data

peco is integrated with SingleCellExperiment object in Bioconductor. Below shows an example of inputting SingleCellExperiment object to perform cell cycle phase prediction.

sce_top101genes includes 101 genes and 888 single-cell samples and one assay slot of counts.

data("sce_top101genes")
assays(sce_top101genes)

Transform the expression values to quantile-normalizesd counts-per-million values. peco uses the cpm_quantNormed slot as input data for predictions.

sce_top101genes <- data_transform_quantile(sce_top101genes)
assays(sce_top101genes)

Apply the prediction model using function cycle_npreg_outsample and generate prediction results contained in a list object pred_top101genes.

pred_top101genes <- cycle_npreg_outsample(
    Y_test=sce_top101genes,
    sigma_est=training_human$sigma[rownames(sce_top101genes),],
    funs_est=training_human$cellcycle_function[rownames(sce_top101genes)],
    method.trend="trendfilter",
    ncores=1,
    get_trend_estimates=FALSE)

The pred_top101genes$Y contains a SingleCellExperiment object with the predict cell cycle phase in the colData slot.

head(colData(pred_top101genes$Y)$cellcycle_peco)

Visualize results of prediction for one gene. Below we choose CDK1 ("ENSG00000170312"). Because CDK1 is a known cell cycle gene, this visualization serves as a sanity check for the results of fitting. The fitted function training_human$cellcycle_function[[1]] was obtained from our training data.

plot(y=assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",],
     x=colData(pred_top101genes$Y)$theta_shifted, main = "CDK1",
     ylab = "quantile normalized expression")
points(y=training_human$cellcycle_function[["ENSG00000170312"]](seq(0,2*pi, length.out=100)),
       x=seq(0,2*pi, length.out=100), col = "blue", pch =16)

Visualize cyclic expression trend based on predicted phase

Visualize results of prediction for the top 10 genesone genes. Use fit_cyclical_many to estimate cyclic function based on the input data.

# predicted cell time in the input data
theta_predict = colData(pred_top101genes$Y)$cellcycle_peco
names(theta_predict) = rownames(colData(pred_top101genes$Y))

# expression values of 10 genes in the input data
yy_input = assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,]

# apply trendfilter to estimate cyclic gene expression trend
fit_cyclic <- fit_cyclical_many(Y=yy_input, 
                                theta=theta_predict)

gene_symbols = rowData(pred_top101genes$Y)$hgnc[rownames(yy_input)]

par(mfrow=c(2,3))
for (i in 1:6) {
plot(y=yy_input[i,],
     x=fit_cyclic$cellcycle_peco_ordered, 
     main = gene_symbols[i],
     ylab = "quantile normalized expression")
points(y=fit_cyclic$cellcycle_function[[i]](seq(0,2*pi, length.out=100)),
       x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
}

Session information

sessionInfo()


Try the peco package in your browser

Any scripts or data that you put into this service are public.

peco documentation built on Nov. 8, 2020, 8:16 p.m.