peco is an R package for PrEdicting Cell cycle prOgression in a continuum using scRNA-seq data.
To install and load the package, run:
install.packages("devtools")
library(devtools)
install_github("jhsiao999/peco")
peco
uses SingleCellExperiment
class objects.
library(peco)
library(SingleCellExperiment)
## Loading required package: SummarizedExperiment
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind,
## colMeans, colnames, colSums, dirname, do.call, duplicated,
## eval, evalq, Filter, Find, get, grep, grepl, intersect,
## is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
## paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
## Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which, which.max,
## which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: DelayedArray
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
##
## anyMissing, rowMedians
## Loading required package: BiocParallel
##
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
##
## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following objects are masked from 'package:base':
##
## aperm, apply
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
library(foreach)
peco
is a supervised approach for PrEdicting cell cycle phase in a
COntinuum using single-cell RNA sequencing data. The R package provides
functions to build training dataset and also functions to use existing
training data to predict cell cycle on a continuum.
Our work demonstrated that peco is able to predict continuous cell cylce phase using a small set of cylcic genes: CDK1, UBE2C, TOP2A, HISTH1E, and HISTH1C (identified as cell cycle marker genes in studies of yeast (Spellman et al., 1998) and HeLa cells (Whitfield et al., 2002)).
Below we provide two use cases. Vignette 1 shows how to use the built-training dataset to predict continuous cell cycle. Vignette 2 shows how to make a training datast and build a predictor using training data.
Users can also view the vigenettes via browseVignettes("peco")
.
training_human
stores built-in training data of 101 significant cyclic
genes. Below are the slots contained in training_human
:
predict.yy
: a gene by sample matrix (101 by 888) that stores
predict cyclic expression values.cellcycle_peco_reordered
: cell cycle phase in a unit circle
(angle), ordered from 0 to 2(pi)cellcycle_function
: lists of 101 function corresponding to the top
101 cyclic genes identified in our datasetsigma
: standard error associated with cyclic trends of gene
expressionpve
: proportion of variance explained by the cyclic trenddata("training_human")
peco
is integrated with SingleCellExperiment
object in Bioconductor.
Below shows an example of inputting SingleCellExperiment
object to
perform cell cycle phase prediction.
sce_top101genes
includes 101 genes and 888 single-cell samples and one
assay slot of counts
.
data("sce_top101genes")
assays(sce_top101genes)
## List of length 1
## names(1): counts
Transform the expression values to quantile-normalizesd
counts-per-million values. peco
uses the cpm_quantNormed
slot as
input data for predictions.
sce_top101genes <- data_transform_quantile(sce_top101genes)
## computing on 2 cores
assays(sce_top101genes)
## List of length 3
## names(3): counts cpm cpm_quantNormed
Apply the prediction model using function cycle_npreg_outsample
and
generate prediction results contained in a list object
pred_top101genes
.
pred_top101genes <- cycle_npreg_outsample(
Y_test=sce_top101genes,
sigma_est=training_human$sigma[rownames(sce_top101genes),],
funs_est=training_human$cellcycle_function[rownames(sce_top101genes)],
method.trend="trendfilter",
ncores=1,
get_trend_estimates=FALSE)
The pred_top101genes$Y
contains a SingleCellExperiment object with the
predict cell cycle phase in the colData
slot.
head(colData(pred_top101genes$Y)$cellcycle_peco)
## 20170905-A01 20170905-A02 20170905-A03 20170905-A06 20170905-A07
## 1.099557 4.555309 2.481858 4.303982 4.052655
## 20170905-A08
## 1.413717
Visualize results of prediction for one gene. Below we choose CDK1
(“ENSG00000170312”). Because CDK1 is a known cell cycle gene, this
visualization serves as a sanity check for the results of fitting. The
fitted function training_human$cellcycle_function[[1]]
was obtained
from our training data.
plot(y=assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",],
x=colData(pred_top101genes$Y)$theta_shifted, main = "CDK1",
ylab = "quantile normalized expression")
points(y=training_human$cellcycle_function[["ENSG00000170312"]](seq(0,2*pi, length.out=100)),
x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
Visualize results of prediction for the top 10 genesone genes. Use
fit_cyclical_many
to estimate cyclic function based on the input data.
# predicted cell time in the input data
theta_predict = colData(pred_top101genes$Y)$cellcycle_peco
names(theta_predict) = rownames(colData(pred_top101genes$Y))
# expression values of 10 genes in the input data
yy_input = assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,]
# apply trendfilter to estimate cyclic gene expression trend
fit_cyclic <- fit_cyclical_many(Y=yy_input,
theta=theta_predict)
## computing on 2 cores
gene_symbols = rowData(pred_top101genes$Y)$hgnc[rownames(yy_input)]
par(mfrow=c(2,3))
for (i in 1:6) {
plot(y=yy_input[i,],
x=fit_cyclic$cellcycle_peco_ordered,
main = gene_symbols[i],
ylab = "quantile normalized expression")
points(y=fit_cyclic$cellcycle_function[[i]](seq(0,2*pi, length.out=100)),
x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
}
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Scientific Linux 7.4 (Nitrogen)
##
## Matrix products: default
## BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] doParallel_1.0.14 iterators_1.0.12
## [3] foreach_1.4.4 SingleCellExperiment_1.4.1
## [5] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
## [7] BiocParallel_1.16.0 matrixStats_0.55.0
## [9] Biobase_2.42.0 GenomicRanges_1.34.0
## [11] GenomeInfoDb_1.18.1 IRanges_2.16.0
## [13] S4Vectors_0.20.1 BiocGenerics_0.28.0
## [15] peco_0.99.10
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 mvtnorm_1.0-11
## [3] lattice_0.20-38 assertthat_0.2.1
## [5] rprojroot_1.3-2 digest_0.6.20
## [7] plyr_1.8.4 R6_2.4.0
## [9] backports_1.1.2 genlasso_1.4
## [11] evaluate_0.12 pracma_2.2.9
## [13] ggplot2_3.2.1 pillar_1.3.1
## [15] zlibbioc_1.28.0 rlang_0.4.0
## [17] lazyeval_0.2.1 Matrix_1.2-17
## [19] rmarkdown_1.10 geigen_2.3
## [21] stringr_1.3.1 igraph_1.2.2
## [23] RCurl_1.95-4.11 munsell_0.5.0
## [25] HDF5Array_1.10.1 vipor_0.4.5
## [27] compiler_3.5.1 pkgconfig_2.0.3
## [29] ggbeeswarm_0.6.0 htmltools_0.3.6
## [31] tidyselect_0.2.5 tibble_2.1.1
## [33] gridExtra_2.3 GenomeInfoDbData_1.2.0
## [35] codetools_0.2-15 viridisLite_0.3.0
## [37] crayon_1.3.4 dplyr_0.8.0.1
## [39] MASS_7.3-51.1 bitops_1.0-6
## [41] grid_3.5.1 gtable_0.2.0
## [43] magrittr_1.5 scales_1.0.0
## [45] stringi_1.2.4 reshape2_1.4.3
## [47] XVector_0.22.0 viridis_0.5.1
## [49] scater_1.10.1 DelayedMatrixStats_1.4.0
## [51] boot_1.3-20 Rhdf5lib_1.4.3
## [53] tools_3.5.1 beeswarm_0.2.3
## [55] glue_1.3.0 purrr_0.3.2
## [57] yaml_2.2.0 colorspace_1.3-2
## [59] rhdf5_2.26.2 circular_0.4-93
## [61] conicfit_1.0.4 knitr_1.20
Please contact me at joyce.hsiao1@gmail.com for questions on the package or the methods.
Hsiao, C. J., Tung, P., Blischak, J. D., Burnett, J., Dey, K. K., Barr, A. K., Stephens, M., and Gilad, Y. (2018). Characterizing and inferring quantitative cell-cycle phase in single-cell RNA-seq data analysis. bioRxiv
Copyright (c) 2019-2020, Joyce Hsiao.
All source code and software in this repository are made available under the terms of the GNU General Public License. See file LICENSE for the full text of the license.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.