cocluster: Co-clustering bulk and single cell data

View source: R/cocluster.R

coclusterR Documentation

Co-clustering bulk and single cell data

Description

Automatically assigns bulk tissues to single cells through co-clustering.

Shiny App for assigning desired bulk tissues to single cells when tailoring the co-clustering result in co-visualization of bulk and single cell data. The uploaded file is an ".rds" file of a SingleCellExperiment object saved by saveRDS. See the example below.

Refine the bulk-cell assignments by subsetting the assignments according to a threshold, which is a similarity value between bulk and cells.

Filter single cell data and take overlap genes between cell and bulk data. The bulk data are not filtered as they are only used to obtain overlap genes.

Refine the bulk-cell assignments by including custom bulk-cell assignments.

Usage

cocluster(
  bulk,
  cell,
  df.match = NULL,
  min.dim = 11,
  max.dim = 50,
  dimred = "PCA",
  graph.meth = "knn",
  knn.gr = list(),
  snn.gr = list(),
  cluster = "fg",
  wt.arg = list(steps = 4),
  fg.arg = list(),
  sl.arg = list(spins = 25),
  le.arg = list(),
  eb.arg = list(),
  sim.meth = "spearman"
)

desired_bulk_shiny()

filter_asg(res, min.sim = 0)

filter_cell(
  sce,
  bulk = NULL,
  gen.rm = NULL,
  cutoff = 1,
  p.in.cell = 0.4,
  p.in.gen = 0.2,
  com = FALSE,
  verbose = TRUE
)

refine_asg(sce.all, df.desired.bulk = NULL)

Arguments

bulk

The bulk data in form of data.frame, SummarizedExperiment, or SingleCellExperiment. They are only used to obtain overlapping genes with single cell data and not filtered. The default is NULL.

cell

The normalized single cell data in form of SingleCellExperiment.

df.match

A data.frame specifying ground-truth matching between cells and bulk, applicable in co-clustering optimization.

min.dim, max.dim

Integer scalars specifying the minimum (min.dim) and maximum (max.dim) number of (principle components) PCs to retain respectively in denoisePCA. The default is min.dim=11, max. dim=50.

dimred

A string of PCA (default) or UMAP specifying which reduced dimensions to use for creating a nearest neighbor graph.

graph.meth

Method to build a nearest-neighbor graph, snn (see buildSNNGraph) or knn (default, see buildKNNGraph). The clusters are detected by first creating a nearest neighbor graph using snn or knn then partitioning the graph.

knn.gr

Additional arguments in a named list passed to buildKNNGraph.

snn.gr

Additional arguments in a named list passed to buildSNNGraph.

cluster

The clustering method. One of wt (cluster_walktrap, default), fg (cluster_fast_greedy), le (cluster_leading_eigen), sl (cluster_fast_greedy), eb (cluster_edge_betweenness).

wt.arg, fg.arg, sl.arg, le.arg, eb.arg

A named list of arguments passed to wt, fg, le, sl, eb respectively.

sim.meth

Method to calculate similarities between bulk and cells in each cocluster when assigning bulk to cells. spearman (default) or pearson.

res

The coclustering results returned by cocluster.

min.sim

The similarity cutoff for filterig bulk-cell assignments, which is a Pearson's or Spearman's correlation coefficient between bulk and cells. Only bulk-cell assignments with similarity values above the thresold would remain. The default is 0.

sce

A SingleCellExperiment of single cell data.

gen.rm

A regular expression of gene identifiers in single cell data to remove before filtering. E.g. mitochondrial, chloroplast and protoplasting-induced genes (^ATCG|^ATCG). The default is NULL.

cutoff

The minmun count of gene expression. The default is 1.

p.in.cell

The proportion cutoff of counts above cutoff in a cell. The default is 0.4.

p.in.gen

The proportion cutoff of counts above cutoff in a gene. The default is 0.2.

com

Logical, if TRUE the returned cell and bulk data are column-wise combined, otherwise they are separated in a list.

verbose

Logical. If TRUE (default), intermediate messages are printed.

sce.all

The coclustering results returned by cocluster.

df.desired.bulk

A "data.frame" of desired bulk for some cells. The cells could be specified by providing x-y axis ranges in an embedding plot ("UMAP", "PCA", "TSNE") returned by plot_dim. E.g. df.desired.bulk <- data.frame(x.min=c(4, -6), x.max=c(5, -5), y.min=c(-2.5, 2), y.max=c(-2, 2.5), desiredSVGBulk=c('CORT', 'STELE'), dimred='UMAP'), where columns x.min, x.max, y.min, y.max, desiredSVGBulk, dimred are required. In this example, cells located in 4 <= x <= 5 and -2.5 <= y <= -2 in the "UMAP" plot are assigned "STELE", and cells located in -6 <= x <= -5 and 2 <= y <= 2.5 in the "UMAP" plot are assigned "CORT".
Alternatively, the "data.frame" could be downloaded from the Shiny app launched by desired_bulk_shiny.
df.desired.bulk is used to tailor the co-clustering results. That is to say additional true bulk-cell assignments are created and included in the final assignments. If these assignments conflict with the co-colustering results the latter would be overwritten.

Value

A list of coclustering results in SingleCellExperiment and an roc object (relevant in optimization).

A web browser based Shiny app.

A SingleCellExperiment of remaining bulk-cell assignments.

A list of filtered single cell data and bulk data, which have common genes.

A SingleCellExperiment of remaining bulk-cell assignments.

Details

No argument is required, this function launches the Shiny app directly.

Author(s)

Jianhai Zhang jzhan067@ucr.edu
Dr. Thomas Girke thomas.girke@ucr.edu

References

Vacher, Claire-Marie, Helene Lacaille, Jiaqi J. O’Reilly, Jacquelyn Salzbank, Dana Bakalar, Sonia Sebaoui, Philippe Liere, et al. 2021. “Placental Endocrine Function Shapes Cerebellar Development and Social Behavior.” Nature Neuroscience 24 (10): 1392–1401. Ortiz, Cantin, Jose Fernandez Navarro, Aleksandra Jurek, Antje Märtin, Joakim Lundeberg, and Konstantinos Meletis. 2020. “Molecular Atlas of the Adult Mouse Brain.” Science Advances 6 (26): eabb3446. SummarizedExperiment: SummarizedExperiment container. R package version 1.10.1
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S (2020). “Orchestrating single-cell analysis with Bioconductor.” Nature Methods, 17, 137–145. https://www.nature.com/articles/s41592-019-0654-x.

https://www.w3schools.com/graphics/svg_intro.asp

https://shiny.rstudio.com/tutorial/

https://shiny.rstudio.com/articles/datatables.html

https://rstudio.github.io/DT/010-style.html

https://plot.ly/r/heatmaps/

https://www.gimp.org/tutorials/

https://inkscape.org/en/doc/tutorials/advanced/tutorial-advanced.en.html

http://www.microugly.com/inkscape-quickguide/

https://cran.r-project.org/web/packages/visNetwork/vignettes/Introduction-to-visNetwork.html

Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2017). shiny: Web Application Framework for R. R package version 1.0.3. https://CRAN.R-project.org/package=shiny

Winston Chang and Barbara Borges Ribeiro (2017). shinydashboard: Create Dashboards with 'Shiny'. R package version 0.6.1. https://CRAN.R-project.org/package=shinydashboard

Paul Murrell (2009). Importing Vector Graphics: The grImport Package for R. Journal of Statistical Software, 30(4), 1-37. URL http://www.jstatsoft.org/v30/i04/

Jeroen Ooms (2017). rsvg: Render SVG Images into PDF, PNG, PostScript, or Bitmap Arrays. R package version 1.1. https://CRAN.R-project.org/package=rsvg

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Yihui Xie (2016). DT: A Wrapper of the JavaScript Library 'DataTables'. R package version 0.2. https://CRAN.R-project.org/package=DT

Baptiste Auguie (2016). gridExtra: Miscellaneous Functions for "Grid" Graphics. R package version 2.2.1. https://CRAN.R-project.org/package=gridExtra

Andrie de Vries and Brian D. Ripley (2016). ggdendro: Create Dendrograms and Tree Diagrams Using 'ggplot2'. R package version 0.1-20. https://CRAN.R-project.org/package=ggdendro

Langfelder P and Horvath S, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559 doi:10.1186/1471-2105-9-559

Peter Langfelder, Steve Horvath (2012). Fast R Functions for Robust Correlations and Hierarchical Clustering. Journal of Statistical Software, 46(11), 1-17. URL http://www.jstatsoft.org/v46/i11/

Simon Urbanek and Jeffrey Horner (2015). Cairo: R graphics device using cairo graphics library for creating high-quality bitmap (PNG, JPEG, TIFF), vector (PDF, SVG, PostScript) and display (X11 and Win32) output. R package version 1.5-9. https://CRAN.R-project.org/package=Cairo

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/

Duncan Temple Lang and the CRAN Team (2017). XML: Tools for Parsing and Generating XML Within R and S-Plus. R package version 3.98-1.9. https://CRAN.R-project.org/package=XML

Carson Sievert, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec and Pedro Despouy (NA). plotly: Create Interactive Web Graphics via 'plotly.js'. https://plot.ly/r, https://cpsievert.github.io/plotly_book/, https://github.com/ropensci/plotly

Matt Dowle and Arun Srinivasan (2017). data.table: Extension of 'data.frame'. R package version 1.10.4. https://CRAN.R-project.org/package=data.table

R. Gentleman, V. Carey, W. Huber and F. Hahne (2017). genefilter: genefilter: methods for filtering genes from high-throughput experiments. R package version 1.58.1.

Peter Langfelder, Steve Horvath (2012). Fast R Functions for Robust Correlations and Hierarchical Clustering. Journal of Statistical Software, 46(11), 1-17. URL http://www.jstatsoft.org/v46/i11/

Almende B.V., Benoit Thieurmel and Titouan Robert (2017). visNetwork: Network Visualization using 'vis.js' Library. R package version 2.0.1. https://CRAN.R-project.org/package=visNetwork Vacher, Claire-Marie, Helene Lacaille, Jiaqi J. O’Reilly, Jacquelyn Salzbank, Dana Bakalar, Sonia Sebaoui, Philippe Liere, et al. 2021. “Placental Endocrine Function Shapes Cerebellar Development and Social Behavior.” Nature Neuroscience 24 (10): 1392–1401. Ortiz, Cantin, Jose Fernandez Navarro, Aleksandra Jurek, Antje Märtin, Joakim Lundeberg, and Konstantinos Meletis. 2020. “Molecular Atlas of the Adult Mouse Brain.” Science Advances 6 (26): eabb3446.

Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S (2020). “Orchestrating single-cell analysis with Bioconductor.” Nature Methods, 17, 137–145. https://www.nature.com/articles/s41592-019-0654-x Morgan M, Obenchain V, Hester J, Pagès H (2021). SummarizedExperiment: SummarizedExperiment container. R package version 1.24.0, https://bioconductor.org/packages/SummarizedExperiment.

Martin Morgan, Valerie Obenchain, Jim Hester and Hervé Pagès (2021). SummarizedExperiment: SummarizedExperiment container. R package version 1.24.0. https://bioconductor.org/packages/SummarizedExperiment Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S (2020). "Orchestrating single-cell analysis with Bioconductor." _Nature Methods_, *17*, 137-145. <URL: https://www.nature.com/articles/s41592-019-0654-x> Douglas Bates and Martin Maechler (2021). Matrix: Sparse and Dense Matrix Classes and Methods. R package version 1.4-0. https://CRAN.R-project.org/package=Matrix Vacher CM, Lacaille H, O'Reilly JJ, Salzbank J et al. Placental endocrine function shapes cerebellar development and social behavior. Nat Neurosci 2021 Oct;24(10):1392-1401. PMID: 34400844. Ortiz C, Navarro JF, Jurek A, Märtin A et al. Molecular atlas of the adult mouse brain. Sci Adv 2020 Jun;6(26):eabb3446. PMID: 32637622

Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S (2020). “Orchestrating single-cell analysis with Bioconductor.” Nature Methods, 17, 137–145. https://www.nature.com/articles/s41592-019-0654-x Morgan M, Obenchain V, Hester J, Pagès H (2021). SummarizedExperiment: SummarizedExperiment container. R package version 1.24.0, https://bioconductor.org/packages/SummarizedExperiment.

Examples

# To obtain reproducible results, a fixed seed is set for generating random numbers.
set.seed(10); library(SummarizedExperiment)
# Example bulk data of mouse brain for coclustering (Vacher et al 2021).
blk.mus.pa <- system.file("extdata/shinyApp/data", "bulk_mouse_cocluster.rds", 
package="spatialHeatmap") 
blk.mus <- readRDS(blk.mus.pa)
assay(blk.mus)[1:3, 1:5]

# Example single cell data for coclustering (Ortiz et al 2020).
sc.mus.pa <- system.file("extdata/shinyApp/data", "cell_mouse_cocluster.rds", 
package="spatialHeatmap") 
sc.mus <- readRDS(sc.mus.pa)
colData(sc.mus)[1:3, , drop=FALSE]


# Normalization: bulk and single cell are combined and normalized, then separated.
mus.lis.nor <- norm_cell(sce=sc.mus, bulk=blk.mus, com=FALSE)

# Aggregate bulk replicates.
blk.mus.aggr <- aggr_rep(data=mus.lis.nor$bulk, assay.na='logcounts', sam.factor='sample', 
aggr='mean')
# Filter bulk
blk.mus.fil <- filter_data(data=blk.mus.aggr, pOA=c(0.1, 1), CV=c(0.1, 50), verbose=FALSE) 
# Filter cell and subset bulk to genes in cell
blk.sc.mus.fil <- filter_cell(sce=mus.lis.nor$cell, bulk=blk.mus.fil, cutoff=1, p.in.cell=0.1,
p.in.gen=0.01, verbose=FALSE) 
# Co-cluster bulk and single cells.
coclus.mus <- cocluster(bulk=blk.sc.mus.fil$bulk, cell=blk.sc.mus.fil$cell, min.dim=12, 
dimred='PCA', graph.meth='knn', cluster='wt')
# Co-clustering results. The 'cluster' indicates cluster labels, the 'bulkCell' indicates bulk
# tissues or single cells, the 'sample' suggests original labels of bulk and cells, the 
# 'assignedBulk' refers to bulk tissues assigned to cells with none suggesting un-assigned, 
# and the 'similarity' refers to Spearman's correlation coefficients for assignments between 
# bulk and cells, which is a measure of assignment strigency.
colData(coclus.mus)

# Filter bulk-cell assignments according a similarity cutoff (min.sim).
coclus.mus <- filter_asg(coclus.mus, min.sim=0.1)

# Tailor bulk-cell assignments in R.
plot_dim(coclus.mus, dim='UMAP', color.by='sample', x.break=seq(-10, 10, 1), 
y.break=seq(-10, 10, 1), panel.grid=TRUE)
# Define desired bulk tissues for selected cells.
df.desired.bulk <- data.frame(x.min=c(-8), x.max=c(-3.5), y.min=c(-2.5), y.max=c(0.5), 
desiredBulk=c('hippocampus'), dimred='UMAP')
df.desired.bulk
# Tailor bulk-cell assignments.
coclus.mus.tailor <- refine_asg(sce.all=coclus.mus, df.desired.bulk=df.desired.bulk)

# Define desired bulk tissues for selected cells on a Shiny app.
# Save "coclus.mus" using "saveRDS" then upload the saved ".rds" file to the Shiny app.
saveRDS(coclus.mus, file='coclus.mus.rds')

# Start the Shiny app.
desired_bulk_shiny()


jianhaizhang/spatialHeatmap documentation built on April 21, 2024, 7:43 a.m.