coclus_opt: Optimization of co-clustering bulk and single cell data

View source: R/coclus_opt.R

coclus_opt    R Documentation

Optimization of co-clustering bulk and single cell data

Description

This function optimizes the co-clustering method, which automatically assigns bulk tissues to single cells. A vignette is provided at https://jianhaizhang.github.io/spatialHeatmap_supplement/cocluster_optimize.html.

Usage

coclus_opt(
  dat.lis,
  df.para,
  df.fil.set,
  batch.par = NULL,
  multi.core.par = NULL,
  wk.dir,
  verbose = TRUE
)

Arguments

dat.lis

A two-level nested list. Each inner list consists of three slots of bulk, cell, and df.match, corresponding to bulk data, single cell data, and ground-truth matching between bulk and cells respectively. For example, list(dataset1=list(bulk=bulk.data1, cell=cell.data1, df.match=df.match1), dataset2=list(bulk=bulk.data2, cell=cell.data2, df.match=df.match2)).
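
Since the dataset column of df.para in the Examples is built from names(dat.lis), it can help to verify the shape of dat.lis before starting a long optimization. The following is a minimal sketch in base R. The helper check_dat_lis is hypothetical and not part of spatialHeatmap, and the character strings in the usage lines only stand in for real bulk, single cell, and matching objects.

```r
# Hypothetical helper (not part of spatialHeatmap): checks that 'dat.lis' is a
# uniquely named two-level list whose inner lists contain bulk, cell, df.match.
check_dat_lis <- function(dat.lis) {
  nms <- names(dat.lis)
  if (!is.list(dat.lis) || is.null(nms) || any(nms == "")) {
    stop("'dat.lis' must be a list with a name for every dataset.")
  }
  if (anyDuplicated(nms) > 0) {
    stop("Duplicated dataset names: ",
         paste(unique(nms[duplicated(nms)]), collapse = ", "))
  }
  required <- c("bulk", "cell", "df.match")
  for (nm in nms) {
    missing.slots <- setdiff(required, names(dat.lis[[nm]]))
    if (length(missing.slots) > 0) {
      stop("Dataset '", nm, "' lacks: ", paste(missing.slots, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Usage with placeholder values in place of real data objects.
toy <- list(
  dataset1 = list(bulk = "bulk.data1", cell = "cell.data1", df.match = "df.match1"),
  dataset2 = list(bulk = "bulk.data2", cell = "cell.data2", df.match = "df.match2")
)
check_dat_lis(toy)
```

The check for duplicated names matters because a repeated name would make two datasets indistinguishable when they are referenced by name.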

df.para

A data.frame with each row corresponding to a combination of parameter settings in co-clustering.

df.fil.set

A data.frame of filtering settings. E.g. data.frame(p=c(0.1, 0.2), A=rep(1, 2), cv1=c(0.1, 0.2), cv2=rep(50, 2), cutoff=rep(1, 2), p.in.cell=c(0.15, 0.2), p.in.gen=c(0.05, 0.1), row.names=paste0('fil', seq_len(2))).
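
In the Examples, the fil column of df.para holds values such as 'fil1', which match the row names of df.fil.set. Assuming this is how a parameter combination selects its filtering settings (an inference from the Examples, not something this page states explicitly), the sketch below builds both tables in base R and checks that they agree. The dataset names and the reduced set of parameter values are placeholders.

```r
# Two filtering settings, identified by their row names 'fil1' and 'fil2'.
df.fil.set <- data.frame(
  p = c(0.1, 0.2), A = rep(1, 2), cv1 = c(0.1, 0.2), cv2 = rep(50, 2),
  cutoff = rep(1, 2), p.in.cell = c(0.15, 0.2), p.in.gen = c(0.05, 0.1),
  row.names = paste0("fil", seq_len(2))
)

# A small parameter grid; the dataset names are placeholders.
df.para <- expand.grid(
  dataset = c("dataset1", "dataset2"), norm = "FCT", fil = rownames(df.fil.set),
  dimred = "UMAP", dims = 5:6, graph = c("knn", "snn"), cluster = "wt",
  stringsAsFactors = FALSE
)

# Assumption: every 'fil' value in 'df.para' must name a row of 'df.fil.set'.
unknown <- setdiff(df.para$fil, rownames(df.fil.set))
if (length(unknown) > 0) {
  stop("Unknown filtering settings: ", paste(unknown, collapse = ", "))
}

# Each row is one combination, so the grid grows multiplicatively:
# 2 datasets * 2 filter sets * 2 dims * 2 graph methods = 16 rows.
nrow(df.para)
```

Counting the rows before calling coclus_opt gives a rough idea of how many co-clustering runs an optimization will involve.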

batch.par

The parameters for first-level parallelization through a cluster scheduler such as SLURM, given as a BatchtoolsParam object. If NULL (default), first-level parallelization is skipped.

multi.core.par

The parameters for second-level parallelization, given as a MulticoreParam object.
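
As a rough sketch of how the two parallelization objects might be created with BiocParallel: the worker counts, the resource names and values, and the use of the default SLURM template are all assumptions that depend on the local cluster setup. The BatchtoolsParam object is only built when the batchtools package and the sbatch command are available, so the sketch also runs on a machine without SLURM.

```r
library(BiocParallel)

# Second-level parallelization on one machine: forked workers with a fixed
# RNG seed (on Windows, MulticoreParam is limited to a single worker).
multi.core.par <- MulticoreParam(workers = 2, RNGseed = 50)

# First-level parallelization through SLURM; NULL skips this level.
batch.par <- NULL
if (requireNamespace("batchtools", quietly = TRUE) && nzchar(Sys.which("sbatch"))) {
  batch.par <- BatchtoolsParam(
    workers = 2, cluster = "slurm",
    # Default template for SLURM; replace with a site-specific one if needed.
    template = batchtoolsTemplate("slurm"),
    # Assumed resource names; they must match the variables used in the template.
    resources = list(ncpus = 1, walltime = 3600, memory = 4096)
  )
}
```

Both objects can then be passed to coclus_opt through the batch.par and multi.core.par arguments.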

wk.dir

The working directory, where results will be saved.

verbose

If TRUE, intermediate messages will be printed.

Value

A data.frame.

Author(s)

Jianhai Zhang jzhan067@ucr.edu
Dr. Thomas Girke thomas.girke@ucr.edu

References

Morgan M, Wang J, Obenchain V, Lang M, Thompson R, Turaga N (2022). _BiocParallel: Bioconductor facilities for parallel evaluation_. R package version 1.30.3, <https://github.com/Bioconductor/BiocParallel>.

Li, Song, Masashi Yamada, Xinwei Han, Uwe Ohler, and Philip N Benfey. 2016. "High-Resolution Expression Map of the Arabidopsis Root Reveals Alternative Splicing and lincRNA Regulation." Dev. Cell 39 (4): 508–22.

Shahan, Rachel, Che-Wei Hsu, Trevor M Nolan, Benjamin J Cole, Isaiah W Taylor, Anna Hendrika Cornelia Vlot, Philip N Benfey, and Uwe Ohler. 2020. "A Single Cell Arabidopsis Root Atlas Reveals Developmental Trajectories in Wild Type and Cell Identity Mutants." BioRxiv.

Examples


# Optimization includes many iterative runs of co-clustering. To reduce runtime, these runs 
# are parallelized with the package BiocParallel. 
library(BiocParallel)
# To obtain reproducible results, a fixed seed is set for generating random numbers.
set.seed(10)

# Read bulk (S. Li et al. 2016) and two single cell data sets (Shahan et al. 2020), all of
# which are from Arabidopsis root.
blk <- readRDS(system.file("extdata/cocluster/data", "bulk_cocluster.rds", 
package="spatialHeatmap")) # Bulk.
sc10 <- readRDS(system.file("extdata/cocluster/data", "sc10_cocluster.rds", 
package="spatialHeatmap")) # Single cell.
sc11 <- readRDS(system.file("extdata/cocluster/data", "sc11_cocluster.rds", 
package="spatialHeatmap")) # Single cell.
blk; sc10; sc11

# The ground-truth matching between bulk tissues and single cells needs to be defined in the
# form of a table so that assignments can be classified as TRUE or FALSE.
match.pa <- system.file("extdata/cocluster/data", "true_match_arab_root_cocluster.txt", 
package="spatialHeatmap")
df.match.arab <- read.table(match.pa, header=TRUE, row.names=1, sep='\t')
df.match.arab[1:3, ]

# Place the bulk, single cell data, and matching table in a list.
dat.lis <- list(
  dataset1=list(bulk=blk, cell=sc10, df.match=df.match.arab), 
  dataset2=list(bulk=blk, cell=sc11, df.match=df.match.arab)
)

# Filtering settings. 
df.fil.set <- data.frame(p=c(0.1), A=rep(1, 1), cv1=c(0.1), cv2=rep(50, 1), cutoff=rep(1, 1),
p.in.cell=c(0.15), p.in.gen=c(0.05), row.names=paste0('fil', seq_len(1))) 
# Settings in pre-processing include normalization method (norm), filtering (fil). The 
# following optimization focuses on settings most relevant to co-clustering, including 
# dimension reduction methods (dimred), number of top dimensions for co-clustering (dims), 
# graph-building methods (graph), and clustering methods (cluster). Explanations of these
# settings are provided in the help file of the function "cocluster".
norm <- c('FCT'); fil <- c('fil1'); dimred <- c('UMAP')
dims <- seq(5, 10, 1); graph <- c('knn', 'snn')
cluster <- c('wt', 'fg', 'le')

df.para <- expand.grid(dataset=names(dat.lis), norm=norm, fil=fil, dimred=dimred, dims=dims, 
graph=graph, cluster=cluster, stringsAsFactors = FALSE)


# Optimization is performed by calling "coclus_opt", and results are saved to the temporary
# directory "wk.dir".
wk.dir <- normalizePath(tempdir(check=TRUE), winslash="/", mustWork=FALSE)
df.res <- coclus_opt(dat.lis, df.para, df.fil.set, multi.core.par=MulticoreParam(workers=1, 
RNGseed=50), wk.dir=wk.dir, verbose=TRUE)
df.res[1:3, ]


jianhaizhang/spatialHeatmap documentation built on July 31, 2024, 2:59 a.m.