benchmark: main function of the deconvolution benchmark

View source: R/benchmark.R

benchmark    R Documentation

main function of the deconvolution benchmark

Description

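Main function of the deconvolution benchmark. It runs the selected deconvolution algorithms on the supplied bulks and, optionally, on simulated bulks, and can generate an HTML report.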

Usage

benchmark(
  sc.counts,
  sc.pheno,
  bulk.counts,
  bulk.props,
  benchmark.name,
  grouping,
  cell.type.column = "cell_type",
  patient.column = "patient",
  sample.name.column = "sample.name",
  input.algorithms = NULL,
  simulation.bulks = FALSE,
  simulation.genes = FALSE,
  simulation.samples = FALSE,
  simulation.subtypes = FALSE,
  genesets = NULL,
  repeats = 5,
  temp.dir = NULL,
  exclude.from.bulks = NULL,
  exclude.from.signature = NULL,
  n.bulks = 500,
  cpm = FALSE,
  verbose = FALSE,
  n.cluster.sizes = c(1, 2, 4, 8),
  n.profiles.per.bulk = 1000,
  report = TRUE
)

Arguments

sc.counts

non-negative numeric matrix with features as rows and scRNA-Seq profiles as columns. ncol(sc.counts) must equal nrow(sc.pheno). May also be a sparse matrix (class 'dgCMatrix')

sc.pheno

data frame with scRNA-Seq profiles as rows, and pheno entries in columns. nrow(sc.pheno) must equal ncol(sc.counts). Cell types need to be specified in column 'cell.type.column', the patient/origin (if available) in column 'patient.column' and the sample names in column 'sample.name.column'
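The shape requirements for 'sc.counts' and 'sc.pheno' can be illustrated with a minimal sketch. The gene names, cell names, cell types and patients below are made up for illustration; only the column names and the dimension constraint come from the argument descriptions.

```r
# Toy single-cell data; all names and values are invented for illustration.
set.seed(1)
n.genes <- 20
n.cells <- 12

# Features as rows, scRNA-Seq profiles as columns
sc.counts <- matrix(
  rpois(n.genes * n.cells, lambda = 5),
  nrow = n.genes,
  dimnames = list(paste0("gene", seq_len(n.genes)),
                  paste0("cell", seq_len(n.cells)))
)

# One row per profile; column names match the documented defaults
sc.pheno <- data.frame(
  cell_type   = rep(c("B", "T", "NK"), each = 4),
  patient     = rep(c("p1", "p2"), times = 6),
  sample.name = colnames(sc.counts),
  stringsAsFactors = FALSE
)

# Checks derived from the argument descriptions
stopifnot(
  is.numeric(sc.counts),
  all(sc.counts >= 0),
  ncol(sc.counts) == nrow(sc.pheno),
  all(c("cell_type", "patient", "sample.name") %in% colnames(sc.pheno))
)
```

If the phenotype table uses other column names, they would be passed via 'cell.type.column', 'patient.column' and 'sample.name.column'.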

bulk.counts

non-negative numeric matrix with features as rows and bulk RNA-Seq profiles as columns. ncol(bulk.counts) must equal ncol(bulk.props). May also be a sparse matrix (class 'dgCMatrix')

bulk.props

non-negative numeric matrix specifying the amount of each cell type in each bulk, with cell types as rows and bulk RNA-Seq profiles as columns.
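A minimal sketch of matching 'bulk.counts' and 'bulk.props' objects follows. Both hold bulk RNA-Seq profiles as columns, so their column counts should agree. Normalising each column of the proportions to sum to 1 and using identical column names are assumptions of this sketch; the description above only requires non-negative values.

```r
# Toy bulk data; all names and values are invented for illustration.
set.seed(2)
genes      <- paste0("gene", 1:20)
bulks      <- paste0("bulk", 1:5)
cell.types <- c("B", "T", "NK")

# Features as rows, bulk profiles as columns
bulk.counts <- matrix(
  rpois(length(genes) * length(bulks), lambda = 50),
  nrow = length(genes),
  dimnames = list(genes, bulks)
)

# Cell types as rows, bulk profiles as columns
bulk.props <- matrix(
  runif(length(cell.types) * length(bulks)),
  nrow = length(cell.types),
  dimnames = list(cell.types, bulks)
)
# Assumption: scale every column so the amounts sum to 1
bulk.props <- sweep(bulk.props, 2, colSums(bulk.props), "/")

stopifnot(
  all(bulk.counts >= 0),
  all(bulk.props >= 0),
  ncol(bulk.counts) == ncol(bulk.props),
  identical(colnames(bulk.counts), colnames(bulk.props))
)
```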

benchmark.name

string, name of the benchmark. Will be used as name for the results directory

grouping

factor with 2 levels and length equal to ncol(sc.counts). Assigns each scRNA-Seq profile to either the training or the test cohort: 1 marks training samples, 2 marks test samples.
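One possible way to build such a factor is a random, balanced split. This is only a sketch: the number of profiles is a stand-in for ncol(sc.counts), and a balanced random split is an assumption, not a requirement stated here.

```r
set.seed(3)
n.cells <- 12  # stands in for ncol(sc.counts)

# Half of the profiles get level 1 (training), the other half level 2 (test);
# sample() shuffles the assignment across profiles.
grouping <- factor(
  sample(rep(c(1, 2), length.out = n.cells)),
  levels = c(1, 2)
)

stopifnot(
  is.factor(grouping),
  nlevels(grouping) == 2,
  length(grouping) == n.cells
)

table(grouping)
```

In practice it may be preferable to split by patient so that profiles from one patient do not end up in both cohorts; that choice is left to the user.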

cell.type.column

string, which column of 'sc.pheno' holds the cell type information? default 'cell_type'

patient.column

string, which column of 'sc.pheno' holds the patient information; optional, default 'patient'

sample.name.column

string, which column of 'sc.pheno' holds the sample name information; optional, default 'sample.name'

input.algorithms

list containing a list for each algorithm. Each sublist contains
1) name: character
2) algorithm: function
3) model: model to be supplied to the algorithm, optional
For predefined algorithms it is sufficient to supply only the name instead of the sublist, e.g. input.algorithms = list(list(name = 'DTD', algorithm = run_dtd), "MuSiC").
If no list is supplied (default), all implemented algorithms (CIBERSORT, DeconRNASeq, DTD, Least_Squares, BSEQ-sc and MuSiC) are selected.
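The list structure described above can be sketched as follows. The function my_algorithm is a hypothetical placeholder: the interface a custom algorithm function has to implement is not described on this page, so the stub only marks where a real wrapper would go.

```r
# Hypothetical placeholder; the argument list and return value that the
# benchmark expects from a custom algorithm are not documented here.
my_algorithm <- function(...) {
  stop("placeholder: replace with a real deconvolution wrapper")
}

input.algorithms <- list(
  # custom entry as a sublist; 'model' is optional and omitted here
  list(name = "MyAlgorithm", algorithm = my_algorithm),
  # predefined algorithms can be given by name only
  "MuSiC",
  "DTD"
)

# Structural check: every entry is either a name or a sublist
# with a character 'name' and a function 'algorithm'
ok <- vapply(input.algorithms, function(a) {
  is.character(a) ||
    (is.list(a) && is.character(a$name) && is.function(a$algorithm))
}, logical(1))
stopifnot(all(ok))
```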

simulation.bulks

boolean, should deconvolution of simulated bulks be performed? default: FALSE

simulation.genes

boolean, should deconvolution of simulated bulks with predefined genesets be performed? default: FALSE

simulation.samples

boolean, should deconvolution of simulated bulks with varying number of randomly selected training profiles be performed? default: FALSE

simulation.subtypes

boolean, should deconvolution of simulated bulks with artificial subtypes of given cell types be performed? default: FALSE

genesets

named list of string vectors, each of which must be a subset of 'rownames(sc.counts)'. default: NULL
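A geneset list of this form might look like the sketch below. The gene names and set names are invented; the vector genes stands in for rownames(sc.counts).

```r
genes <- paste0("gene", 1:20)  # stands in for rownames(sc.counts)

genesets <- list(
  markers    = c("gene1", "gene2", "gene5"),
  first_half = genes[1:10]
)

# Every set needs a name, must be a character vector,
# and may only contain known features
stopifnot(
  !is.null(names(genesets)),
  all(nzchar(names(genesets))),
  all(vapply(genesets, is.character, logical(1))),
  all(vapply(genesets, function(g) all(g %in% genes), logical(1)))
)
```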

repeats

numeric > 0, number of repetitions for each algorithm in each setting. default: 5

temp.dir

string, directory where data and benchmarks are stored. default: NULL, which uses the directory '.tmp' in the working directory

exclude.from.bulks

vector of strings, cell types that should not be included in the simulated bulks. default: NULL

exclude.from.signature

vector of strings, cell types that should not be predicted by the algorithms. default: NULL

n.bulks

numeric > 0, number of bulks to simulate. default 500

cpm

boolean, should the sc profiles and bulks be scaled to counts per million? default: FALSE
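Counts-per-million scaling divides each profile by its total count and multiplies by one million. The sketch below shows the general transformation on a toy matrix; whether the package performs it in exactly this way is an assumption.

```r
# Toy count matrix: 3 genes, 2 profiles
counts <- matrix(
  c(10, 30, 60,
     0,  5, 45),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2"))
)

# Divide each column by its library size, then scale to one million
cpm.counts <- sweep(counts, 2, colSums(counts), "/") * 1e6

colSums(cpm.counts)  # every column now sums to one million
```

A profile with a total count of zero would produce NaN values under this formula, so such columns would need to be removed or handled beforehand.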

verbose

boolean, should progress information be printed to the screen? default: FALSE

n.cluster.sizes

vector of integers, number of artificial subtypes to generate per cell type; default: c(1, 2, 4, 8)

n.profiles.per.bulk

positive numeric, number of single-cell profiles to be randomly drawn for each simulated bulk. default: 1000
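To make the idea of a simulated bulk concrete, the sketch below sums randomly drawn single-cell profiles and records the resulting cell type fractions. This is an illustration under assumptions, not the package's implementation: drawing with replacement, summing raw counts and the toy data are all choices made for this example.

```r
set.seed(4)
sc.counts <- matrix(
  rpois(20 * 12, lambda = 5),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:12))
)
cell.types <- rep(c("B", "T", "NK"), each = 4)  # one label per profile
n.profiles.per.bulk <- 1000

# Assumption: draw with replacement, because the toy data has far fewer
# profiles than are requested per bulk
idx <- sample(ncol(sc.counts), n.profiles.per.bulk, replace = TRUE)

# The simulated bulk is the sum of the drawn profiles
bulk <- rowSums(sc.counts[, idx, drop = FALSE])

# The true composition is the fraction of drawn profiles per cell type
props <- table(factor(cell.types[idx], levels = unique(cell.types))) /
  n.profiles.per.bulk
```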

report

boolean, should an HTML report be generated? default: TRUE

Value

list of
1) report_path: report path (string), NULL if no report is generated
2) bulk_results: deconvolution results for real bulks, NULL if no real bulks were supplied
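A hedged end-to-end sketch of a call and of accessing the returned list is given below. The toy inputs are invented and are probably too small for a meaningful run, and the package name DMC is assumed from the repository name. The call is wrapped in if (FALSE) so that it is not executed.

```r
set.seed(5)
genes <- paste0("gene", 1:20)

# Toy single-cell data (invented)
sc.counts <- matrix(rpois(20 * 12, 5), nrow = 20,
                    dimnames = list(genes, paste0("cell", 1:12)))
sc.pheno <- data.frame(
  cell_type   = rep(c("B", "T", "NK"), each = 4),
  patient     = rep(c("p1", "p2"), times = 6),
  sample.name = colnames(sc.counts),
  stringsAsFactors = FALSE
)
grouping <- factor(sample(rep(c(1, 2), length.out = 12)), levels = c(1, 2))

# Toy bulk data (invented)
bulk.counts <- matrix(rpois(20 * 5, 50), nrow = 20,
                      dimnames = list(genes, paste0("bulk", 1:5)))
bulk.props <- matrix(runif(3 * 5), nrow = 3,
                     dimnames = list(c("B", "T", "NK"), colnames(bulk.counts)))
bulk.props <- sweep(bulk.props, 2, colSums(bulk.props), "/")

if (FALSE) {  # not run; requires the package (name assumed to be DMC)
  library(DMC)
  res <- benchmark(
    sc.counts = sc.counts,
    sc.pheno = sc.pheno,
    bulk.counts = bulk.counts,
    bulk.props = bulk.props,
    benchmark.name = "toy_benchmark",
    grouping = grouping,
    input.algorithms = list("Least_Squares"),
    report = FALSE
  )
  res$report_path                       # NULL here, since report = FALSE
  str(res$bulk_results, max.level = 1)  # deconvolution results for real bulks
}
```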


MarianSchoen/DMC documentation built on Aug. 2, 2022, 3:05 p.m.