FSCseq_workflow: Minimal workflow for FSCseq

View source: R/FSCseq_workflow.R

Description

Full FSCseq workflow based on minimal working defaults

Usage

FSCseq_workflow(
  cts,
  ncores = 1,
  batch = NULL,
  X = NULL,
  true_cls = NULL,
  true_disc = NULL,
  method = "CEM",
  n_rinits = 1,
  med_filt = 500,
  MAD_filt = 50,
  K_search = c(2:6),
  lambda_search = seq(0.25, 5, 0.25),
  alpha_search = c(0.01, seq(0.05, 0.5, 0.05)),
  OS_save = T,
  tune_save = F,
  trace = F,
  trace.prefix = "",
  nMB = 5,
  dir_name = "Saved_Results",
  coding = "reference",
  cleanup = T
)

Arguments

cts

integer matrix of counts with dimension g by n (genes in rows, samples in columns). All entries must be integers.
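Count matrices read from files are often stored as doubles even when every value is a whole number. The sketch below shows one way to check and convert such a matrix before passing it as cts. The toy matrix and its dimnames are made up for illustration, and the checks are a suggestion, not something the package is known to require in this exact form.

```r
# Toy 3-gene x 2-sample matrix stored as double (hypothetical data).
cts_raw <- matrix(c(10, 0, 3, 250, 7, 12), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:2)))

# Refuse to convert if any value is negative or not a whole number.
stopifnot(all(cts_raw >= 0), all(cts_raw == round(cts_raw)))

# Convert the storage type in place; dimensions and dimnames are preserved.
cts <- cts_raw
storage.mode(cts) <- "integer"

str(cts)
```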

ncores

integer, number of cores (for parallel computing). Default is 1

batch

vector of batch labels, used as covariates. Default is NULL (a single batch).

X

optional input design matrix to specify p arbitrary covariates/confounders. Must be matrix of dimension n x p. If batch and X are both specified, then X is augmented to incorporate batch as covariates.
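As a minimal sketch of preparing these two inputs, the covariate matrix X can be built with base R's model.matrix. The covariates (age, sex) are hypothetical, and dropping the intercept column is an assumption made here, since the page does not say whether X should contain one.

```r
n <- 6

# Batch labels, one per sample (hypothetical).
batch <- c(1, 1, 1, 2, 2, 2)

# Hypothetical sample-level covariates.
covars <- data.frame(
  age = c(34, 51, 47, 29, 62, 55),
  sex = factor(c("F", "M", "F", "M", "F", "M"))
)

# Build an n x p numeric matrix.
# ASSUMPTION: the intercept column is dropped; keep it if your analysis needs it.
X <- model.matrix(~ age + sex, data = covars)[, -1, drop = FALSE]

# Both inputs must line up with the n columns of the count matrix.
stopifnot(nrow(X) == n, length(batch) == n)
dim(X)
```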

true_cls

(optional) integer vector of true groups, if available, for diagnostic tracking.

true_disc

(optional) logical vector of true discriminatory genes, if available, for diagnostic tracking.

method

string, either "EM" or "CEM". Default is "CEM"

n_rinits

integer, number of additional random initializations (on top of Hierarchical and K-means) to be searched. Default is 1

med_filt

integer, threshold for minimum median gene normalized count for pre-filtering. med_filt=0 pre-filters no genes via this criterion. Default is 500.

MAD_filt

integer, value between 0 and 100. Quantile threshold (as a percentage) for gene log MAD of normalized count. MAD_filt=0 pre-filters no genes via this criterion. Default is 50.
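The two pre-filtering criteria (med_filt and MAD_filt) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: library-size scaling stands in for whatever normalization FSCseq applies, "log MAD" is taken to mean the MAD of log(normalized count + 0.1), and the quantile is computed over all genes.

```r
# Toy g x n count matrix: the first half of the genes is lowly expressed,
# the second half highly expressed (hypothetical data).
set.seed(42)
g <- 200
n <- 20
mu_g <- rep(c(50, 1000), each = g / 2)
cts <- matrix(rnbinom(g * n, mu = rep(mu_g, times = n), size = 5),
              nrow = g, ncol = n)

# ASSUMPTION: simple library-size scaling as a stand-in normalization.
size_factors <- colSums(cts) / mean(colSums(cts))
norm_cts <- sweep(cts, 2, size_factors, "/")

med_filt <- 500  # minimum median normalized count
MAD_filt <- 50   # quantile threshold, in percent

gene_median <- apply(norm_cts, 1, median)
# ASSUMPTION: "log MAD" computed as MAD of log(normalized count + 0.1).
gene_log_mad <- apply(log(norm_cts + 0.1), 1, mad)

# A gene is kept only if it passes both criteria.
keep <- gene_median >= med_filt &
  gene_log_mad >= quantile(gene_log_mad, MAD_filt / 100)

cat(sum(keep), "of", g, "genes pass both filters\n")
```

Setting med_filt to 0 or MAD_filt to 0 in this sketch makes the corresponding condition true for every gene, which matches the documented behavior of disabling that criterion.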

K_search

integer vector, values of K (number of clusters) to be searched. Default is 2:6

lambda_search

numeric vector, values of lambda to be searched. Default is seq(0.25,5,0.25)

alpha_search

numeric vector, values of alpha to be searched. Default is c(0.01,seq(0.05,0.50,0.05))
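To get a feel for how large the default search is, the three grids from the Usage section can be enumerated in base R. The full Cartesian product computed below is only an upper bound on the number of candidate settings; whether the workflow tunes K, lambda, and alpha jointly or in stages is not stated on this page.

```r
# Default search grids, copied from the Usage section.
K_search <- 2:6
lambda_search <- seq(0.25, 5, 0.25)
alpha_search <- c(0.01, seq(0.05, 0.5, 0.05))

# Number of candidate values per tuning parameter.
n_values <- c(K = length(K_search),
              lambda = length(lambda_search),
              alpha = length(alpha_search))
print(n_values)

# Every (K, lambda, alpha) combination, one per row.
grid <- expand.grid(K = K_search, lambda = lambda_search, alpha = alpha_search)
nrow(grid)
```

With the defaults this gives 5, 20, and 11 candidate values, so narrowing the grids (as the example at the end of this page does) can shorten run time considerably.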

OS_save

logical, TRUE: saves progress of computationally costly warm starts (multiple initializations). Default is TRUE

tune_save

logical, TRUE: saves progress of penalty parameter searches. This may save many files, depending on the grid of values searched for lambda and alpha. Default is FALSE

trace

logical, TRUE: output diagnostic messages, FALSE (default): don't output

trace.prefix

(optional) string, prefix of file name to store trace output.

nMB

integer, number of minibatches to use in M step. Default is 5

dir_name

string, name of directory specified for saved results (if OS_save = TRUE) and diagnostics (if trace = TRUE)

coding

string, "reference" or "cellmeans" coding for batch. Has no effect if batch effects are not adjusted for. Default is "reference"
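The difference between the two coding schemes can be illustrated with base R's model.matrix. This shows the general statistical meaning of reference and cell-means coding; how FSCseq builds its own batch design matrix internally (including whether an intercept column is present) is an assumption here.

```r
# Three batches, two samples each (hypothetical labels).
batch <- factor(c(1, 1, 2, 2, 3, 3))

# Reference coding: intercept plus indicators for batches 2 and 3;
# batch 1 is the baseline that the other batches are compared against.
X_ref <- model.matrix(~ batch)

# Cell-means coding: no intercept, one indicator column per batch.
X_cm <- model.matrix(~ 0 + batch)

print(X_ref)
print(X_cm)
```

Both matrices span the same column space, so the two codings describe the same model and differ only in how the batch coefficients are interpreted.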

cleanup

logical, TRUE: if OS_save=TRUE or tune_save=TRUE, remove all saved files after convergence. Default is TRUE

Value

list with K, cls, discriminatory, and fit
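A sketch of how the returned list might be inspected. The object below is a hypothetical stand-in: only the four element names (K, cls, discriminatory, fit) come from this page, while the element types, the toy values, and the true_cls vector are assumptions for illustration.

```r
# HYPOTHETICAL stand-in for the value returned by FSCseq_workflow().
res <- list(
  K = 2L,                                        # selected number of clusters
  cls = c(1L, 1L, 2L, 2L, 2L, 1L),               # assumed: one label per sample
  discriminatory = c(TRUE, FALSE, FALSE, TRUE),  # assumed: one flag per gene
  fit = list()                                   # fitted model object, left empty
)

# Cross-tabulate estimated clusters against known groups, if available.
true_cls <- c(1L, 1L, 2L, 2L, 1L, 1L)
print(table(estimated = res$cls, truth = true_cls))

# Count the genes flagged as cluster-discriminatory.
sum(res$discriminatory)
```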

Author(s)

David K. Lim, deelim@live.unc.edu

References

https://github.com/DavidKLim/FSCseq

Examples

sim.dat = FSCseq::simulateData(B=1, g=10000, K=2, n=50, LFCg=1, pDEg=0.05, beta0=12, phi0=0.35, nsims=1, save_file=F)[[1]]
## Not run: FSCseq_results = FSCseq_workflow(cts=sim.dat$cts, K_search=c(2:3), lambda_search=c(1.0, 1.5), alpha_search=c(0.1, 0.2))
## Not run: summary(FSCseq_results)

DavidKLim/FSCseq documentation built on Dec. 12, 2021, 3:46 a.m.