FSCseq_workflow: Minimal workflow for FSCseq
In DavidKLim/FSCseq: Feature Selection and Clustering of RNA-seq Count Data

Full FSCseq workflow based on minimal working defaults

FSCseq_workflow(
  cts,
  ncores = 1,
  batch = NULL,
  X = NULL,
  true_cls = NULL,
  true_disc = NULL,
  method = "CEM",
  n_rinits = 1,
  med_filt = 500,
  MAD_filt = 50,
  K_search = c(2:6),
  lambda_search = seq(0.25, 5, 0.25),
  alpha_search = c(0.01, seq(0.05, 0.5, 0.05)),
  OS_save = T,
  tune_save = F,
  trace = F,
  trace.prefix = "",
  nMB = 5,
  dir_name = "Saved_Results",
  coding = "reference",
  cleanup = T
)
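
The default search grids in the signature above can be sized with a few lines of base R. This is only a rough sketch: it assumes every combination of K, lambda, and alpha is evaluated, which this page does not confirm, so the count is best read as an upper bound. The `grid` object is illustrative and is not part of FSCseq.

```r
# Default search values copied from the FSCseq_workflow() signature
K_search      <- c(2:6)
lambda_search <- seq(0.25, 5, 0.25)
alpha_search  <- c(0.01, seq(0.05, 0.5, 0.05))

# Assumption: a full Cartesian search over all three vectors
grid <- expand.grid(K = K_search, lambda = lambda_search, alpha = alpha_search)

# Number of candidate (K, lambda, alpha) combinations under that assumption
n_combinations <- nrow(grid)
print(n_combinations)  # 5 * 20 * 11 = 1100
```

A count of this size is worth keeping in mind before setting tune_save = TRUE, since that option is documented as potentially saving many files.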

`cts`	integer matrix, count matrix of dimension g by n. Must be integers (counts)
`ncores`	integer, number of cores (for parallel computing). Default is 1
`batch`	vector of batch labels, used as covariates. Default is NULL (a single batch).
`X`	optional input design matrix to specify p arbitrary covariates/confounders. Must be matrix of dimension n x p. If batch and X are both specified, then X is augmented to incorporate batch as covariates.
`true_cls`	(optional) integer vector of true groups, if available, for diagnostic tracking.
`true_disc`	(optional) logical vector of true discriminatory genes, if available, for diagnostic tracking.
`method`	string, either "EM" or "CEM". Default is "CEM"
`n_rinits`	integer, number of additional random initializations (on top of Hierarchical and K-means) to be searched. Default is 1
`med_filt`	integer, threshold for minimum median gene normalized count for pre-filtering. med_filt=0 pre-filters no genes via this criterion. Default is 500.
`MAD_filt`	integer between 0 and 100, quantile threshold for gene log MAD of normalized count. MAD_filt=0 pre-filters no genes via this criterion. Default is 50.
`K_search`	integer vector, values of K (number of clusters) to be searched. Default is 2:6
`lambda_search`	numeric vector, values of lambda to be searched. Default is seq(0.25,5,0.25)
`alpha_search`	numeric vector, values of alpha to be searched. Default is c(0.01,seq(0.05,0.50,0.05))
`OS_save`	logical, TRUE: saves progress of computationally costly warm starts (multiple initializations). Default is TRUE
`tune_save`	logical, TRUE: saves progress of penalty parameter searches. This may save many files, depending on the grid of values searched for lambda and alpha. Default is FALSE
`trace`	logical, TRUE: output diagnostic messages, FALSE (default): don't output
`trace.prefix`	(optional) string, prefix of file name to store trace output.
`nMB`	integer, number of minibatches to use in M step. Default is 5
`dir_name`	string, name of directory for saved results (if OS_save = TRUE) and diagnostics (if trace = TRUE). Default is "Saved_Results"
`coding`	string, "reference" or "cellmeans" coding for batch. Has no effect if batch effects are not adjusted for. Default is "reference"
`cleanup`	logical, TRUE: if OS_save=TRUE or tune_save=TRUE, remove all saved files after convergence. Default is TRUE
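
The two pre-filtering arguments, med_filt and MAD_filt, can be illustrated with a small base R sketch. This is not the package's implementation. It assumes the counts are already normalized, reads "log MAD" as the MAD of log-transformed normalized counts (with a 0.1 offset chosen here only to avoid log(0)), and keeps genes at or above each threshold. The function name `prefilter_sketch` is hypothetical.

```r
# Hypothetical helper, not part of FSCseq: returns a logical vector marking
# genes (rows) that pass both filters described in the arguments above.
prefilter_sketch <- function(norm_cts, med_filt = 500, MAD_filt = 50) {
  # Per-gene median of the (assumed already normalized) counts
  gene_med <- apply(norm_cts, 1, median)
  # Assumption: "log MAD" means the MAD of log-transformed normalized counts
  gene_mad <- apply(log(norm_cts + 0.1), 1, mad)

  # MAD_filt is a percentile (0 to 100), so convert it to a probability
  mad_cut <- quantile(gene_mad, probs = MAD_filt / 100)

  # Using >= makes med_filt = 0 and MAD_filt = 0 filter nothing,
  # matching the documented behavior of those settings
  (gene_med >= med_filt) & (gene_mad >= mad_cut)
}

# Toy usage on simulated non-negative counts (100 genes, 10 samples)
set.seed(1)
toy <- matrix(rpois(1000, lambda = 600), nrow = 100)
keep_all  <- prefilter_sketch(toy, med_filt = 0, MAD_filt = 0)
keep_some <- prefilter_sketch(toy)
```

Under these assumptions, setting both thresholds to 0 keeps every gene, while the defaults keep only genes with a median of at least 500 whose log MAD falls in the upper half.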

A list with elements K, cls, discriminatory, and fit.

David K. Lim, deelim@live.unc.edu

https://github.com/DavidKLim/FSCseq

# Simulate data and keep the first element of the returned list
sim.dat = FSCseq::simulateData(B=1, g=10000, K=2, n=50, LFCg=1, pDEg=0.05, beta0=12, phi0=0.35, nsims=1, save_file=FALSE)[[1]]
## Not run: FSCseq_results = FSCseq_workflow(cts=sim.dat$cts, K_search=c(2:3), lambda_search=c(1.0, 1.5), alpha_search=c(0.1, 0.2))
## Not run: summary(FSCseq_results$results)
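
When supplying your own data with batch and covariate information, it may help to check the inputs against the shapes described in the arguments before calling the workflow. The following is a minimal sketch using base R only. The toy data, the `age` covariate, and the particular checks are illustrative assumptions, not requirements enforced by the package.

```r
set.seed(1)
g <- 200  # number of genes (rows of cts)
n <- 12   # number of samples (columns of cts)

# Toy count matrix of dimension g by n, stored as integers
cts <- matrix(rnbinom(g * n, mu = 1000, size = 2), nrow = g, ncol = n)
storage.mode(cts) <- "integer"

# Batch labels, one per sample
batch <- rep(c(1, 2), each = n / 2)

# Hypothetical covariate, arranged as an n x p matrix (here p = 1)
age <- rnorm(n, mean = 50, sd = 10)
X <- matrix(age, ncol = 1, dimnames = list(NULL, "age"))

# Checks based on the argument descriptions: integer counts, one batch
# label per sample, and one row of X per sample
stopifnot(is.matrix(cts), is.integer(cts), all(cts >= 0))
stopifnot(length(batch) == ncol(cts))
stopifnot(is.matrix(X), nrow(X) == ncol(cts))

## Not run: res = FSCseq_workflow(cts=cts, batch=batch, X=X, K_search=c(2:3))
```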