FSCseq: Feature Selection and Clustering of RNA-seq Count Data


Performs clustering, feature selection, and parameter estimation using a finite mixture model of negative binomial distributions.
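The mixture model can be sketched in base R for orientation. This is a minimal, hypothetical illustration, not FSCseq's implementation. It assumes that, given cluster c, each count y[j, i] follows a negative binomial with mean size_factors[i] * mu[j, c] and gene-specific dispersion phi[j] (variance mean + phi * mean^2), and that genes are independent given the cluster. The function nb_mixture_estep and its arguments are illustrative names only. It returns the k by n matrix of posterior membership probabilities (the same shape as init_wts) and the observed-data log-likelihood.

```r
# Hypothetical sketch, not FSCseq's implementation.
# Assumed model: given cluster cl, y[j, i] ~ NB(mean = s[i] * mu[j, cl],
# dispersion phi[j]), i.e. Var = mean + phi[j] * mean^2, genes independent.
nb_mixture_estep <- function(y, mu, phi, pi_c, s = rep(1, ncol(y))) {
  k <- ncol(mu)
  n <- ncol(y)
  # log_joint[cl, i] = log(pi_cl) + sum over genes of log NB density
  log_joint <- matrix(NA_real_, nrow = k, ncol = n)
  for (cl in seq_len(k)) {
    mean_cl <- outer(mu[, cl], s)  # g x n matrix of expected counts
    # dnbinom with size = 1 / phi gives Var = mu + phi * mu^2
    ll <- matrix(dnbinom(y, size = 1 / phi, mu = mean_cl, log = TRUE),
                 nrow = nrow(y))
    log_joint[cl, ] <- log(pi_c[cl]) + colSums(ll)
  }
  # Log-sum-exp normalization of each column for numerical stability
  col_max <- apply(log_joint, 2, max)
  unnorm <- exp(sweep(log_joint, 2, col_max))  # largest entry per column is 1
  col_tot <- colSums(unnorm)
  list(wts = sweep(unnorm, 2, col_tot, "/"),   # k x n posterior probabilities
       loglik = sum(col_max + log(col_tot)))   # observed-data log-likelihood
}

# Example: 2 clusters that differ only in the first 10 of 50 genes
set.seed(1)
g <- 50; n <- 20; k <- 2
cls <- rep(1:2, each = n / 2)
mu <- cbind(rep(10, g), rep(c(40, 10), times = c(10, g - 10)))
phi <- rep(0.1, g)
y <- matrix(rnbinom(g * n, size = 1 / phi, mu = mu[, cls]), nrow = g)
res <- nb_mixture_estep(y, mu, phi, pi_c = c(0.5, 0.5))
table(estimated = apply(res$wts, 2, which.max), true = cls)
```

Here the means, dispersions, and mixing proportions are supplied by hand; EM_run estimates them.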

EM_run(
  ncores,
  X = NA,
  y,
  k,
  lambda = 0,
  alpha = 0,
  size_factors = rep(1, times = ncol(y)),
  norm_y = y,
  true_clusters = NA,
  true_disc = NA,
  init_parms = FALSE,
  init_coefs = matrix(0, nrow = nrow(y), ncol = k),
  init_phi = matrix(0, nrow = nrow(y), ncol = k),
  init_cls = NULL,
  init_wts = NULL,
  CEM = T,
  init_Tau = nrow(y),
  maxit_EM = 100,
  maxit_IRLS = 50,
  maxit_CDA = 50,
  EM_tol = 1e-06,
  IRLS_tol = 1e-04,
  CDA_tol = 1e-04,
  disp,
  trace = F,
  mb_size = NULL,
  PP_filt
)
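The data-dependent inputs in the usage above can be prepared in base R. This is a sketch under stated assumptions: median-of-ratios size factors (the approach popularized by DESeq) and a k-means initial clustering are illustrative choices, not necessarily what FSCseq does internally, and the helpers median_ratio_sf and cls_to_wts are hypothetical. The outputs follow the documented shapes: size_factors has length n, norm_y is g by n, init_cls has length n, and init_wts is k by n.

```r
# Hypothetical helpers; median-of-ratios size factors and k-means
# initialization are illustrative choices, not FSCseq internals.
median_ratio_sf <- function(y) {
  log_y <- log(y)
  keep <- rowSums(is.infinite(log_y)) == 0  # use genes with no zero counts
  log_geo_mean <- rowMeans(log_y[keep, , drop = FALSE])
  # size factor of sample i = median over genes of y[j, i] / geometric mean of gene j
  apply(log_y[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_mean)))
}

# Hard clustering (values 1..k) -> k x n one-hot matrix, the shape of init_wts
cls_to_wts <- function(cls, k) {
  wts <- matrix(0, nrow = k, ncol = length(cls))
  wts[cbind(cls, seq_along(cls))] <- 1
  wts
}

set.seed(2)
y <- matrix(rnbinom(100 * 6, size = 5, mu = 20), nrow = 100)  # g = 100, n = 6
y[, 4:6] <- 2 * y[, 4:6]  # samples 4-6 sequenced about twice as deeply
size_factors <- median_ratio_sf(y)
norm_y <- sweep(y, 2, size_factors, "/")  # divide each column by its size factor
init_cls <- kmeans(t(log(norm_y + 0.1)), centers = 2)$cluster
init_wts <- cls_to_wts(init_cls, k = 2)
```

Only one of init_cls or init_wts needs to be passed; the one-hot conversion shows how a hard clustering maps onto the membership-matrix form.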

`ncores`	integer, number of cores to utilize in parallel computing (default 1)
`X`	design matrix of dimension n by p
`y`	count matrix of dimension g by n
`k`	integer, number of clusters
`lambda`	numeric penalty parameter, lambda >= 0
`alpha`	numeric penalty parameter, 0 <= alpha < 1
`size_factors`	numeric vector of length n, sample-specific factors that correct for differences in sequencing depth
`norm_y`	count matrix of dimension g by n, normalized for differences in sequencing depth
`true_clusters`	(optional) integer vector of true groups, if available, for diagnostic tracking
`true_disc`	(optional) logical vector of true discriminatory genes, if available, for diagnostic tracking
`init_parms`	logical, TRUE: custom parameter initializations, FALSE (default): start from scratch
`init_coefs`	matrix of dimension g by k, used only if init_parms = TRUE
`init_phi`	vector of length g (gene-specific dispersions) or matrix of dimension g by k (cluster-specific dispersions), used only if init_parms = TRUE
`init_cls`	vector of length n, initial cluster assignments
`init_wts`	matrix of dimension k by n giving cluster memberships, which may be partial. Either init_wts or init_cls must be supplied
`CEM`	logical, TRUE for CEM (default), FALSE for EM
`init_Tau`	numeric, initial temperature for CEM. Default is g, the number of genes (set to 1 for EM)
`maxit_EM`	integer, maximum number of iterations for the full CEM/EM run (default 100)
`maxit_IRLS`	integer, maximum number of iterations for the IRLS loop in the M step (default 50)
`maxit_CDA`	integer, maximum number of iterations for the CDA loop (default 50)
`EM_tol`	numeric, convergence tolerance for EM/CEM (default 1e-6)
`IRLS_tol`	numeric, convergence tolerance for IRLS (default 1e-4)
`CDA_tol`	numeric, convergence tolerance for CDA (default 1e-4)
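`disp`	string, either "gene" (default) for gene-specific dispersions or "cluster" for cluster-specific dispersions
`trace`	logical, TRUE: print diagnostic messages, FALSE (default): suppress them
`mb_size`	integer, minibatch size: number of genes to include in each M-step iteration
`PP_filt`	numeric in (0, 1), threshold on the posterior probability (PP) of cluster membership: sample/cluster pairs whose PP falls below it are excluded from M-step estimation. Default is 1e-3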
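The roles of init_Tau, CEM, and PP_filt can be illustrated on a k by n matrix of log joint densities log(pi_c * f_c(y_i)). In this hypothetical sketch, the temperature enters as w[c, i] proportional to (pi_c * f_c(y_i))^(1 / Tau), the CEM classification step assigns each sample to its highest-posterior cluster (the classical Celeux and Govaert form), and PP_filt masks out sample/cluster pairs with small posterior probability. These mechanics are assumptions for illustration, not taken from FSCseq's source, and the function names are hypothetical. Tau = 1 gives the ordinary posterior and larger Tau flattens it; dividing by Tau never changes which cluster has the largest posterior, so in this sketch the temperature affects only the soft weights.

```r
# Hypothetical sketch of tempering, CEM classification, and PP_filt;
# the exact mechanics in FSCseq may differ.
# log_joint: k x n matrix of log(pi_c * f_c(y_i))
tempered_posterior <- function(log_joint, Tau = 1) {
  scaled <- log_joint / Tau
  scaled <- sweep(scaled, 2, apply(scaled, 2, max))  # stabilize before exp()
  w <- exp(scaled)
  sweep(w, 2, colSums(w), "/")  # each column sums to 1
}

# Classical CEM classification step: each sample goes to its most probable cluster
hard_assign <- function(w) {
  cls <- apply(w, 2, which.max)
  one_hot <- matrix(0, nrow = nrow(w), ncol = ncol(w))
  one_hot[cbind(cls, seq_len(ncol(w)))] <- 1
  one_hot
}

# Logical k x n mask: TRUE where PP is at or above the threshold,
# i.e. where a sample/cluster pair enters M-step estimation
pp_keep <- function(w, PP_filt = 1e-3) w >= PP_filt

log_joint <- rbind(c(-10, -12, -30),
                   c(-11, -20, -29))               # k = 2 clusters, n = 3 samples
w_em  <- tempered_posterior(log_joint, Tau = 1)    # ordinary posterior
w_hot <- tempered_posterior(log_joint, Tau = 100)  # high temperature: near uniform
hard_assign(w_em)              # same result for any Tau > 0
pp_keep(w_em, PP_filt = 1e-3)  # sample 2, cluster 2 (PP about 3e-4) is dropped
```

Filtering low-PP pairs in this way shrinks the set of observations each cluster's M-step fit has to touch, at the cost of ignoring their very small contributions.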