EM_run: EM/CEM run for FSCseq


View source: R/FSCseq.R

Description

Performs clustering, feature selection, and parameter estimation using a finite mixture of negative binomial distributions, fit by EM or classification EM (CEM).
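The E step of a negative binomial mixture can be sketched as follows. This is a minimal illustration, not FSCseq's implementation. The helper nb_mix_estep is hypothetical. It assumes the NB2 parameterization (variance mu + phi * mu^2, so size = 1/phi in dnbinom), treats genes as conditionally independent given the cluster, and ignores the design matrix X and the lambda/alpha penalty. It applies the temperature Tau as exp(loglik / Tau), which is one common form of deterministic annealing, and uses a deterministic argmax for the CEM classification step. FSCseq's own annealing schedule and C step may differ.

```r
# Hypothetical helper (not an FSCseq function): E step for a mixture of
# negative binomials, ASSUMING the NB2 form Var = mu + phi * mu^2.
#   y    : g x n count matrix (genes x samples)
#   mu   : g x k matrix of cluster means before size-factor scaling
#   phi  : g x k matrix of dispersions
#   pi_k : length-k vector of mixing proportions
#   s    : length-n vector of size factors
#   Tau  : temperature (ASSUMED to act as exp(loglik / Tau))
#   CEM  : if TRUE, hard-assign each sample to its most probable cluster
nb_mix_estep <- function(y, mu, phi, pi_k, s = rep(1, ncol(y)),
                         Tau = 1, CEM = FALSE) {
  k <- ncol(mu)
  n <- ncol(y)
  loglik <- matrix(0, nrow = k, ncol = n)
  for (j in seq_len(k)) {
    for (i in seq_len(n)) {
      # genes are treated as conditionally independent given the cluster
      loglik[j, i] <- log(pi_k[j]) +
        sum(dnbinom(y[, i], size = 1 / phi[, j], mu = s[i] * mu[, j],
                    log = TRUE))
    }
  }
  # tempered posterior, normalized per sample; subtracting each column's
  # maximum before exponentiating avoids numerical underflow
  z <- loglik / Tau
  z <- sweep(z, 2, apply(z, 2, max))
  wts <- exp(z)
  wts <- sweep(wts, 2, colSums(wts), "/")
  if (CEM) {
    # classification step: one-hot k x n membership matrix
    cls <- apply(wts, 2, which.max)
    wts <- matrix(0, nrow = k, ncol = n)
    wts[cbind(cls, seq_len(n))] <- 1
  }
  wts  # k x n, matching the layout of init_wts
}
```

Returning a k by n matrix keeps the output in the same layout as init_wts, so a soft (EM) or hard (CEM) assignment can be passed between iterations without reshaping.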

Usage

EM_run(
  ncores,
  X = NA,
  y,
  k,
  lambda = 0,
  alpha = 0,
  size_factors = rep(1, times = ncol(y)),
  norm_y = y,
  true_clusters = NA,
  true_disc = NA,
  init_parms = FALSE,
  init_coefs = matrix(0, nrow = nrow(y), ncol = k),
  init_phi = matrix(0, nrow = nrow(y), ncol = k),
  init_cls = NULL,
  init_wts = NULL,
  CEM = TRUE,
  init_Tau = nrow(y),
  maxit_EM = 100,
  maxit_IRLS = 50,
  maxit_CDA = 50,
  EM_tol = 1e-06,
  IRLS_tol = 1e-04,
  CDA_tol = 1e-04,
  disp,
  trace = FALSE,
  mb_size = NULL,
  PP_filt
)

Arguments

ncores

integer, number of cores to use for parallel computation, e.g. 1 (the signature above sets no default, so supply it explicitly)

X

design matrix of dimension n by p, one row per sample (default NA)

y

count matrix of dimension g by n (g genes, n samples)

k

integer, number of clusters

lambda

numeric penalty parameter, lambda >= 0

alpha

numeric penalty parameter, 0 <= alpha < 1

size_factors

numeric vector of length n, size factors that correct for sample-specific differences in sequencing depth (default: all 1)

norm_y

count matrix of dimension g by n, normalized for differences in sequencing depth (default: y)

true_clusters

(optional) integer vector of true groups, if available, for diagnostic tracking

true_disc

(optional) logical vector of true discriminatory genes, if available, for diagnostic tracking

init_parms

logical; TRUE uses the custom initial values supplied in init_coefs and init_phi, FALSE (default) starts from scratch

init_coefs

matrix of dimension g by k of initial coefficients; used only if init_parms = TRUE

init_phi

vector of length g (gene-specific dispersions) or matrix of dimension g by k (cluster-specific dispersions); used only if init_parms = TRUE

init_cls

vector of length n giving the initial cluster label of each sample (default NULL)

init_wts

matrix of dimension k by n of initial cluster-membership weights; entries may encode partial membership. At least one of init_wts or init_cls must be supplied.

CEM

logical, TRUE for CEM (default), FALSE for EM

init_Tau

numeric, initial temperature for CEM. The default is g, the number of genes (nrow(y)); set it to 1 for EM.

maxit_EM

integer, maximum number of iterations for the full CEM/EM run (default 100)

maxit_IRLS

integer, maximum number of iterations for the IRLS loop in the M step (default 50)

maxit_CDA

integer, maximum number of iterations for the CDA loop (default 50)

EM_tol

numeric, convergence tolerance for EM/CEM (default 1e-6)

IRLS_tol

numeric, convergence tolerance for IRLS (default 1e-4)

CDA_tol

numeric, convergence tolerance for CDA (default 1e-4)

disp

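string, either "gene" (gene-specific dispersions, the documented default) or "cluster" (cluster-specific dispersions); the signature above sets no default, so supply it explicitly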

trace

logical; TRUE prints diagnostic messages, FALSE (default) suppresses them

mb_size

integer, minibatch size: number of genes to include in each M-step iteration (default NULL)

PP_filt

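numeric in (0, 1), threshold on the posterior probability (PP) of cluster membership that a sample/cluster pair must meet to be included in M-step estimation. The documented default is 1e-3; the signature above sets no default.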
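Several of the arguments above are linked: norm_y is derived from y and size_factors, init_wts can be built from init_cls, PP_filt masks low-probability sample/cluster pairs, and mb_size restricts each M step to a subset of genes. The sketch below illustrates these relationships under stated assumptions. All four helpers (normalize_counts, cls_to_wts, pp_mask, draw_minibatch) are hypothetical and are not FSCseq functions. It assumes norm_y divides each sample's counts by its size factor, uses a strict ">" comparison for PP_filt, and assumes minibatches are drawn uniformly without replacement. FSCseq may handle each of these differently.

```r
# Hypothetical helpers (not FSCseq functions) showing how several
# EM_run arguments relate to one another.

# norm_y: ASSUMED to be y with each column (sample) divided by its size factor
normalize_counts <- function(y, size_factors) {
  sweep(y, 2, size_factors, "/")
}

# init_cls -> init_wts: one-hot k x n membership matrix (full membership)
cls_to_wts <- function(init_cls, k) {
  n <- length(init_cls)
  wts <- matrix(0, nrow = k, ncol = n)
  wts[cbind(init_cls, seq_len(n))] <- 1
  wts
}

# PP_filt: logical k x n mask of sample/cluster pairs kept for the M step
# (strict ">" is an ASSUMPTION)
pp_mask <- function(wts, PP_filt = 1e-3) {
  wts > PP_filt
}

# mb_size: gene indices updated in one M-step iteration; NULL means all genes
# (uniform sampling without replacement is an ASSUMPTION)
draw_minibatch <- function(g, mb_size = NULL) {
  if (is.null(mb_size)) seq_len(g) else sort(sample.int(g, mb_size))
}
```

For example, cls_to_wts(c(2, 1, 2), k = 2) yields a 2 by 3 matrix with a single 1 in each column, which is the hard-membership special case of init_wts.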

Value

An FSCseq object containing the clustering results, the posterior probabilities of cluster membership, and the cluster-discriminatory status of each gene.

Author(s)

David K. Lim, deelim@live.unc.edu

References

https://github.com/DavidKLim/FSCseq


DavidKLim/FSCseq documentation built on Dec. 12, 2021, 3:46 a.m.