
eLNNpairedCovSEM    R Documentation

Model-Based Clustering for Paired Data Adjusting for Covariates Using Simulated Annealing Modified EM

Description

Model-based clustering of probes for paired data, based on the extended log-normal normal (eLNN) model with adjustment for covariates and fitted by a simulated-annealing modified EM algorithm.

Usage

eLNNpairedCovSEM(
  EsetDiff,
  fmla = ~Age + Sex,
  probeID.var = "probeid",
  gene.var = "gene",
  chr.var = "chr",
  scaleFlag = TRUE,
  Maxiter = 10,
  maxIT = 10,
  b = c(2, 2, 2),
  converge_threshold = 1e-3,
  optimMethod = "L-BFGS-B",
  bound.alpha = c(0.001, 6),
  bound.beta = c(0.001, 6),
  bound.k = c(0.001, 0.9999),
  bound.eta = c(-10, 10),
  mc.cores = 1,
  temp0 = 2,
  r_cool = 0.9,
  verbose = FALSE)

Arguments

EsetDiff

An ExpressionSet object storing the log2 differences between post-treatment and pre-treatment measurements.

fmla

A one-sided formula (without an outcome variable) specifying the covariates to adjust for, e.g. ~Age + Sex.

probeID.var

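character. Name of the feature-data variable of EsetDiff that contains the probe IDs.

gene.var

character. Name of the feature-data variable of EsetDiff that contains the gene symbols.

chr.var

character. Name of the feature-data variable of EsetDiff that contains the chromosome information.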

scaleFlag

logical. Indicates whether rows (probes) should be scaled (but not centered).

Maxiter

integer. The maximum number of iterations of the EM algorithm. Default value is Maxiter = 10.

maxIT

integer. The maximum number of iterations allowed in the R built-in function optim. Default value is maxIT = 10.

b

numeric. A vector of concentration parameters of the Dirichlet distribution, one for each of the three clusters. Default value is b = c(2,2,2).

converge_threshold

numeric. One of the two termination criteria of the iteration. The smaller this value, the harder it is for the optimization procedure in eLNNpairedCovSEM to be considered converged. Default value is converge_threshold = 1e-3.

optimMethod

character. The optimization method used by optim (see the method argument of optim). Default value is optimMethod = "L-BFGS-B".

bound.alpha

numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of \alpha_c, c = "OE", "UE", or "NE".

bound.beta

numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of \beta_c, c = "OE", "UE", or "NE".

bound.k

numeric. A vector of 2 positive numbers specifying the lower and upper bounds of the estimate of k_c, c = "OE", "UE", or "NE".

bound.eta

numeric. A vector of 2 numbers specifying the lower and upper bounds of the estimates of \eta_c, c = "OE", "UE", or "NE", where \eta_c has p+1 elements and p is the number of covariates. The bounds need not be positive; the default is c(-10, 10).

mc.cores

integer. A positive integer specifying the number of CPU cores to use for parallel computing.

temp0

numeric. Initial temperature of the simulated-annealing modified EM algorithm.

r_cool

numeric. Cooling rate of the simulated-annealing modified EM algorithm; must lie in the interval (0, 1).

verbose

logical. Indicates whether to print intermediate results: TRUE to print them, FALSE not to. Default value is verbose = FALSE.
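The roles of temp0, r_cool and Maxiter can be illustrated with a geometric cooling schedule. The sketch below is a hypothetical illustration, not the package's code: it assumes the temperature is multiplied by r_cool once per EM iteration, and it uses deterministic annealing (responsibilities proportional to (pi_c f_c(x_i))^(1/T)) as one common way a temperature enters an EM E-step. The helpers cooling_schedule and tempered_resp are hypothetical and not part of eLNNpairedCov; the actual update rule may differ.

```r
# Hypothetical sketch (assumption): geometric cooling, T_t = temp0 * r_cool^t,
# with one temperature per EM iteration t = 0, ..., Maxiter - 1.
cooling_schedule <- function(temp0 = 2, r_cool = 0.9, Maxiter = 10) {
  stopifnot(temp0 > 0, r_cool > 0, r_cool < 1, Maxiter >= 1)
  temp0 * r_cool^(seq_len(Maxiter) - 1)
}

# Hypothetical sketch (deterministic annealing, an assumption):
# w_ic proportional to (pi_c * f_c(x_i))^(1 / T), computed on the log scale.
# logdens: G x 3 matrix of log(pi_c * f_c(x_i)); temp: current temperature.
tempered_resp <- function(logdens, temp) {
  scaled <- logdens / temp
  scaled <- scaled - apply(scaled, 1, max)  # subtract row maxima for stability
  w <- exp(scaled)
  w / rowSums(w)                            # each row sums to 1
}

temps <- cooling_schedule(temp0 = 2, r_cool = 0.9, Maxiter = 10)
print(round(temps, 3))
```

With the default values the temperature falls from 2 to about 0.77 over 10 iterations. In this sketch, a high temperature flattens the responsibilities toward equal weights, and at T = 1 they coincide with the ordinary EM responsibilities.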

Details

Each gene (probe) is assigned to the cluster for which its posterior probability (responsibility) is the largest: "OE" (over-expressed), "UE" (under-expressed), or "NE" (non-differentially expressed).
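The assignment rule above amounts to picking, for each probe, the column of the responsibility matrix with the largest value. This is a minimal sketch, assuming wmat is a numeric matrix with one row per probe and columns named "OE", "UE" and "NE"; the column names and order of the wmat returned by eLNNpairedCovSEM may differ. The helper assign_clusters is hypothetical.

```r
# Hypothetical helper: assign each probe to the cluster with the largest
# posterior probability (responsibility). Assumes one row per probe and
# column names that give the cluster labels.
assign_clusters <- function(wmat) {
  stopifnot(is.matrix(wmat), !is.null(colnames(wmat)))
  # ties.method = "first" makes the choice deterministic when values tie
  idx <- max.col(wmat, ties.method = "first")
  colnames(wmat)[idx]
}

# Toy responsibilities for three probes (made-up numbers for illustration)
wmat <- rbind(c(0.70, 0.10, 0.20),
              c(0.05, 0.80, 0.15),
              c(0.10, 0.20, 0.70))
colnames(wmat) <- c("OE", "UE", "NE")
assign_clusters(wmat)  # returns "OE" "UE" "NE"
```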

Value

A list with the following elements:

par.ini

Initial estimates of the model parameters.

par.final

A vector of the estimated model parameters on the original scale.

memGenes

Probe cluster membership based on the eLNNpairedCovSEM algorithm, with 3 categories: "OE" (over-expressed), "UE" (under-expressed), and "NE" (non-differentially expressed).

memGenes2

Probe cluster membership based on the eLNNpairedCovSEM algorithm, with 2 categories: "DE" indicates differentially expressed; "NE" indicates non-differentially expressed.

memGenes.limma

Probe cluster membership based on limma.

res.ini

Results of the limma analysis.

update_info

Object returned by the optim function.

wmat

Matrix of responsibilities (posterior probabilities of cluster membership).

iter.EM

Number of EM iterations.

tempFinal

Final temperature of the simulated-annealing modified EM algorithm.
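The two-category membership memGenes2 can presumably be viewed as a coarsening of the three-category memGenes in which "OE" and "UE" are merged into "DE". This is a minimal sketch, assuming memGenes is a character vector with values "OE", "UE" and "NE" (if it is stored as integer codes, map the codes to these labels first). The helper to_two_categories is hypothetical and not part of eLNNpairedCov.

```r
# Hypothetical helper: merge over- and under-expressed labels into "DE",
# keeping "NE" unchanged.
to_two_categories <- function(memGenes) {
  stopifnot(all(memGenes %in% c("OE", "UE", "NE")))
  ifelse(memGenes == "NE", "NE", "DE")
}

mem3 <- c("OE", "NE", "UE", "NE")  # toy labels for illustration
to_two_categories(mem3)            # returns "DE" "NE" "DE" "NE"
table(to_two_categories(mem3))
```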

Author(s)

Yixin Zhang zhyl133@gmail.com, Wei Liu liuwei@mathstat.yorku.ca, Weiliang Qiu weiliang.qiu@sanofi.com

References

Zhang Y, Liu W, Qiu W. A model-based clustering via mixture of hierarchical models with covariate adjustment for detecting differentially expressed genes from paired design. BMC Bioinformatics 24, 423 (2023)

Examples

library(eLNNpairedCov)
library(Biobase)  # provides fData()

data(esDiff)

res.SEM = eLNNpairedCovSEM(EsetDiff = esDiff,
                           fmla = ~Age + Sex,
                           probeID.var = "probeid",
                           gene.var = "gene",
                           chr.var = "chr",
                           scaleFlag = FALSE,
                           mc.cores = 1,
                           verbose = TRUE)

# true probe cluster membership
memGenes.true = fData(esDiff)$memGenes.true
print(table(memGenes.true))

# probe cluster membership based on limma
memGenes.limma = res.SEM$memGenes.limma
print(table(memGenes.limma))

# final probe cluster membership based on eLNNpairedCovSEM
memGenes.SEM = res.SEM$memGenes
print(table(memGenes.SEM))

# cross tables
print(table(memGenes.true, memGenes.limma))
print(table(memGenes.true, memGenes.SEM))

# accuracies
print(mean(memGenes.true == memGenes.limma))
print(mean(memGenes.true == memGenes.SEM))


eLNNpairedCov documentation built on May 29, 2024, 3:16 a.m.