
View source: R/simplica.R

simplica    R Documentation

SIMPLICA: Simultaneous Identification of Simplivariate Components

Description

Implements the SIMPLICA algorithm, which uses a genetic algorithm to identify Simplivariate Components in a data matrix (Hageman et al., 2008). These components are related to clusters and biclusters (Madeira & Oliveira, 2004) but are defined by specific structural patterns: constant, additive, multiplicative, or user-defined.
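As a rough illustration of the three built-in pattern types, the base-R sketch below fits each one to a small block and reports its sum of squared residuals. The helper names (ssrConstant, ssrAdditive, ssrMultiplicative) are hypothetical and are not the functions returned by defaultPatternFunctions(). The additive fit uses row and column means, and the multiplicative fit is taken here to be the best rank-1 (SVD) approximation; both are assumptions about how such patterns might be scored.

```r
# Hypothetical helpers for illustration only; NOT the SIMPLICA pattern functions.
# Each returns the sum of squared residuals (SSR) after fitting one pattern.

ssrConstant <- function(x) {
  # Constant pattern: every cell equals one common value (the block mean)
  sum((x - mean(x))^2)
}

ssrAdditive <- function(x) {
  # Additive pattern: x[i, j] = mu + a[i] + b[j]; least-squares fit via
  # row and column means (two-way layout without interaction)
  mu <- mean(x)
  a <- rowMeans(x) - mu
  b <- colMeans(x) - mu
  fitted <- mu + outer(a, b, "+")
  sum((x - fitted)^2)
}

ssrMultiplicative <- function(x) {
  # Multiplicative pattern (assumed form): x[i, j] = u[i] * v[j],
  # fitted here as the best rank-1 approximation from the SVD
  s <- svd(x)
  fitted <- s$d[1] * outer(s$u[, 1], s$v[, 1])
  sum((x - fitted)^2)
}

# Toy blocks that follow each pattern exactly
constBlock    <- matrix(7, nrow = 4, ncol = 3)
additiveBlock <- 2 + outer(c(0, 1, 2, 3), c(0, 5, 10), "+")
multBlock     <- outer(c(1, 2, 3, 4), c(1, 0.5, 2))

# Rows: pattern fitted; columns: block tested
sapply(list(constant = constBlock, additive = additiveBlock,
            multiplicative = multBlock),
       function(x) c(constant       = ssrConstant(x),
                     additive       = ssrAdditive(x),
                     multiplicative = ssrMultiplicative(x)))
```

A block that follows a pattern exactly leaves a residual near zero for that pattern. A constant block also fits the additive and multiplicative patterns exactly, since both of those models contain the constant one as a special case.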

Usage

simplica(
  df,
  maxIter = 2000,
  popSize = 300,
  pCrossover = 0.6,
  pMutation = 0.03,
  zeroFraction = 0.9,
  elitism = 100,
  numSimComp = 5,
  verbose = FALSE,
  mySeeds = 1:5,
  interval = 100,
  penalty = c(constant = 0, additive = 1, multiplicative = 0),
  patternFunctions = defaultPatternFunctions(),
  doSimplicaCV = TRUE,
  cvControl = NULL
)
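The GA settings in the call above (popSize, pCrossover, pMutation, elitism) play their usual roles in a genetic algorithm. The base-R sketch below shows one generation of a generic GA over integer-labelled chromosomes to make those roles concrete. It is hypothetical and is not SIMPLICA's or the GA package's implementation: the function name oneGeneration, the encoding (one integer label per gene, from 0 to maxLabel), binary tournament selection, single-point crossover and per-gene relabelling mutation are all choices made here for brevity.

```r
# Hypothetical generic GA step; NOT the SIMPLICA or GA package implementation.
# pop: integer matrix, one chromosome per row; fitness: function to maximize.
oneGeneration <- function(pop, fitness, pCrossover, pMutation, elitism, maxLabel) {
  popSize <- nrow(pop)
  nGenes  <- ncol(pop)
  fit <- apply(pop, 1, fitness)
  ord <- order(fit, decreasing = TRUE)
  # Elitism: the best `elitism` individuals are copied unchanged
  elite <- pop[ord[seq_len(elitism)], , drop = FALSE]
  nChildren <- popSize - elitism
  # Binary tournament selection (an assumption made for this sketch)
  tournament <- function() {
    i <- sample.int(popSize, 2)
    if (fit[i[1]] >= fit[i[2]]) i[1] else i[2]
  }
  children <- matrix(0L, nrow = nChildren, ncol = nGenes)
  for (k in seq_len(nChildren)) {
    p1 <- pop[tournament(), ]
    p2 <- pop[tournament(), ]
    child <- p1
    # Single-point crossover, applied with probability pCrossover
    if (nGenes > 1 && runif(1) < pCrossover) {
      cut <- sample.int(nGenes - 1, 1)
      child[(cut + 1):nGenes] <- p2[(cut + 1):nGenes]
    }
    # Each gene is relabelled at random (0..maxLabel) with probability pMutation
    mutate <- runif(nGenes) < pMutation
    child[mutate] <- sample.int(maxLabel + 1, sum(mutate), replace = TRUE) - 1L
    children[k, ] <- child
  }
  rbind(elite, children)
}

# Usage with a toy fitness: count genes carrying label 1
set.seed(42)
popSize <- 20
nGenes  <- 10
pop <- matrix(sample(0:2, popSize * nGenes, replace = TRUE), popSize, nGenes)
fitness <- function(x) sum(x == 1)
newPop <- oneGeneration(pop, fitness, pCrossover = 0.6, pMutation = 0.03,
                        elitism = 2, maxLabel = 2)
```

In a full run this step would repeat for up to maxIter generations. Because the elite rows are copied unchanged, the best fitness in the population cannot decrease from one generation to the next.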

Arguments

df

A numeric data matrix to analyze

maxIter

Maximum number of generations for the genetic algorithm (default: 2000)

popSize

Population size for the genetic algorithm (default: 300)

pCrossover

Crossover probability for the genetic algorithm (default: 0.6)

pMutation

Mutation probability for the genetic algorithm (default: 0.03)

zeroFraction

Fraction of the population initialized with zeros (default: 0.9)

elitism

Number of best individuals preserved between generations (default: 100)

numSimComp

Number of Simplivariate Components optimized simultaneously (default: 5)

verbose

Logical, whether to print SIMPLICA progress information (default: FALSE)

mySeeds

Vector of random seeds for replicate runs (default: 1:5)

interval

Interval for monitoring GA progress (default: 100)

penalty

Named vector of penalty values for each pattern type (default: c(constant = 0, additive = 1, multiplicative = 0))

patternFunctions

List of pattern functions used for fitness evaluation (default: defaultPatternFunctions())

doSimplicaCV

Logical, whether to run cross-validated relabeling with simplicaCV() after the GA (default: TRUE)

cvControl

Optional list of settings for simplicaCV(); its fields are passed to simplicaCV() via do.call(). Defaults used if omitted:

  • patternFitters = defaultPatternFitters()

  • preferenceOrder = names(patternFunctions)

  • nRepeats = 40

  • testFraction = 0.2

  • minCellsForModels = 25

  • parsimonyMargin = 0.05

  • requireFitters = TRUE

  • updateObject = TRUE

  • verbose = verbose
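For example, cvControl can override a subset of these settings. The sketch below builds such a list from the documented field names with illustrative values, then shows, with utils::modifyList(), one common way user fields can be merged over defaults. Whether simplica() merges a partial cvControl list exactly like this is an assumption; defaultsSketch is a hypothetical name, and it leaves out patternFitters and preferenceOrder because those defaults depend on the package.

```r
# Illustrative cvControl list; field names are the documented ones,
# values are examples only, not recommendations.
cvControl <- list(
  nRepeats        = 100,   # default 40
  testFraction    = 0.25,  # default 0.2
  parsimonyMargin = 0.1,   # default 0.05
  verbose         = TRUE
)

# Hypothetical defaults (package-dependent fields omitted), used only to
# show how a partial list could be merged over defaults.
defaultsSketch <- list(
  nRepeats          = 40,
  testFraction      = 0.2,
  minCellsForModels = 25,
  parsimonyMargin   = 0.05,
  requireFitters    = TRUE,
  updateObject      = TRUE,
  verbose           = FALSE   # simplica() passes its own `verbose` here
)

# Fields given in cvControl replace the defaults; the rest are kept
merged <- utils::modifyList(defaultsSketch, cvControl)
```

The list would then be supplied as simplica(df = simplicaToy$data, cvControl = cvControl).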

Value

A list with:

  • best: a simplica object containing the original GA result; if doSimplicaCV = TRUE, it also contains componentPatternsUpdated and componentAudit

  • raw: a list of "ga" objects from the GA package, one per seed

References

Hageman, J. A., Wehrens, R., & Buydens, L. M. C. (2008). "Simplivariate Models: Ideas and First Examples." PLoS ONE, 3(9), e3259. doi:10.1371/journal.pone.0003259

Madeira, S. C., & Oliveira, A. L. (2004). "Biclustering Algorithms for Biological Data Analysis: A Survey." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 24–45. doi:10.1109/TCBB.2004.2

Examples


library(SIMPLICA)
data("simplicaToy")
# Minimal run with reduced GA settings (fewer generations, smaller population,
# a single seed), just to demonstrate function usage
fit <- simplica(df = simplicaToy$data,
                maxIter = 200,
                popSize = 50,
                mySeeds = 1,
                elitism = 1,
                verbose = TRUE)
plotComponentResult(df = simplicaToy$data,
                    string            = fit$best$string,
                    componentPatterns = fit$best$componentPatternsUpdated,
                    componentScores   = fit$best$componentScores,
                    showAxisLabels    = FALSE,
                    title             = "SIMPLICA on simplicaToy",
                    scoreCutoff       = 25000)



SIMPLICA documentation built on Sept. 11, 2025, 1:08 a.m.
