In SIMPLICA: Biclustering via Simplivariate Component Analysis

simplica

R Documentation

SIMPLICA: Simultaneous Identification of Simplivariate Components

Description

Implements the SIMPLICA algorithm to identify Simplivariate Components in data matrices using a genetic algorithm. These components are related to clusters or biclusters, but defined here in terms of specific structural patterns (constant, additive, multiplicative, or user-defined).
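
The three built-in pattern types can be pictured with small base-R matrices. This is only an illustrative sketch using the textbook bicluster definitions (constant: mu; additive: mu + alpha_i + beta_j; multiplicative: mu * alpha_i * beta_j). The values of mu, alpha and beta are made up, and the pattern functions used internally by the package may be parameterised differently.

```r
# Illustrative 3 x 4 blocks; mu, alpha and beta are made-up values.
mu    <- 2
alpha <- c(1, 2, 3)          # row effects
beta  <- c(0.5, 1, 1.5, 2)   # column effects

# Constant pattern: every cell equals mu.
constantBlock <- matrix(mu, nrow = length(alpha), ncol = length(beta))

# Additive pattern: cell (i, j) equals mu + alpha[i] + beta[j].
additiveBlock <- mu + outer(alpha, beta, "+")

# Multiplicative pattern: cell (i, j) equals mu * alpha[i] * beta[j].
multiplicativeBlock <- mu * outer(alpha, beta, "*")

# In an additive block, the difference between two rows is the same in every column.
additiveBlock[2, ] - additiveBlock[1, ]

# In a multiplicative block, the ratio between two rows is the same in every column.
multiplicativeBlock[2, ] / multiplicativeBlock[1, ]
```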

Usage

simplica(
  df,
  maxIter = 2000,
  popSize = 300,
  pCrossover = 0.6,
  pMutation = 0.03,
  zeroFraction = 0.9,
  elitism = 100,
  numSimComp = 5,
  verbose = FALSE,
  mySeeds = 1:5,
  interval = 100,
  penalty = c(constant = 0, additive = 1, multiplicative = 0),
  patternFunctions = defaultPatternFunctions(),
  doSimplicaCV = TRUE,
  cvControl = NULL
)

Arguments

`df`	A numeric data matrix to analyze
`maxIter`	Maximum number of generations for the genetic algorithm (default: 2000)
`popSize`	Population size for the genetic algorithm (default: 300)
`pCrossover`	Crossover probability for the genetic algorithm (default: 0.6)
`pMutation`	Mutation probability for the genetic algorithm (default: 0.03)
`zeroFraction`	Fraction of population initialized with zeros (default: 0.9)
`elitism`	Number of best individuals preserved between generations (default: 100)
`numSimComp`	Number of Simplivariate Components simultaneously optimized (default: 5)
`verbose`	Logical, whether to print SIMPLICA progress information (default: FALSE)
`mySeeds`	Vector of random seeds for replicate runs (default: 1:5)
`interval`	Interval for monitoring GA progress (default: 100)
`penalty`	Named vector of penalty values for each pattern type (default: c(constant = 0, additive = 1, multiplicative = 0))
`patternFunctions`	List of pattern functions used for fitness evaluation (default: defaultPatternFunctions())
`doSimplicaCV`	Logical, run cross-validated relabeling with simplicaCV() after GA (default: TRUE)
`cvControl`	Optional list to tune simplicaCV(); its fields are passed to simplicaCV() via do.call(). Defaults if omitted:
	patternFitters = defaultPatternFitters()
	preferenceOrder = names(patternFunctions)
	nRepeats = 40
	testFraction = 0.2
	minCellsForModels = 25
	parsimonyMargin = 0.05
	requireFitters = TRUE
	updateObject = TRUE
	verbose = verbose
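
How a partial cvControl list can be combined with defaults and forwarded through do.call() may be easier to see in a small base-R sketch. The function toyCV() below is a hypothetical stand-in, not the package's simplicaCV(), and modifyList() is one common way to merge settings; the package's own merging logic is not documented here and may differ. With the real function, the equivalent would be to pass something like cvControl = list(nRepeats = 10) to simplica().

```r
# Hypothetical stand-in for simplicaCV(); it simply echoes two of its settings.
toyCV <- function(nRepeats = 40, testFraction = 0.2, ...) {
  list(nRepeats = nRepeats, testFraction = testFraction)
}

# A subset of the documented defaults for cvControl.
cvDefaults <- list(nRepeats = 40, testFraction = 0.2,
                   minCellsForModels = 25, parsimonyMargin = 0.05)

# User-supplied overrides, as they might be given in cvControl.
cvControl <- list(nRepeats = 10)

# modifyList() keeps every default except the fields named in cvControl.
cvArgs <- modifyList(cvDefaults, cvControl)

# do.call() turns the named list into the arguments of the call.
result <- do.call(toyCV, cvArgs)
result$nRepeats      # 10
result$testFraction  # 0.2
```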

Value

A list with:

best: simplica object (includes original GA result; if doSimplicaCV=TRUE, also componentPatternsUpdated and componentAudit)
raw: list of "ga" objects (one per seed, from the GA package)
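
A quick way to check what a returned object contains is sketched below. The helper describeSimplicaFit() is hypothetical and not part of SIMPLICA, and it assumes the simplica object in best behaves like a named list. A mock object with the documented shape is used so that the sketch runs without the package installed.

```r
# Hypothetical helper (not part of SIMPLICA): summarise the list returned by simplica().
describeSimplicaFit <- function(fit) {
  stopifnot(is.list(fit), all(c("best", "raw") %in% names(fit)))
  cvFields <- c("componentPatternsUpdated", "componentAudit")
  list(
    nSeeds     = length(fit$raw),                    # one "ga" object per seed
    bestFields = names(fit$best),                    # fields exposed by the simplica object
    hasCV      = all(cvFields %in% names(fit$best))  # expected when doSimplicaCV = TRUE
  )
}

# Mock object with the documented shape; real results come from simplica().
mockFit <- list(
  best = list(string = integer(0),
              componentPatterns = character(0),
              componentScores = numeric(0)),
  raw  = list("placeholder for seed 1", "placeholder for seed 2")
)

info <- describeSimplicaFit(mockFit)
info$nSeeds  # 2
info$hasCV   # FALSE
```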

References

Hageman, J. A., Wehrens, R., & Buydens, L. M. C. (2008). "Simplivariate Models: Ideas and First Examples." PLoS ONE, 3(9), e3259. doi:10.1371/journal.pone.0003259

Madeira, S. C., & Oliveira, A. L. (2004). "Biclustering Algorithms for Biological Data Analysis: A Survey." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 24–45. doi:10.1109/TCBB.2004.2

Examples


data("simplicaToy")
# Minimal run with reduced GA settings, just to demonstrate function usage;
# for a real analysis, run with the default GA parameters
fit <- simplica(df = simplicaToy$data, 
                maxIter = 200,
                popSize = 50,
                mySeeds = 1,
                elitism = 1,
                verbose = TRUE)
plotComponentResult(df = simplicaToy$data,
                    string            = fit$best$string,
                    componentPatterns = fit$best$componentPatternsUpdated,
                    componentScores   = fit$best$componentScores,
                    showAxisLabels    = FALSE,
                    title             = "SIMPLICA on simplicaToy",
                    scoreCutoff       = 25000)

SIMPLICA documentation built on Sept. 11, 2025, 1:08 a.m.