# randomConds: Generate random solution formulas

In cna: Causal Modeling with Coincidence Analysis

 randomConds R Documentation

## Generate random solution formulas

### Description

Based on a set of factors (given as a data frame or `configTable`), `randomAsf` generates a random atomic solution formula (asf) and `randomCsf` a random (acyclic) complex solution formula (csf).

### Usage

``````r
randomAsf(x, outcome = NULL, positive = TRUE,
          maxVarNum = if (type == "mv") 8 else 16, compl = NULL,
          how = c("inus", "minimal"))

randomCsf(x, outcome = NULL, positive = TRUE,
          n.asf = NULL, compl = NULL, maxVarNum = if (type == "mv") 8 else 16)
``````

### Arguments

| Argument | Description |
|---|---|
| `x` | Data frame or `configTable`; determines the number of factors, their names and their possible values. In `randomAsf`, `x` must have at least 3 columns; in `randomCsf`, `x` must have at least 4 columns. As a shorthand, `x` can also be an integer, in which case `full.ct(x)` is executed first. |
| `outcome` | Optional character vector (of length 1 in `randomAsf`) specifying the outcome factor value(s) in the solution formula. Must be factor values, e.g. `"A"` or `"b"` in case of binary data or `"A=1"` in case of multi-value data. With multi-value data, factor names are also allowed; a value of that factor will then be chosen at random. `outcome` overrides `positive` and `n.asf`. |
| `positive` | Logical; if `TRUE` (default), the outcomes all have positive values. If `FALSE`, a value (positive or negative in case of binary data) will be picked at random. `positive` has no effect if the `outcome` argument is not `NULL` or if `x` contains multi-value data. |
| `maxVarNum` | Maximal number of factors in `x` that can appear in the generated asf or csf. The default depends on the type of the data contained in `x`. |
| `compl` | Integer vector specifying the maximal complexity of the formula, i.e. the number of factors in each minimally sufficient condition (msc) and the number of msc in each asf. Alternatively, `compl` can be a list of two integer vectors; then the first vector is used for the initial complexity of the msc and the second for that of the asf. |
| `how` | Character string, either `"inus"` or `"minimal"`, specifying whether the generated solution formula is redundancy-free relative to `full.ct(x)` or relative to `x` (see details below). |
| `n.asf` | Integer scalar specifying the number of asf in the csf. Is overridden by `length(outcome)` if `outcome` is not `NULL`. Note that `n.asf` is limited to `ncol(x)-2`. |

### Details

`randomAsf` and `randomCsf` can be used to randomly draw data generating structures (ground truths) in inverse search trials benchmarking the output of `cna`. In the regularity-theoretic context in which the CNA method is embedded, a causal structure is a redundancy-free Boolean dependency structure. Hence, `randomAsf` and `randomCsf` both produce redundancy-free Boolean dependency structures. `randomAsf` generates structures with one outcome, i.e. atomic solution formulas (asf); `randomCsf` generates structures with multiple outcomes, i.e. complex solution formulas (csf), that are free of cyclic substructures. In a nutshell, `randomAsf` first randomly draws disjunctive normal forms (DNFs) and then eliminates redundancies from these DNFs. `randomCsf` essentially consists of repeated executions of `randomAsf`.
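The two-step idea described above (draw a random DNF, then remove redundancies) can be illustrated with a toy sketch in base R. This is not the implementation used in `cna`: the helper names (`draw_conj`, `eval_dnf`, `minimize_dnf`, `show_dnf`) are hypothetical, the factors are assumed to be binary, and constant formulas are not treated specially.

```r
set.seed(42)
factors <- c("A", "B", "C", "D")
# All logically possible configurations of four binary factors.
grid <- expand.grid(rep(list(0:1), length(factors)))
names(grid) <- factors

# A conjunction is a named 0/1 vector, e.g. c(A = 1, C = 0) stands for A*c.
draw_conj <- function(k) {
  vars <- sample(factors, k)
  setNames(sample(0:1, k, replace = TRUE), vars)
}
# Truth values of a conjunction / a DNF over all rows of `grid`.
eval_conj <- function(conj) {
  Reduce(`&`, Map(function(v, val) grid[[v]] == val, names(conj), conj))
}
eval_dnf <- function(dnf) Reduce(`|`, lapply(dnf, eval_conj))

# Greedily drop literals and disjuncts as long as the truth table is unchanged.
minimize_dnf <- function(dnf) {
  target <- eval_dnf(dnf)
  repeat {
    changed <- FALSE
    for (i in seq_along(dnf)) {
      for (v in names(dnf[[i]])) {
        if (length(dnf[[i]]) < 2) break   # keep at least one literal
        cand <- dnf
        cand[[i]] <- dnf[[i]][names(dnf[[i]]) != v]
        if (identical(eval_dnf(cand), target)) {
          dnf <- cand
          changed <- TRUE
        }
      }
    }
    i <- 1
    while (i <= length(dnf) && length(dnf) > 1) {
      if (identical(eval_dnf(dnf[-i]), target)) {
        dnf <- dnf[-i]
        changed <- TRUE
      } else {
        i <- i + 1
      }
    }
    if (!changed) break
  }
  dnf
}

# Print a DNF in cna-like syntax (upper case = 1, lower case = 0).
show_conj <- function(conj) {
  paste(ifelse(conj == 1, names(conj), tolower(names(conj))), collapse = "*")
}
show_dnf <- function(dnf) paste(vapply(dnf, show_conj, character(1)), collapse = " + ")

raw <- replicate(3, draw_conj(3), simplify = FALSE)  # initial complexity 3 x 3
reduced <- minimize_dnf(raw)
show_dnf(raw)
show_dnf(reduced)
```

The reduced formula is logically equivalent to the drawn one over all configurations in `grid`, but may contain fewer literals or disjuncts, which mirrors why the returned formulas can be less complex than requested via `compl`.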

The only mandatory argument of `randomAsf` and `randomCsf` is a data frame or a `configTable` `x` defining the factors (with their possible values) from which the generated asf and csf shall be drawn.

The optional argument `outcome` determines which values of which factors in `x` shall be treated as outcomes. If `outcome = NULL` (default), `randomAsf` and `randomCsf` randomly draw factor values from `x` to be treated as outcome(s). If `positive = TRUE` (default), only positive outcome values are chosen in case of crisp-set data; if `positive = FALSE`, outcome values are drawn from the set {1,0} at random. `positive` only has an effect if `x` contains crisp-set data and `outcome = NULL`.
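As a rough check of the effect of `positive`, the following sketch extracts the outcome from a number of generated asf and tabulates its polarity. It assumes that the `cna` package is installed and that the outcome is the part of the returned string to the right of `<->`; the helper `outcome_of` is hypothetical and not part of `cna`.

```r
library(cna)
set.seed(2023)

# Hypothetical helper: everything to the right of "<->" is taken as the outcome.
outcome_of <- function(asf) trimws(sub("^.*<->", "", asf))

pos   <- outcome_of(replicate(20, randomAsf(7)))                    # positive = TRUE
mixed <- outcome_of(replicate(20, randomAsf(7, positive = FALSE)))

# Upper case denotes a positive, lower case a negative outcome value.
table(positive_default = pos == toupper(pos))
table(positive_false   = mixed == toupper(mixed))
```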

The maximal number of factors included in the generated asf and csf can be controlled via the argument `maxVarNum`. This is relevant when `x` is of high dimension, as generating solution formulas with more than 20 factors is computationally demanding and, accordingly, may take a long time (or even exhaust computer memory).

The argument `compl` controls the complexity of the generated asf and csf. More specifically, the initial complexity of asf and csf (i.e. the number of factors included in each msc and the number of msc included in each asf prior to redundancy elimination) is drawn from the vector or list of vectors `compl`. As this complexity might be reduced in the subsequent process of redundancy elimination, the returned asf or csf will often have lower complexity than specified in `compl`. The default value of `compl` is determined by the number of columns in `x`. Assigning unduly high values to `compl` results in an error.

`randomAsf` has the additional argument `how` with the two possible values `"inus"` and `"minimal"`. `how = "inus"` determines that the generated asf is redundancy-free relative to all logically possible configurations of the factors in `x`, i.e. relative to `full.ct(x)`, whereas in case of `how = "minimal"` redundancy-freeness is imposed only relative to all configurations actually contained in `x`, i.e. relative to `x` itself. Typically `"inus"` should be used; the value `"minimal"` is relevant mainly in repeated `randomAsf` calls from within `randomCsf`. Moreover, setting `how = "minimal"` will return an error if `x` is a `configTable` of type `"fs"`.

The argument `n.asf` controls the number of asf in the generated csf. Its value is limited to `ncol(x)-2` and is overridden by `length(outcome)` if `outcome` is not `NULL`. Analogously to `compl`, `n.asf` specifies the number of asf prior to redundancy elimination, which may reduce this number. That is, `n.asf` provides an upper bound for the number of asf in the resulting csf.
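The upper-bound behavior of `n.asf` can be inspected with a small sketch that counts the asf in several generated csf. It assumes that the `cna` package is installed and that the asf within a csf string are joined as `(...)*(...)`; the counting via `strsplit` relies on that assumed format and is not a `cna` facility.

```r
library(cna)
set.seed(123)

# Draw ten csf, each with at most three asf.
csfs <- replicate(10, randomCsf(full.ct(7), n.asf = 3))

# Count the asf per csf by splitting at the assumed separator ")*(".
n_found <- lengths(strsplit(csfs, ")*(", fixed = TRUE))
table(n_found)
```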

### Value

The randomly generated formula, a character string.

### See Also

`is.submodel`, `selectCases`, `full.ct`, `configTable`, `cna`.

### Examples

``````r
# randomAsf
# ---------
# Asf generated from explicitly specified binary factors.
randomAsf(full.ct("H*I*T*R*K"))
randomAsf(full.ct("Johnny*Debby*Aurora*Mars*James*Sonja"))

# Asf generated from a specified number of binary factors.
randomAsf(full.ct(7))
# In shorthand form.
randomAsf(7)

# Randomly choose positive or negative outcome values.
replicate(10, randomAsf(7, positive = FALSE))

# Asf generated from an existing data frame.
randomAsf(d.educate)

# Specify the outcome.
randomAsf(d.educate, outcome = "G")

# Specify the complexity.
# Initial complexity of 2 conjunctions and 2 disjunctions.
randomAsf(full.ct(7), compl = 2)
# Initial complexity of 3:4 conjunctions and 3:4 disjunctions.
randomAsf(full.ct(7), compl = 3:4)
# Initial complexity of 2 conjunctions and 3:4 disjunctions.
randomAsf(full.ct(7), compl = list(2,3:4))

# Redundancy-freeness relative to x instead of full.ct(x).
randomAsf(d.educate, outcome = "G", how = "minimal")

# Asf with multi-value factors.
randomAsf(allCombs(c(3,4,3,5,3,4)))
# Set the outcome value.
randomAsf(allCombs(c(3,4,3,5,3,4)), outcome = "B=4")
# Choose a random value of factor B.
randomAsf(allCombs(c(3,4,3,5,3,4)), outcome = "B")

# Asf from fuzzy-set data.
randomAsf(d.jobsecurity)
randomAsf(d.jobsecurity, outcome = "JSR")

# Generate 20 asf for outcome "e".
replicate(20, randomAsf(7, compl = 2:3, outcome = "e"))

# randomCsf
# ---------
# Csf generated from explicitly specified binary factors.
randomCsf(full.ct("H*I*T*R*K*Q*P"))

# Csf generated from a specified number of binary factors.
randomCsf(full.ct(7))
# In shorthand form.
randomCsf(7)

# Randomly choose positive or negative outcome values.
replicate(5, randomCsf(7, positive = FALSE))

# Specify the outcomes.
randomCsf(d.volatile, outcome = c("RB","se"))

# Specify the complexity.
randomCsf(d.volatile, outcome = c("RB","se"), compl = 2)
randomCsf(full.ct(7), compl = 3:4)
randomCsf(full.ct(7), compl = list(2,4))

# Specify the maximal number of factors.
randomCsf(d.highdim, maxVarNum = 10)
randomCsf(d.highdim, maxVarNum = 15) # takes a while to complete

# Specify the number of asf.
randomCsf(full.ct(7), n.asf = 3)

# Csf with multi-value factors.
randomCsf(allCombs(c(3,4,3,5,3,4)))
# Set the outcome values.
randomCsf(allCombs(c(3,4,3,5,3,4)), outcome = c("A=1","B=4"))

# Generate 20 csf.
replicate(20, randomCsf(full.ct(7), n.asf = 2, compl = 2:3))

# Inverse searches
# ----------------
# === Ideal Data ===
# Draw the data generating structure. (Every run yields different
# targets and data.)
target <- randomCsf(full.ct(5), n.asf = 2)
target
# Select the cases compatible with the target.
x <- selectCases(target)
# Run CNA without an ordering.
mycna <- cna(x)
# Extract the csf.
csfs <- csf(mycna)
# Check whether the target is completely returned.
any(unlist(lapply(csfs$condition, identical.model, target)))

# === Data fragmentation (20% missing observations) ===
# Draw the data generating structure. (Every run yields different
# targets and data.)
target <- randomCsf(full.ct(7), n.asf = 2)
target
# Generate the ideal data.
x <- ct2df(selectCases(target))
# Introduce fragmentation.
x <- x[-sample(1:nrow(x), nrow(x)*0.2), ]
# Run CNA without an ordering.
mycna <- cna(x)
# Extract the csf.
csfs <- csf(mycna)
# Check whether (a submodel of) the target is returned.
any(is.submodel(csfs$condition, target))

# === Data fragmentation and noise (20% missing observations, noise ratio of 0.05) ===
# Multi-value data.
# Draw the data generating structure. (Every run yields different
# targets and data.)
fullData <- allCombs(c(4,4,4,4,4))
target <- randomCsf(fullData, n.asf=2, compl = 2:3)
target
# Generate the ideal data.
idealData <- ct2df(selectCases(target, fullData))
# Introduce fragmentation.
x <- idealData[-sample(1:nrow(idealData), nrow(idealData)*0.2), ]
# Add random noise.
incompData <- dplyr::setdiff(ct2df(fullData), idealData)
x <- rbind(ct2df(incompData[sample(1:nrow(incompData), nrow(x)*0.05), ]), x)
# Run CNA without an ordering.
mycna <- cna(x, con = .7, cov = .7, maxstep = c(3, 3, 12))
mycna
# Extract the csf.
csfs <- csf(mycna)
# Check whether no error (no false positive) is returned.
if (nrow(csfs) == 0) {
  TRUE
} else {
  any(is.submodel(csfs$condition, target))
}
``````

cna documentation built on Aug. 11, 2023, 1:09 a.m.