full.ct: Generate the logically possible value configurations of a...
In cna: Causal Modeling with Coincidence Analysis

full.ct

R Documentation

Generate the logically possible value configurations of a given set of factors

Description

The function full.ct generates a configTable with all (or a specified number of) logically possible value configurations of the factors defined in the input x. x can be a configTable, a data frame, an integer, a list specifying the factors' value ranges, a character string expressing a condition featuring all admissible factor values, or a condTbl.

The function allCombs generates a configTable (of type "mv") of all possible value configurations of length(nvals) factors, the first factor having nvals[1] values, the second nvals[2] values etc. The factors are labeled using capital letters.

Usage

full.ct(x, ...)

## Default S3 method:
full.ct(x, type = "auto", cond = NULL, nmax = NULL, ...)
## S3 method for class 'configTable'
full.ct(x, cond = NULL, nmax = NULL, ...)
## S3 method for class 'condTbl'
full.ct(x, nmax = NULL, ...)
## S3 method for class 'cti'
full.ct(x, cond = NULL, nmax = NULL, ...)

allCombs(nvals)

Arguments

`x`	A `configTable`, a data frame, a matrix, an integer, a list specifying the factors' value ranges, a character vector featuring all admissible factor values, or a `condTbl` (see the Details and Examples below).
`type`	Character vector specifying the type of `x`: `"auto"` (automatic detection; default), `"cs"` (crisp-set), `"mv"` (multi-value), or `"fs"` (fuzzy-set). (Manual specification of the `type` only has an effect if `x` is a data frame or matrix.)
`cond`	Optional character vector containing conditions in the syntax of msc, asf or csf. If it is not `NULL`, only factors appearing in `cond` are used.
`nmax`	Maximal number of rows in the `configTable` output by `full.ct`. If `nmax` is smaller than the total number of logically possible configurations, a random sample of configurations is drawn. The default `nmax = NULL` outputs all logically possible configurations.
`...`	Further arguments passed to methods.
`nvals`	An integer vector with values >0.

Details

full.ct generates all or nmax logically possible value configurations of the factors defined in x, which can either be a character vector or a condTbl or an integer or a list or a configTable or a data frame or a matrix.

If x is a character vector, it can contain conditions of any of the three types of conditions, boolean, atomic or complex (see condition/condList). x must contain at least one factor. Factor names and admissible values are guessed from the Boolean formulas. If x contains multi-value factors, only those values are considered admissible that are explicitly contained in x. Accordingly, in case of multi-value factors, full.ct should be given the relevant factor definitions by means of a list (see below).
If x is a condTbl as produced by msc, asf, csf, or condTbl and containing a character column "condition", the output will be the same as when full.ct is applied to x$condition, which is a character vector containing conditions (see the previous bullet point).
If x is an integer, the output is a configuration table of type "cs" with x factors. If x <= 26, the first x capital letters of the alphabet are used as the names of the factors. If x > 26, factors are named "X1" to "Xx".
If x is a list, x is expected to have named elements each of which provides the factor names with corresponding vectors enumerating their admissible values (i.e. their value ranges). These values must be non-negative integers.
If x is a configTable, data frame, or matrix, colnames(x) are interpreted as factor names and the rows as enumerating the admissible values (i.e. as value ranges). If x is a data frame or a matrix, x is first converted to a configTable (the function configTable is called with type as specified in full.ct), and the configTable method of full.ct is then applied to the result. The configTable method uses all factors and factor values occurring in the configTable. If x is of type "fs", 0 and 1 are taken as the admissible values.

The computational demand of generating all logically possible configurations increases exponentially with the number of factors in x. In order to get an output in reasonable time, even when x features more than about 15 factors, the argument nmax allows for specifying a maximal number of configurations to be returned (by random sampling).

If not all factors specified in x are of interest but only those in a given msc, asf, or csf, full.ct can be correspondingly restricted via the argument cond. For instance, full.ct(d.educate, cond = "D + L <-> E") generates the logically possible value configurations of the factors in the set {D, L, E}, even though d.educate contains further factors. The argument cond is primarily used internally to speed up the execution of various functions in case of high-dimensional data.

The main area of application of full.ct is data simulation in the context of inverse search trials benchmarking the output of cna (see Examples below). While full.ct generates the relevant space of logically possible configurations of the factors in an analyzed factor set, selectCases selects those configurations from this space that are compatible with a given data generating causal structure (i.e. the ground truth), that is, it selects the empirically possible configurations. The ground truth can be randomly generated by the functions in randomConds. The function makeFuzzy generates fuzzy data from the output of full.ct or selectCases. And is.submodel can be used to check whether the models output by cna are true of the ground truth.

The method for class "cti" is for internal use only.

The function allCombs serves the same purpose as full.ct but is less general. It expects an integer vector with nvals values, >0, and then generates a configTable (of type "mv") of all possible value configurations of length(nvals) factors, the first factor having nvals[1] values, the second nvals[2] values etc. The factors are labeled using capital letters.

Value

A configTable of type "cs" or "mv" with the full enumeration of combinations of the factor values.

Examples

# x is a character vector.
full.ct("A + B*c")
full.ct("A=1*C=3 + B=2*C=1 + A=3*B=1") 
full.ct(c("A + b*C", "a*D"))
full.ct("!A*-(B + c) + F")
full.ct(c("A=1", "A=2", "B=1", "B=0", "C=13","C=45"))

# x is a condTbl.
ana.pban <- cna(d.pban, ordering = "PB", con = .85, cov = .9, 
                    measures = c("PAcon", "PACcov")) 
full.ct(csf(ana.pban))

# x is a data frame.
full.ct(d.educate)
full.ct(d.jobsecurity)
full.ct(d.pban)

# x is a configTable.
full.ct(configTable(d.jobsecurity))
full.ct(configTable(d.pban), cond = "C=1 + F=0 <-> V=1") 

# x is an integer.
full.ct(6)
# Constrain the number of configurations to 1000.
full.ct(30, nmax = 1000) 

# x is a list.
full.ct(list(A = 0:1, B = 0:1, C = 0:1))  # cs
full.ct(list(A = 1:2, B = 0:1, C = 23:25))  # mv

# Simulating crisp-set data.
groundTruth.1 <- "(A*b + C*d <-> E)*(E*H + I*k <-> F)"
fullData <- ct2df(full.ct(groundTruth.1))
idealData <- ct2df(selectCases(groundTruth.1, fullData))
# Introduce 20% data fragmentation.
fragData <- idealData[-sample(1:nrow(idealData), nrow(idealData)*0.2), ] 
# Add 10% random noise.
incompData <- dplyr::setdiff(fullData, idealData)
(realData <- rbind(incompData[sample(1:nrow(incompData), nrow(fragData)*0.1), ], 
  fragData))

# Simulating multi-value data.
groundTruth.2 <- "(JO=3 + TS=1*PE=3 <-> ES=1)*(ES=1*HI=4 + IQ=2*KT=5 <-> FA=1)"
fullData <- ct2df(full.ct(list(JO=1:3, TS=1:2, PE=1:3, ES=1:2, HI=1:4, IQ=1:5, KT=1:5, FA=1:2)))
idealData <- ct2df(selectCases(groundTruth.2, fullData))
# Introduce 20% data fragmentation.
fragData <- idealData[-sample(1:nrow(idealData), nrow(idealData)*0.2), ] 
# Add 10% random noise.
incompData <- dplyr::setdiff(fullData, idealData)
(realData <- rbind(incompData[sample(1:nrow(incompData), nrow(fragData)*0.1), ], 
  fragData))


# allCombs
# --------
# Generate all logically possible configurations of 5 dichotomous factors named "A", "B",
# "C", "D", and "E". 
allCombs(c(2, 2, 2, 2, 2)) - 1
# allCombs(c(2, 2, 2, 2, 2)) generates the value space for values 1 and 2, but as it is
# conventional to use values 0 and 1 for Boolean factors, 1 must be subtracted from
# every value output by allCombs(c(2, 2, 2, 2, 2)) to yield a Boolean data frame.

# Generate all logically possible configurations of 5 multi-value factors named "A", "B",
# "C", "D", and "E", such that A can take on 3 values {1,2,3}, B 4 values {1,2,3,4},
# C 3 values etc.
dat0 <- allCombs(c(3, 4, 3, 5, 3))
head(dat0)
nrow(dat0) # = 3*4*3*5*3

# Generate all configurations of 5 dichotomous factors that are compatible with the 
# causal chain (A*b + a*B <-> C)*(C*d + c*D <-> E).
dat1 <- allCombs(c(2, 2, 2, 2, 2)) - 1
(dat2 <- selectCases("(A*b + a*B <-> C)*(C*d + c*D <-> E)", dat1))

# Generate all configurations of 5 multi-value factors that are compatible with the 
# causal chain (A=2*B=1 + A=3*B=3 <-> C=1)*(C=1*D=2 + C=4*D=4 <-> E=3).
dat1 <- allCombs(c(3, 3, 4, 4, 3))
dat2 <- selectCases("(A=2*B=1 + A=3*B=3 <-> C=1)*(C=1*D=2 + C=4*D=4 <-> E=3)", dat1)
nrow(dat1)
nrow(dat2)                    

# Generate all configurations of 5 fuzzy-set factors that are compatible with the 
# causal structure A*b + C*D <-> E, such that con = .8 and cov = .8.
dat1 <- allCombs(c(2, 2, 2, 2, 2)) - 1
dat2 <- makeFuzzy(dat1, fuzzvalues = seq(0, 0.45, 0.01))
(dat3 <- selectCases1("A*b + C*D <-> E", con = .8, cov = .8, dat2))

# Inverse search for the data generating causal structure A*b + a*B + C*D <-> E from
# fuzzy-set data with non-perfect scores on standard consistency and coverage.
set.seed(3)
groundTruth <- "A*b + a*B + C*D <-> E"
dat1 <- allCombs(c(2, 2, 2, 2, 2)) - 1
dat2 <- makeFuzzy(dat1, fuzzvalues = 0:4/10)
dat3 <- selectCases1(groundTruth, con = .8, cov = .8, dat2)
ana1 <- cna(dat3, outcome = "E", con = .8, cov = .8)
any(is.submodel(asf(ana1)$condition, groundTruth))

cna documentation built on April 11, 2025, 6:10 p.m.