frscored_cna: frscored_cna
In frscore: Functions for Calculating Fit-Robustness of CNA-Solutions

View source: R/frscored_cna.R

Description

Performs a reanalysis series on a data set and calculates the fit-robustness scores of the resulting solutions/models.

Usage

frscored_cna(
  x,
  fit.range = c(1, 0.7),
  granularity = 0.1,
  output = c("csf", "asf", "msc"),
  scoretype = c("full", "supermodel", "submodel"),
  normalize = c("truemax", "idealmax", "none"),
  verbose = FALSE,
  maxsols = 50,
  test.model = NULL,
  print.all = FALSE,
  comp.method = c("causal_submodel", "is.submodel"),
  n.init = 1000,
  ...
)

Arguments

`x`	A `data.frame` or `configTable` to be analyzed with `cna()`. For multi-value or fuzzy-set data, the data type must be indicated by `type = "mv"` or `type = "fs"`, respectively.
`fit.range`	Numeric vector of length 2; determines the maximum and minimum values of the interval of consistency and coverage thresholds used in the reanalysis series. Defaults to `c(1, 0.7)`.
`granularity`	Numeric scalar; consistency and coverage are varied by this value in the reanalysis series. Defaults to `0.1`.
`output`	String that determines whether csfs, asfs, or mscs are returned; `"csf"` (default) returns csfs, `"asf"` asfs, and `"msc"` mscs.
`scoretype`	String specifying the scoring method: `"full"` (default; scoring is based on counting sub- and supermodel relations), `"supermodel"` (count supermodels only), `"submodel"` (count submodels only). Deprecated: retained for backward compatibility only and due to be dropped in a future version (see Details).
`normalize`	String that determines the method used in normalizing the scores. `"truemax"` (default) normalizes by the highest score among the elements of `sols`, such that the highest scoring solution types get score 1. `"idealmax"` normalizes by a theoretical maximum score (see Details). `"none"` applies no normalization, so only raw scores are reported.
`verbose`	Logical; if `TRUE`, additional information about causal compatibility relations among the unique solution types found in `sols` is printed. Defaults to `FALSE`.
`maxsols`	Integer determining the maximum number of unique solution types found in the reanalysis series to be included in the scoring (see Details). Defaults to `50`.
`test.model`	String that specifies a single candidate `cna()` solution/model whose fit-robustness score is calculated against the results of the reanalysis series.
`print.all`	Logical that controls the number of entries shown when printing the results. If `TRUE`, results are printed as when using the defaults of `print.data.frame`. If `FALSE` (default), only the 20 highest-scoring solutions/models are printed.
`comp.method`	String that determines how the models in `sols` are compared to determine their fr-score. `"causal_submodel"` (the default) checks for causal submodel relations using `causal_submodel()`; `"is.submodel"` checks for syntactic submodel relations with `is.submodel()`.
`n.init`	Integer that determines the maximum number of csfs built in the analyses, see `cna::csf()`. Only applied when `output = "csf"`.
`...`	Any arguments to be passed to `cna()` except `con`, `cov` or `con.msc`. The effect of argument `what` is overridden by `output`.

Details

frscored_cna() is a wrapper function that sequentially executes rean_cna() and frscore(), meaning it performs both computational phases of fit-robustness scoring as introduced in Parkkinen and Baumgartner (2021). In the first phase, the function conducts a reanalysis series on the input data x at all combinatorially possible combinations of fit thresholds that can be generated from the interval given by fit.range at increments given by granularity and collects all solutions/models in a set M. In the second phase, it calculates the fit-robustness scores of the atomic (asf) and/or complex (csf) solution formulas in M. The argument output allows for controlling whether csf or only asf are built, the latter normally being faster but less complete.
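The set of threshold combinations scanned in the first phase can be pictured with a small base R sketch. It assumes the thresholds form an arithmetic sequence running from the maximum to the minimum of fit.range in steps of granularity, and that every consistency value is paired with every coverage value; the variable names thresholds and grid are illustrative and not part of frscore.

```r
# Defaults taken from the Usage section above
fit.range   <- c(1, 0.7)
granularity <- 0.1

# Assumed construction: thresholds from the maximum down to the minimum
thresholds <- seq(max(fit.range), min(fit.range), by = -granularity)

# Every consistency threshold paired with every coverage threshold
grid <- expand.grid(con = thresholds, cov = thresholds)

print(thresholds)
print(nrow(grid))  # number of analyses in the series under this assumption
```

Under these assumptions the number of analyses grows quadratically in the number of thresholds, which is why a wide fit.range combined with a small granularity can make the reanalysis series slow.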

The argument scoretype is deprecated as of frscore 0.3.1, and will be dropped from future versions of the package. Giving it a non-default value is allowed so that older code can be run without errors, but doing this is otherwise discouraged. The permissible values of scoretype have the following effects. When set to its default value "full", the score of each solution/model m in M is calculated by counting the number of the (either causal or syntactic) sub- and supermodel relations m has to the other elements of M. Whether causal or syntactic submodel relations are counted depends on the value of comp.method: "causal_submodel" (default) counts causal submodel relations using causal_submodel(), "is.submodel" counts syntactic submodel relations using cna::is.submodel(). Setting scoretype to "supermodel" or "submodel" forces the scoring to be based on, respectively, supermodel and submodel relations only. In future versions of frscore, fit-robustness scores will always be calculated as with scoretype = "full", and changing this will not be possible. If additional information about the numbers of sub- vs. supermodel relations a particular model has to other models is needed, this can be acquired by inspecting the "verbout" element of the output of frscored_cna().
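The counting idea behind the three scoretype values can be illustrated with a toy example in base R. The four model labels and the submodel relations among them are made up for illustration; in an actual analysis the relations would be established by causal_submodel() or cna::is.submodel(). The sketch also ignores how repeated occurrences of the same model type across the reanalysis series contribute to the score, so it should be read as a simplification rather than as the package's scoring routine.

```r
# Hypothetical set M of four model types, labeled m1..m4 for illustration
models <- paste0("m", 1:4)

# sub[i, j] is TRUE when model i is a submodel of model j.
# These relations are invented for the example.
sub <- matrix(FALSE, 4, 4, dimnames = list(models, models))
sub["m1", "m2"] <- TRUE
sub["m1", "m3"] <- TRUE
sub["m2", "m3"] <- TRUE

n_supermodels <- rowSums(sub)  # how many models each model is a submodel of
n_submodels   <- colSums(sub)  # how many models are submodels of each model

# "full" counts both kinds of relations
score_full <- n_supermodels + n_submodels

print(rbind(n_supermodels, n_submodels, score_full))
```

Here m4 stands in no sub- or supermodel relation to any other model and therefore receives a score of 0 under all three counting schemes.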

The fit-robustness scores can be normalized in two ways. In the default setting normalize = "truemax", the score of each sols[i] is divided by the maximum score obtained by an element of sols. In case of normalize = "idealmax", the score is normalized not by an actually obtained maximum but by an idealized maximum, which is calculated by assuming that all solutions of equal complexity in sols are identical and that for every sols[i] of a given complexity, all less complex elements of sols are its submodels and all more complex elements of sols are its supermodels. When normalization is applied, the normalized score is shown in its own column norm.score in the results. The raw scores are shown in the column score.
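As a minimal arithmetic sketch of the "truemax" setting, the following base R snippet divides a vector of raw scores by its maximum. The raw scores are hypothetical and the object names raw, norm_truemax and results are illustrative; only the column names score and norm.score follow the description above. The "idealmax" variant is not shown because its idealized maximum depends on the complexity distribution of the solutions.

```r
# Hypothetical raw fit-robustness scores for four solution types
raw <- c(m1 = 6, m2 = 4, m3 = 4, m4 = 0)

# "truemax": divide each raw score by the highest raw score obtained
norm_truemax <- raw / max(raw)

# Mimic the layout of the results: raw and normalized scores side by side
results <- data.frame(score = raw, norm.score = round(norm_truemax, 2))
print(results)
```

The highest-scoring solution type always receives a normalized score of 1 in this scheme, so normalized scores are comparable within one call but not necessarily across data sets.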

If the argument verbose is set to TRUE, frscored_cna() also prints a list indicating, for each solution/model, how many raw score points it receives from which elements of M. The verbose list is ordered by decreasing fit-robustness score.

If the size of the consistency and coverage range scanned in the reanalysis series generating M is large or there are many model ambiguities, M may contain so many different types of solutions that robustness cannot be calculated for all of them in reasonable time. In that case, the argument maxsols allows for capping the number of solution types to be included in the scoring (defaults to 50). frscored_cna() then selects the most frequent solutions in M of each complexity level until maxsols is reached and only scores the thus selected elements of M.
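One possible reading of this selection rule is sketched below in base R. It is an assumption, not the package's code: solution types are ranked by frequency within each complexity level, and the top-ranked types are then taken level by level until maxsols is reached. The data frame M with its columns model, complexity and freq, as well as the helper column rank_in_level, are hypothetical.

```r
# Hypothetical solution types with their complexity and frequency in M
M <- data.frame(
  model      = paste0("m", 1:7),
  complexity = c(2, 2, 2, 3, 3, 4, 4),
  freq       = c(5, 3, 1, 4, 2, 6, 1)
)
maxsols <- 4

# Rank solution types by decreasing frequency within each complexity level
M$rank_in_level <- ave(-M$freq, M$complexity,
                       FUN = function(v) rank(v, ties.method = "first"))

# Take the most frequent type of every level first, then the second most
# frequent, and so on, stopping once maxsols types have been selected
picked <- head(M[order(M$rank_in_level, M$complexity), ], maxsols)
print(picked$model)
```

Whatever the exact rule, only the selected solution types are scored, so a low maxsols trades completeness of the scoring for speed.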

If the user is interested in the robustness of one specific candidate model, that model can be passed to frscored_cna() via the argument test.model. The result for that model is then printed separately, provided the model is found in the reanalysis series. If it is not found, the function stops.

Value

A list whose first element is a data frame that contains the model types returned from a reanalysis series of the input data, their details such as consistency and coverage, together with the unadjusted fit-robustness score of each model type shown in column 'score', and a normalized score in column 'norm.score' in case normalize = "truemax" or normalize = "idealmax". The other elements contain additional information about the submodel relations among the unique solution types and about how the function was called.
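A short sketch of how the returned list might be inspected is given below. It relies only on what is stated above, namely that the first element is a data frame with a column score, and it accesses that element by position because the element names are not given here. The object names res and models are illustrative, and the call is wrapped in a requireNamespace() check so that the snippet does nothing when frscore is not installed.

```r
# Run only when the frscore package is available
if (requireNamespace("frscore", quietly = TRUE)) {
  library(frscore)

  # Artificial data, as in the Examples section
  dat <- data.frame(
    A = c(1,1,0,0,0,0,1,1),
    B = c(0,1,0,0,0,0,1,1),
    C = c(1,0,1,0,1,0,1,0),
    D = c(1,1,0,0,1,1,0,0),
    E = c(1,1,1,1,0,0,0,0))

  res <- frscored_cna(dat)

  # First element: data frame of model types with their scores
  models <- res[[1]]
  print(head(models[order(-models$score), ]))
}
```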

References

P. Emmenegger (2011) “Job Security Regulations in Western Democracies: A Fuzzy Set Analysis.” European Journal of Political Research 50(3):336-64.

C. Hartmann and J. Kemmerzell (2010) “Understanding Variations in Party Bans in Africa.” Democratization 17(4):642-65. doi:10.1080/13510347.2010.491189.

V.P. Parkkinen and M. Baumgartner (2021) “Robustness and Model Selection in Configurational Causal Modeling.” Sociological Methods and Research. doi:10.1177/0049124120986200.

Examples

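# Load frscore (provides d.error) and cna (provides d.pban and d.jobsecurity)
library(frscore)
library(cna)

# Robustness analysis from sect. 4 of Parkkinen and Baumgartner (2021)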
frscored_cna(d.error, fit.range = c(1, 0.75), granularity = 0.05,
             ordering = list("E"), strict = TRUE)

# Multi-value data from Hartmann and Kemmerzell (2010)
frscored_cna(d.pban, type = "mv", fit.range = c(0.9, 0.7), granularity = 0.1,
             normalize = "none", ordering = list("T", "PB"), strict = TRUE)

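# Fuzzy-set data from Emmenegger (2011)
# Note: scoretype is deprecated (see Details); it is used here only to
# illustrate the backward-compatible interface
frscored_cna(d.jobsecurity, type = "fs", fit.range = c(0.9, 0.6), granularity = 0.05,
             scoretype = "submodel", ordering = list("JSR"), strict = TRUE)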

# Artificial data
dat <- data.frame(
  A = c(1,1,0,0,0,0,1,1),
  B = c(0,1,0,0,0,0,1,1),
  C = c(1,0,1,0,1,0,1,0),
  D = c(1,1,0,0,1,1,0,0),
  E = c(1,1,1,1,0,0,0,0))
frscored_cna(dat)
frscored_cna(dat, output = "asf")
frscored_cna(dat, maxsols = 10)
frscored_cna(dat, test.model = "(b*e+A*E<->D)*(B<->A)")

frscore documentation built on June 22, 2024, 9:43 a.m.