# subsetsAndSupersets: Functions to find subsets or supersets In QCA: Qualitative Comparative Analysis

## Description

Functions to find a list of implicants that satisfy some restrictions (see details), or to find the corresponding row numbers in the implicant matrix, for all subsets, or supersets, of a (prime) implicant or an initial causal configuration.

## Usage

```
superSubset(data, outcome = "", conditions = "", relation = "necessity",
    incl.cut = 1, cov.cut = 0, ron.cut = 0, pri.cut = 0,
    use.letters = FALSE, depth, add, ...)

findSubsets(input, noflevels = NULL, stop = NULL, ...)

findSupersets(input, noflevels = NULL, ...)
```

## Arguments

- `data`: A data frame with crisp (binary and multi-value) or fuzzy causal conditions.
- `outcome`: The name of the outcome.
- `conditions`: A string containing the conditions' names, separated by commas.
- `relation`: The set relation to `outcome`, either `"necessity"`, `"sufficiency"`, `"necsuf"` or `"sufnec"`. Partial words like `"suf"` are accepted.
- `incl.cut`: The minimal inclusion score of the set relation.
- `cov.cut`: The minimal coverage score of the set relation.
- `ron.cut`: The minimal score for the `RoN` - relevance of necessity.
- `pri.cut`: The minimal score for the `PRI` - proportional reduction in inconsistency.
- `use.letters`: Logical, use simple letters instead of original conditions' names.
- `noflevels`: A vector containing the number of levels for each causal condition plus 1 (all subsets are located in the higher dimension, implicant matrix).
- `input`: A vector of row numbers where the (prime) implicants are located, or a matrix of configurations (only for supersets).
- `stop`: The maximum line number (subset) to stop at, and return.
- `depth`: Integer, an upper number of causal conditions to form expressions with.
- `add`: A function, or a list containing functions, to add more parameters of fit.
- `...`: Other arguments, mainly for backward compatibility.

## Details

The function `superSubset()` finds a list of implicants that satisfy some restrictions referring to the inclusion and coverage with respect to the outcome, under given assumptions of necessity and/or sufficiency.

Ragin (2000) posits that under the necessity relation, instances of the outcome constitute a subset of the instances of the cause(s). Conversely, under the sufficiency relation, instances of the outcome constitute a superset of the instances of the cause(s).
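These subset relations are usually quantified with inclusion scores. The following is a minimal base R sketch, not the package's internal code, assuming the standard fuzzy-set inclusion formulas. The membership vectors `X` (condition) and `Y` (outcome) are made up for illustration.

```r
# Hypothetical fuzzy membership scores, invented for illustration
X <- c(0.9, 0.8, 0.6, 0.7, 1.0)   # condition
Y <- c(0.7, 0.8, 0.3, 0.6, 0.9)   # outcome

# Necessity: the outcome is a subset of the condition (Y <= X)
inclN <- sum(pmin(X, Y)) / sum(Y)

# Sufficiency: the condition is a subset of the outcome (X <= Y)
inclS <- sum(pmin(X, Y)) / sum(X)

inclN   # 1
inclS   # 0.825
```

In this toy data every case has `Y <= X`, so the necessity inclusion is perfect, while the sufficiency inclusion falls below 1.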

When `relation = "necessity"` the function finds all implicants which are supersets of the outcome, then eliminates the redundant ones and returns the surviving (minimal) supersets, provided they pass the inclusion and coverage thresholds. If none of the surviving supersets pass these thresholds, the function will find disjunctions of causal conditions, instead of conjunctions.

When `relation = "sufficiency"` it finds all implicants which are subsets of the outcome, and similarly eliminates the redundant ones and returns the surviving (minimal) subsets.

When `relation = "necsuf"`, the relation is interpreted as necessity, and `cov.cut` is automatically set equal to the inclusion cutoff `incl.cut`. The same automatic equality is made for `relation = "sufnec"`, when relation is interpreted as sufficiency.

The argument `outcome` specifies the name of the outcome, and if multi-value the argument can also specify the level to explain, using square brackets notation.

Outcomes can be negated using a tilde operator, as in `~X`. The logical argument `neg.out` is deprecated but still supported for backward compatibility; it has been replaced by a tilde in front of the outcome name, and controls whether `outcome` or its negation is explained. If `outcome` comes from a multi-value variable, negation makes the disjunction of all remaining values the new outcome to be explained. `neg.out = TRUE` and a tilde `~` in the outcome name do not cancel each other out: either one (or both together) signals that the `outcome` should be negated.
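What negation means for the data can be sketched in base R. This is an illustration only, not how `superSubset()` processes the argument internally; it assumes the usual convention that a fuzzy set is negated as 1 minus its membership score. The vectors `Y` and `V` are hypothetical.

```r
# Hypothetical fuzzy outcome: its negation is 1 minus the membership score
Y    <- c(0.7, 0.8, 0.3)
notY <- 1 - Y

# Hypothetical multi-value outcome with levels 0, 1 and 2
V     <- c(0, 2, 1, 2, 0)
V2    <- as.numeric(V == 2)   # explaining level 2
notV2 <- as.numeric(V != 2)   # negation: the disjunction of levels 0 and 1

notY
notV2
```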

If the argument `conditions` is not specified, all other columns in `data` are used.

Along with the standard measures of inclusion and coverage, the function also returns `PRI` for sufficiency and `RoN` (relevance of necessity, see Schneider & Wagemann, 2012) for the necessity relation.
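As a rough illustration of these two measures, the base R sketch below assumes the formulas commonly given for `RoN` and `PRI` in the QCA literature. It is not taken from the package source, and the vectors `X` and `Y` are made up.

```r
# Hypothetical fuzzy membership scores, invented for illustration
X <- c(0.9, 0.8, 0.6, 0.7, 1.0)   # condition
Y <- c(0.7, 0.8, 0.3, 0.6, 0.9)   # outcome

# Relevance of necessity
RoN <- sum(1 - X) / sum(1 - pmin(X, Y))

# Proportional reduction in inconsistency (for sufficiency)
XY   <- sum(pmin(X, Y))           # overlap of X and Y
XYnY <- sum(pmin(X, Y, 1 - Y))    # overlap of X, Y and the negation of Y
PRI  <- (XY - XYnY) / (sum(X) - XYnY)

round(RoN, 3)
round(PRI, 3)
```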

A subset is a conjunction (an intersection) of causal conditions, with respect to a larger (super)set, which is another (but more parsimonious) conjunction of causal conditions.

All subsets of a given set can be found in the so-called “implicant matrix”, which is an n^k space, understood as all possible combinations of values in any combination of bases n, each causal condition having three or more levels (Dusa, 2007, 2010).

For every two levels of a binary causal condition (values 0 and 1), there are three levels in the implicant matrix:

- 0 to mark a minimized literal
- 1 to replace the value of 0 in the original binary condition
- 2 to replace the value of 1 in the original binary condition
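The row numbers used in the Examples section follow from reading each implicant matrix row as a number in base 3. The sketch below follows the 0/1/2 coding shown in those examples. The helpers `to_implicant()` and `row_number()` are hypothetical names written for illustration and are not part of the package; `getRow()` performs the reverse lookup.

```r
# Three binary conditions: each gets 2 + 1 levels in the implicant matrix
noflevels <- c(2, 2, 2)
base3 <- noflevels + 1

# Positional weights of the number system: 9 3 1
weights <- rev(cumprod(c(1, rev(base3)[-length(base3)])))

# Hypothetical helper: original binary values (NA = minimized) -> implicant row
to_implicant <- function(values) ifelse(is.na(values), 0, values + 1)

# Hypothetical helper: implicant matrix row -> row number
row_number <- function(implicant) sum(implicant * weights) + 1

row_number(to_implicant(c(NA, NA, 0)))   # ~c     -> 2
row_number(to_implicant(c(0, 1, 0)))     # ~ab~c  -> 17
```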

A prime implicant is a superset of an initial combination of causal conditions, and the reverse is also true: the initial combination is a subset of a prime implicant.

Any normal implicant (not prime) is a subset of a prime implicant, and at the same time a superset of some initial causal combinations.

Functions `findSubsets()` and `findSupersets()` find:

- all possible such subsets for a given (prime) implicant, or
- all possible supersets of an implicant or initial causal combination

in the implicant matrix.
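The logic behind both searches can be sketched by brute force in base R for three binary conditions. This only illustrates the set relations and is not the algorithm the package uses; `supersets_of()` and `subsets_of()` are hypothetical helpers. Row numbers assume the all-minimized row `0 0 0` is row 1, consistent with the Examples section.

```r
# All 27 rows of the implicant matrix for three binary conditions;
# c varies fastest, so the row index equals the implicant row number
grid <- as.matrix(expand.grid(c = 0:2, b = 0:2, a = 0:2))[, c("a", "b", "c")]

# A row is a superset of x if every literal it keeps has the same value as in x
supersets_of <- function(x) {
  keep <- apply(grid, 1, function(r) any(r != 0) && all(r == 0 | r == x))
  unname(which(keep))
}

# A row is a proper subset of x if it keeps all literals of x and adds more
subsets_of <- function(x) {
  keep <- apply(grid, 1, function(r) all(x == 0 | r == x) && any(r != x))
  unname(which(keep))
}

supersets_of(c(1, 1, 1))   # row 14, ~a~b~c: 2 4 5 10 11 13 14
supersets_of(c(1, 2, 1))   # row 17, ~ab~c:  2 7 8 10 11 16 17
subsets_of(c(0, 0, 1))     # row 2,  ~c:     5 8 11 14 17 20 23 26
```

The results match the row numbers returned by `findSupersets()` and `findSubsets()` in the Examples section. An exhaustive scan like this grows exponentially with the number of conditions, so it is only practical for toy cases.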

The argument `depth` can be used to impose an upper limit on the number of causal conditions used to form expressions; it is the complexity level at which the search is stopped. Depth is set to a maximum by default, and the algorithm will always stop at the maximum complexity level where no new, non-redundant prime implicants are found. Reducing the depth below that maximum will also reduce computation time.

For examples on how to add more parameters of fit via argument `add`, see the function `pof()`.

## Value

The result of the `superSubset()` function is an object of class "ss", which is a list with the following components:

- `incl.cov`: A data frame with the parameters of fit.
- `coms`: A data frame with the (m)embership (s)cores of the resulting (co)mbinations.

For `findSubsets()` and `findSupersets()`, a vector with the row numbers corresponding to all possible subsets, or supersets, of a (prime) implicant.

## References

Cebotari, V.; Vink, M.P. (2013) “A Configurational Analysis of Ethnic Protest in Europe”. International Journal of Comparative Sociology vol.54, no.4, pp.298-324, doi: 10.1177/0020715213508567.

Cebotari, Victor; Vink, Maarten Peter (2015) Replication Data for: A configurational analysis of ethnic protest in Europe, Harvard Dataverse, V2, doi: 10.7910/DVN/PT2IB9.

Dusa, A. (2007b) Enhancing Quine-McCluskey. WP 2007-49, COMPASSS Working Papers series.

Dusa, Adrian (2010) “A Mathematical Approach to the Boolean Minimization Problem.” Quality & Quantity vol.44, no.1, pp.99-113, doi: 10.1007/s11135-008-9183-x.

Lipset, S. M. (1959) “Some Social Requisites of Democracy: Economic Development and Political Legitimacy”, American Political Science Review vol.53, pp.69-105.

Schneider, Carsten Q.; Wagemann, Claudius (2012) Set-Theoretic Methods for the Social Sciences: A Guide to Qualitative Comparative Analysis (QCA). Cambridge: Cambridge University Press.

## See Also

`createMatrix`, `getRow`

## Examples

```
library(QCA)

# Lipset binary crisp sets
ssLC <- superSubset(LC, "SURV")

library(venn)
x <- list("SURV" = which(LC$SURV == 1),
          "STB" = which(ssLC$coms[, 1] == 1),
          "LIT" = which(ssLC$coms[, 2] == 1))
venn(x, cexil = 0.7)

# Lipset multi-value sets
superSubset(LM, "SURV")

# Cebotari & Vink (2013) fuzzy data
# all necessary combinations with at least 0.9 inclusion and 0.6 coverage cut-offs
ssCVF <- superSubset(CVF, outcome = "PROTEST", incl.cut = 0.90, cov.cut = 0.6)
ssCVF

# the membership scores for the first minimal combination (GEOCON)
ssCVF$coms$GEOCON

# same restrictions, for the negation of the outcome
superSubset(CVF, outcome = "~PROTEST", incl.cut = 0.90, cov.cut = 0.6)

# to find subsets or supersets, a hypothetical example using
# three binary causal conditions, having two levels each: 0 and 1
noflevels <- c(2, 2, 2)

# second row of the implicant matrix: 0 0 1
# which in the "normal" base is:      - - 0
# the prime implicant being: ~C
(sub <- findSubsets(input = 2, noflevels + 1))
# 5 8 11 14 17 20 23 26

getRow(sub, noflevels + 1)

# implicant matrix   normal values
#      a  b  c    |       a  b  c
#   5  0  1  1    |    5  -  0  0       ~b~c
#   8  0  2  1    |    8  -  1  0       b~c
#  11  1  0  1    |   11  0  -  0       ~a~c
#  14  1  1  1    |   14  0  0  0       ~a~b~c
#  17  1  2  1    |   17  0  1  0       ~ab~c
#  20  2  0  1    |   20  1  -  0       a~c
#  23  2  1  1    |   23  1  0  0       a~b~c
#  26  2  2  1    |   26  1  1  0       ab~c

# stopping at maximum row number 20
findSubsets(input = 2, noflevels + 1, stop = 20)
# 5 8 11 14 17 20

# -----
# for supersets
findSupersets(input = 14, noflevels + 1)
# 2 4 5 10 11 13 14

findSupersets(input = 17, noflevels + 1)
# 2 7 8 10 11 16 17

# input as a matrix
(im <- getRow(c(14, 17), noflevels + 1))

# implicant matrix   normal values
#  14  1  1  1    |   14  0  0  0       ~a~b~c
#  17  1  2  1    |   17  0  1  0       ~ab~c

sup <- findSupersets(input = im, noflevels + 1)
sup
# 2 4 5 7 8 10 11 13 14 16 17

getRow(sup, noflevels + 1)

# implicant matrix   normal values
#      a  b  c    |       a  b  c
#   2  0  0  1    |    2  -  -  0       ~c
#   4  0  1  0    |    4  -  0  -       ~b
#   5  0  1  1    |    5  -  0  0       ~b~c
#   7  0  2  0    |    7  -  1  -       b
#   8  0  2  1    |    8  -  1  0       b~c
#  10  1  0  0    |   10  0  -  -       ~a
#  11  1  0  1    |   11  0  -  0       ~a~c
#  13  1  1  0    |   13  0  0  -       ~a~b
#  14  1  1  1    |   14  0  0  0       ~a~b~c
#  16  1  2  0    |   16  0  1  -       ~ab
#  17  1  2  1    |   17  0  1  0       ~ab~c
```

### Example output

```Loading required package: admisc

To cite package QCA in publications, please use:
Dusa, Adrian (2019) QCA with R. A Comprehensive Resource.
Springer International Publishing.

To run the graphical user interface, use: runGUI()

                         inclN   RoN   covN
--------------------------------------------
1  LIT[1]                1.000  0.500  0.615
2  STB[1]                1.000  0.700  0.727
3  LIT[1]*STB[1]         1.000  0.900  0.889
4  DEV[1]+IND[1]         1.000  0.800  0.800
5  URB[0]+IND[1]         1.000  0.000  0.444
6  DEV[2]+URB[1]+IND[0]  1.000  0.100  0.471
--------------------------------------------

                                       inclN   RoN   covN
----------------------------------------------------------
 1  GEOCON                             0.904  0.492  0.624
 2  DEMOC+ETHFRACT+~GEOCON             0.930  0.470  0.626
 3  DEMOC+~ETHFRACT+POLDIS             0.918  0.506  0.637
 4  DEMOC+ETHFRACT+POLDIS              0.906  0.502  0.630
 5  DEMOC+~ETHFRACT+~NATPRIDE          0.905  0.527  0.641
 6  DEMOC+ETHFRACT+~NATPRIDE           0.935  0.530  0.656
 7  DEMOC+~GEOCON+POLDIS               0.920  0.539  0.654
 8  DEMOC+~GEOCON+~NATPRIDE            0.908  0.584  0.671
 9  DEMOC+POLDIS+~NATPRIDE             0.916  0.596  0.682
10  ~ETHFRACT+POLDIS+~NATPRIDE         0.911  0.554  0.657
11  ~DEMOC+ETHFRACT+POLDIS+~NATPRIDE   0.913  0.532  0.647
12  ETHFRACT+~GEOCON+POLDIS+~NATPRIDE  0.911  0.613  0.688
----------------------------------------------------------

 [1] 0.95 0.35 0.35 0.78 0.35 0.78 0.78 0.78 0.78 0.05 0.78 0.35 0.95 0.95 0.35
[16] 0.95 0.78 0.35 0.95 0.35 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95

                      inclN   RoN   covN
-----------------------------------------
1  NATPRIDE           0.932  0.622  0.693
2  ~DEMOC+~ETHFRACT   0.951  0.548  0.663
3  ~ETHFRACT+~POLDIS  0.927  0.443  0.603
-----------------------------------------

[1]  5  8 11 14 17 20 23 26
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    0    2    1
[3,]    1    0    1
[4,]    1    1    1
[5,]    1    2    1
[6,]    2    0    1
[7,]    2    1    1
[8,]    2    2    1
[1]  5  8 11 14 17 20
[1]  2  4  5 10 11 13 14
[1]  2  7  8 10 11 16 17
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    2    1
[1]  2  4  5  7  8 10 11 13 14 16 17
      [,1] [,2] [,3]
 [1,]    0    0    1
 [2,]    0    1    0
 [3,]    0    1    1
 [4,]    0    2    0
 [5,]    0    2    1
 [6,]    1    0    0
 [7,]    1    0    1
 [8,]    1    1    0
 [9,]    1    1    1
[10,]    1    2    0
[11,]    1    2    1
```

QCA documentation built on June 16, 2021, 1:06 a.m.