calibrate: Calibrate raw data to crisp or fuzzy sets In QCA: Qualitative Comparative Analysis

Description

This function transforms (calibrates) raw data into either crisp or fuzzy set values, using the direct or the indirect method of calibration.

Usage

```
calibrate(x, type = "fuzzy", method = "direct", thresholds = NA,
          logistic = TRUE, idm = 0.95, ecdf = FALSE, below = 1, above = 1, ...)
```

Arguments

| Argument | Description |
|---|---|
| `x` | A numerical causal condition. |
| `type` | Calibration type, either `"crisp"` or `"fuzzy"`. |
| `method` | Calibration method, either `"direct"`, `"indirect"` or `"TFR"`. |
| `thresholds` | A vector of (named) thresholds. |
| `logistic` | Calibrate to fuzzy sets using the logistic function. |
| `idm` | The set inclusion degree of membership for the logistic function. |
| `ecdf` | Calibrate to fuzzy sets using the empirical cumulative distribution function of the raw data. |
| `below` | Numeric (non-negative), determines the shape below crossover. |
| `above` | Numeric (non-negative), determines the shape above crossover. |
| `...` | Additional parameters, mainly for backwards compatibility. |

Details

Calibration is a transformational process from raw numerical data (interval or ratio level of measurement) to set membership scores, based on a certain number of qualitative anchors.

When `type = "crisp"`, the process is similar to recoding the original values to a number of categories defined by the number of thresholds. For one threshold, the calibration produces two categories (intervals): 0 if below, 1 if above. For two thresholds, the calibration produces three categories: 0 if below the first threshold, 1 if in the interval between the thresholds and 2 if above the second threshold etc.
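The recoding described above can be imitated in base R with `findInterval()`. This is only a sketch of the idea, not the package's implementation: the heights are made up for illustration, and how `calibrate()` treats a value exactly equal to a threshold is an open assumption here.

```r
# hypothetical raw heights (cm), for illustration only
heights <- c(160, 172, 178, 190)

# one threshold: two categories, 0 below and 1 above
one_cut <- findInterval(heights, 175)

# two thresholds: three categories, 0, 1 and 2
two_cuts <- findInterval(heights, c(170, 180))

print(one_cut)
print(two_cuts)
```

Each value is coded by the number of thresholds it has passed, so the categories always start at 0 and increase by 1.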

When `type = "fuzzy"`, calibration produces fuzzy set membership scores, using three anchors for the increasing or decreasing s-shaped distributions (including the logistic function), and six anchors for the increasing or decreasing bell-shaped distributions.

The argument `thresholds` can be specified either as a simple numeric vector, or as a named numeric vector. If used as a named vector, for the first category of s-shaped distributions, the names of the thresholds should be:

- `"e"` for the full set exclusion
- `"c"` for the set crossover
- `"i"` for the full set inclusion

For the second category of bell-shaped distributions, the names of the thresholds should be:

- `"e1"` for the first (left) threshold for full set exclusion
- `"c1"` for the first (left) threshold for set crossover
- `"i1"` for the first (left) threshold for full set inclusion
- `"i2"` for the second (right) threshold for full set inclusion
- `"c2"` for the second (right) threshold for set crossover
- `"e2"` for the second (right) threshold for full set exclusion

If used as a simple numerical vector, the order of the values matters.

If `e` < `c` < `i`, then the membership function is increasing from `e` to `i`. If `i` < `c` < `e`, then the membership function is decreasing from `i` to `e`.

The same applies to the bell-shaped distribution: if `e1` < `c1` < `i1` ≤ `i2` < `c2` < `e2`, then the membership function is first increasing from `e1` to `i1`, then flat between `i1` and `i2`, and then decreasing from `i2` to `e2`. In contrast, if `i1` < `c1` < `e1` ≤ `e2` < `c2` < `i2`, then the membership function is first decreasing from `i1` to `e1`, then flat between `e1` and `e2`, and finally increasing from `e2` to `i2`.

When `logistic = TRUE` (the default), the argument `idm` specifies the inclusion degree of membership for the logistic function. If `logistic = FALSE`, the function returns linear s-shaped or bell-shaped distributions (curved using the arguments `below` and `above`), unless activating the argument `ecdf`.
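To make the role of `idm` concrete, here is a minimal base R sketch of a logistic s-shaped calibration in the spirit of Ragin's direct method. The helper `logistic_sketch()` and its argument names are hypothetical, and the piecewise rescaling around the crossover is an assumption; it is not taken from the package source.

```r
# Hypothetical helper, not QCA code: logistic calibration with three anchors
logistic_sketch <- function(x, exc, cross, inc, idm = 0.95) {
    # log odds that correspond to the inclusion degree of membership
    target <- log(idm / (1 - idm))
    # TRUE for values lying on the inclusion side of the crossover
    toward_inc <- (x - cross) * (inc - cross) >= 0
    # rescale the distance from the crossover separately on each side
    scale <- ifelse(toward_inc, inc - cross, cross - exc)
    logodds <- (x - cross) * target / scale
    1 / (1 + exp(-logodds))
}

# increasing set: e = 165, c = 175, i = 185
tall <- logistic_sketch(c(165, 175, 185), exc = 165, cross = 175, inc = 185)

# decreasing set: i = 165, c = 175, e = 185
short <- logistic_sketch(c(165, 175, 185), exc = 185, cross = 175, inc = 165)

print(tall)
print(short)
```

Under these assumptions, a value at the inclusion anchor receives a score of `idm`, a value at the exclusion anchor receives `1 - idm`, and the crossover receives 0.5.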

If there is no prior knowledge on the shape of the distribution, the argument `ecdf` asks the computer to determine the underlying distribution of the empirical, observed points, and the calibrated measures are found along that distribution.

Both `logistic` and `ecdf` arguments can be used only for s-shaped distributions (using 3 thresholds), and they are mutually exclusive.

The parameters `below` and `above` (active only when both `logistic` and `ecdf` are deactivated) establish the degree of concentration and dilation (convex or concave shape) between the threshold and the crossover:

- `0 < below < 1` dilates in a concave shape below the crossover
- `below = 1` produces a linear shape (neither convex, nor concave)
- `below > 1` concentrates in a convex shape below the crossover
- `0 < above < 1` dilates in a concave shape above the crossover
- `above = 1` produces a linear shape (neither convex, nor concave)
- `above > 1` concentrates in a convex shape above the crossover

Usually, `below` and `above` have equal values, unless specific reasons exist to make them different.
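The effect of `below` and `above` can be explored with a small base R sketch. The helper `linear_sketch()` is hypothetical and covers only the increasing case (`e` < `c` < `i`); treating the two parameters as exponents on the rescaled distance to the nearest anchor is an assumption made for illustration, not a statement about the package internals.

```r
# Hypothetical helper, not QCA code: piecewise s-shape for e < c < i
linear_sketch <- function(x, exc, cross, inc, below = 1, above = 1) {
    out <- numeric(length(x))            # values at or below exc stay 0
    lower <- x > exc & x <= cross
    upper <- x > cross & x < inc
    # below the crossover: rise from 0 to 0.5, bent by "below"
    out[lower] <- 0.5 * ((x[lower] - exc) / (cross - exc))^below
    # above the crossover: rise from 0.5 to 1, bent by "above"
    out[upper] <- 1 - 0.5 * ((inc - x[upper]) / (inc - cross))^above
    out[x >= inc] <- 1
    out
}

h <- c(150, 165, 175, 185, 200)
straight <- linear_sketch(h, exc = 155, cross = 175, inc = 195)
curved <- linear_sketch(h, exc = 155, cross = 175, inc = 195,
                        below = 2, above = 2)

print(straight)
print(curved)
```

With both parameters at 1 the scores change linearly between the anchors. Larger values push scores away from 0.5 and toward 0 or 1, which matches the near-crisp behaviour described for very large values.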

With `type = "fuzzy"`, it is also possible to use the `"indirect"` method to calibrate the data, using a procedure first introduced by Ragin (2008). The indirect method assumes a vector of thresholds to cut the original data into equal intervals, then it applies a (quasi)binomial logistic regression with a fractional polynomial equation.

The results are also fuzzy between 0 and 1, but the method is entirely different: it has no anchors (specific to the direct method), and it doesn't need to specify a calibration function to calculate the scores with.
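The general idea of the indirect method can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's code: the evenly spaced preliminary scores and the two polynomial terms (`log` and `sqrt`) are choices made here for brevity. The income data and thresholds are the ones used in the Examples section.

```r
# per capita income data from the Examples section (Ragin, 2008)
inc <- c(40110, 34400, 25200, 24920, 20060, 17090, 15320, 13680,
         11720, 11290, 10940, 9800, 7470, 4670, 4100, 4070,
         3740, 3690, 3590, 2980, 1000, 650, 450, 110)
cuts <- c(1000, 4000, 5000, 10000, 20000)

# assumption: six groups receive preliminary scores 0, 0.2, ..., 1
prelim <- findInterval(inc, cuts) / length(cuts)

# fractional logit model; the polynomial terms are illustrative only
fit <- glm(prelim ~ log(inc) + sqrt(inc),
           family = quasibinomial(link = "logit"))

# fitted values serve as the fuzzy membership scores
fuzzy <- unname(predict(fit, type = "response"))
print(round(fuzzy, 3))
```

Because the scores are fitted values from a logit model, they always fall between 0 and 1 without any anchor being specified.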

The third method applied to fuzzy calibrations is called `"TFR"` and calibrates categorical data (such as Likert type response scales) to fuzzy values using the Totally Fuzzy and Relative method (Cheli and Lemmi, 1995).
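A minimal base R sketch of a Totally Fuzzy and Relative transformation is shown below. The helper `tfr_sketch()` is hypothetical; it assumes the score of each category is the empirical cumulative distribution function rescaled so that the lowest observed category receives 0, and may differ in detail from what `calibrate()` does.

```r
# Hypothetical helper, not QCA code: TFR scores from the ECDF
tfr_sketch <- function(x) {
    Fn <- ecdf(x)
    lowest <- Fn(min(x))      # cumulative share of the lowest category
    pmax(0, (Fn(x) - lowest) / (1 - lowest))
}

# made-up responses on an ordinal scale, for illustration only
answers <- c(1, 1, 2, 3, 3, 3, 4)
tfr_scores <- tfr_sketch(answers)
print(tfr_scores)
```

The scores depend on how frequently each category occurs, so the same response can receive a different membership score in a different sample.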

Value

A numeric vector of set membership scores, either crisp (starting from 0 with increments of 1), or fuzzy numeric values between 0 and 1.

References

Cheli, B.; Lemmi, A. (1995) “A 'Totally' Fuzzy and Relative Approach to the Multidimensional Analysis of Poverty”. In Economic Notes, vol.1, pp.115-134.

Dusa, A. (2018) QCA with R. A Comprehensive Resource. New York: Springer International Publishing.

Ragin, C. (2008) “Fuzzy Sets: Calibration Versus Measurement.” In The Oxford Handbook of Political Methodology, edited by Janet Box-Steffensmeier, Henry E. Brady, and David Collier, pp.87-121. Oxford: Oxford University Press.

Examples

```
library(QCA)

# generate heights for 100 people
# with an average of 175cm and a standard deviation of 10cm
set.seed(12345)
x <- rnorm(n = 100, mean = 175, sd = 10)

cx <- calibrate(x, type = "crisp", thresholds = 175)
plot(x, cx, main = "Binary crisp set using 1 threshold",
     xlab = "Raw data", ylab = "Calibrated data", yaxt = "n")
axis(2, at = 0:1)

cx <- calibrate(x, type = "crisp", thresholds = c(170, 180))
plot(x, cx, main = "3 value crisp set using 2 thresholds",
     xlab = "Raw data", ylab = "Calibrated data", yaxt = "n")
axis(2, at = 0:2)

# calibrate to an increasing, s-shaped fuzzy-set
cx <- calibrate(x, thresholds = "e=165, c=175, i=185")
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")

# calibrate to a decreasing, s-shaped fuzzy-set
cx <- calibrate(x, thresholds = "i=165, c=175, e=185")
plot(x, cx, main = "Membership scores in the set of short people",
     xlab = "Raw data", ylab = "Calibrated data")

# when not using the logistic function, linear increase
cx <- calibrate(x, thresholds = "e=165, c=175, i=185", logistic = FALSE)
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")

# tweaking the parameters "below" and "above" the crossover,
# at value 3.5 approximates a logistic distribution, when e=155 and i=195
cx <- calibrate(x, thresholds = "e=155, c=175, i=195", logistic = FALSE,
                below = 3.5, above = 3.5)
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")

# calibrate to a bell-shaped fuzzy set
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195",
                below = 3, above = 3)
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# calibrate to an inverse bell-shaped fuzzy set
cx <- calibrate(x, thresholds = "i1=155, c1=165, e1=175, e2=175, c2=185, i2=195",
                below = 3, above = 3)
plot(x, cx, main = "Membership scores in the set of non-average height",
     xlab = "Raw data", ylab = "Calibrated data")

# the default values of "below" and "above" will produce a triangular shape
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# different thresholds to produce a linear trapezoidal shape
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=172, i2=179, c2=187, e2=195")
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# larger values of above and below will increase membership in or out of the set
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195",
                below = 10, above = 10)
plot(x, cx, main = "Membership scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data")

# while extremely large values will produce virtually crisp results
cx <- calibrate(x, thresholds = "e1=155, c1=165, i1=175, i2=175, c2=185, e2=195",
                below = 10000, above = 10000)
plot(x, cx, main = "Binary crisp scores in the set of average height",
     xlab = "Raw data", ylab = "Calibrated data", yaxt = "n")
axis(2, at = 0:1)
abline(v = c(165, 185), col = "red", lty = 2)

# check if crisp
round(cx, 0)

# using the empirical cumulative distribution function
# requires manually setting logistic to FALSE
cx <- calibrate(x, thresholds = "e=155, c=175, i=195", logistic = FALSE,
                ecdf = TRUE)
plot(x, cx, main = "Membership scores in the set of tall people",
     xlab = "Raw data", ylab = "Calibrated data")

## the indirect method, per capita income data from Ragin (2008)
inc <- c(40110, 34400, 25200, 24920, 20060, 17090, 15320, 13680,
         11720, 11290, 10940, 9800, 7470, 4670, 4100, 4070,
         3740, 3690, 3590, 2980, 1000, 650, 450, 110)

cinc <- calibrate(inc, method = "indirect",
                  thresholds = "1000, 4000, 5000, 10000, 20000")
plot(inc, cinc, main = "Membership scores in the set of high income",
     xlab = "Raw data", ylab = "Calibrated data")

# calibrating categorical data
set.seed(12345)
values <- sample(1:7, 100, replace = TRUE)
TFR <- calibrate(values, method = "TFR")
table(round(TFR, 3))
```

QCA documentation built on July 12, 2018, 9:02 a.m.