Frechet.bounds.cat | R Documentation |

This function derives the bounds for the cell probabilities of the table Y vs. Z starting from the marginal tables (**X** vs. Y), (**X** vs. Z) and the joint distribution of the **X** variables.

Frechet.bounds.cat(tab.x, tab.xy, tab.xz, print.f="tables", align.margins = FALSE, tol= 0.001, warn = TRUE)

`tab.x` |
A |

`tab.xy` |
A A single categorical Y variable is allowed. One or more categorical variables can be considered as When |

`tab.xz` |
A A single categorical Z variable is allowed. One or more categorical variables can be considered as When |

`print.f` |
A string: when |

`align.margins` |
Logical (default |

`tol` |
Tolerance used in comparing joint distributions as far as |

`warn` |
Logical, when |

This function computes the expected conditional Frechet bounds for the relative frequencies in the contingency table of Y vs. Z, starting from the distributions P(Y|X), P(Z|X) and P(X). The expected conditional bounds for the relative frequencies *p(Y=j,Z=k)* in the table Y vs. Z are:

*p(Y=j,Z=k) >= sum_i(p(X=i) * max(0; p(Y=j|X=i) + p(Z=k|X=i) - 1) )*

*p(Y=j,Z=k) <= sum_i(p(X=i) * min(p(Y=j|X=i),p(Z=k|X=i))) *

The relative frequencies *p(X=i)=n_i/n* are computed from the frequencies in `tab.x`; the relative frequencies *p(Y=j|X=i)=n_ij/n_i.* are derived from `tab.xy`; finally, *p(Z=k|X=i)=n_ik/n_i.* are derived from `tab.xz`.

Estimation requires that all the starting tables share the same marginal distribution of the **X** variables.
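The two conditional bounds can be reproduced by hand on a small example. The sketch below uses base R only and made-up counts; the object names mirror the arguments and output components of `Frechet.bounds.cat()`, but nothing here calls the package, so it illustrates the formulas rather than the function's internals.

```r
# Toy counts (hypothetical, for illustration only): X, Y and Z have 2 levels each
tab.x  <- c(x1 = 60, x2 = 40)
tab.xy <- matrix(c(30, 30,
                   10, 30), nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
tab.xz <- matrix(c(45, 15,
                   20, 20), nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2")))

p.x   <- tab.x / sum(tab.x)              # p(X=i)
p.y.x <- prop.table(tab.xy, margin = 1)  # p(Y=j | X=i), rows sum to 1
p.z.x <- prop.table(tab.xz, margin = 1)  # p(Z=k | X=i), rows sum to 1

# Accumulate the bounds over the levels of X
low.cx <- up.cx <- matrix(0, ncol(p.y.x), ncol(p.z.x),
                          dimnames = list(Y = colnames(p.y.x),
                                          Z = colnames(p.z.x)))
for (i in seq_along(p.x)) {
  s <- outer(p.y.x[i, ], p.z.x[i, ], "+")   # p(Y=j|X=i) + p(Z=k|X=i)
  m <- outer(p.y.x[i, ], p.z.x[i, ], pmin)  # min of the two conditionals
  low.cx <- low.cx + p.x[i] * pmax(s - 1, 0)
  up.cx  <- up.cx  + p.x[i] * m
}

low.cx
up.cx
```

Every cell of `low.cx` is no larger than the matching cell of `up.cx`, as the inequalities require.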

The function also returns the unconditional bounds for the relative frequencies in the contingency table of Y vs. Z, i.e. the bounds computed without considering the **X** variables:

*max(0;p(Y=j)+p(Z=k)-1) <= p(Y=j,Z=k) <= min(p(Y=j);p(Z=k))*

These bounds are the only output when `tab.x = NULL`.
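As a rough illustration of the unconditional inequality, the following base R sketch evaluates both sides for made-up marginal distributions of Y and Z. The vectors `p.y` and `p.z` are hypothetical inputs; the names `low.u` and `up.u` simply echo the output components described later.

```r
# Hypothetical marginal distributions of Y and Z
p.y <- c(y1 = 0.40, y2 = 0.60)
p.z <- c(z1 = 0.65, z2 = 0.35)

# Lower bound: max(0, p(Y=j) + p(Z=k) - 1) for every pair (j, k)
low.u <- pmax(outer(p.y, p.z, "+") - 1, 0)

# Upper bound: min(p(Y=j), p(Z=k)) for every pair (j, k)
up.u <- outer(p.y, p.z, pmin)

low.u
up.u
```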

Finally, the contingency table of Y vs. Z estimated under the Conditional Independence Assumption (CIA) is obtained by considering:

*p(Y=j,Z=k) = sum_i(p(Y=j|X=i) * p(Z=k|X=i) * p(X=i))*

When `tab.x = NULL`, the expected table under the assumption of independence between Y and Z is also provided:

*p(Y=j,Z=k) = p(Y=j) * p(Z=k)*
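A minimal base R sketch of the two point estimates, under made-up distributions, may help to see how they differ. All object names (`p.x`, `p.y.x`, `p.z.x`, `cia`, `indep`) are illustrative and not taken from the package; the CIA table is obtained by summing the product of the conditional distributions over the levels of **X**.

```r
# Hypothetical distributions: p(X=i), p(Y=j | X=i) and p(Z=k | X=i)
p.x   <- c(x1 = 0.6, x2 = 0.4)
p.y.x <- matrix(c(0.50, 0.50,
                  0.25, 0.75), nrow = 2, byrow = TRUE,
                dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
p.z.x <- matrix(c(0.75, 0.25,
                  0.50, 0.50), nrow = 2, byrow = TRUE,
                dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2")))

# CIA: p(Y=j,Z=k) = sum_i p(X=i) * p(Y=j|X=i) * p(Z=k|X=i)
# p.x * p.z.x scales row i of p.z.x by p(X=i); the matrix product sums over i
cia <- t(p.y.x) %*% (p.x * p.z.x)

# Plain independence between Y and Z, ignoring X
p.y   <- colSums(p.x * p.y.x)   # marginal p(Y=j)
p.z   <- colSums(p.x * p.z.x)   # marginal p(Z=k)
indep <- outer(p.y, p.z)

cia
indep
```

Both tables sum to 1 and share the same margins, but their cells differ whenever Y and Z both depend on **X**.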

Many cells with zero counts in the input contingency tables indicate sparseness. This makes the estimates of the cells' relative frequencies needed to derive the bounds unstable, and the corresponding results may be unreliable. A possible alternative is to estimate the required parameters with a pseudo-Bayes estimator (see `pBayes`); in practice the input `tab.x`, `tab.xy` and `tab.xz` should then be the tables provided by the `pBayes` function.
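A quick way to gauge sparseness before computing the bounds is to look at the share of empty cells in each input table. The sketch below uses a made-up table and base R only; the source does not state how many zeros are "too many", so any cut-off applied to `zero.share` would be an assumption of the user.

```r
# Hypothetical X vs. Y table with several empty cells
tab.xy <- matrix(c(12, 0,
                    0, 7,
                    0, 3), nrow = 3, byrow = TRUE,
                 dimnames = list(X = c("x1", "x2", "x3"), Y = c("y1", "y2")))

n.zero     <- sum(tab.xy == 0)    # number of empty cells
zero.share <- mean(tab.xy == 0)   # proportion of empty cells

n.zero
zero.share
```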

When `print.f="tables"` (default), the function returns a list with the following components:

`low.u` |
The estimated lower bounds for the relative frequencies in the table Y vs. Z without conditioning on the **X** variables. |

`up.u` |
The estimated upper bounds for the relative frequencies in the table Y vs. Z without conditioning on the **X** variables. |

`CIA` |
The estimated relative frequencies in the table Y vs. Z under the Conditional Independence Assumption (CIA). |

`low.cx` |
The estimated lower bounds for the relative frequencies in the table Y vs. Z when conditioning on the **X** variables. |

`up.cx` |
The estimated upper bounds for the relative frequencies in the table Y vs. Z when conditioning on the **X** variables. |

`uncertainty` |
The uncertainty associated with the input data, measured in terms of the average width of the uncertainty bounds with and without conditioning on the **X** variables. |

When `print.f="data.frame"`, the output list contains just two components:

`bounds` |
A data.frame whose columns report the estimated uncertainty bounds. |

`uncertainty` |
The uncertainty associated with the input data, measured in terms of the average width of the uncertainty bounds with and without conditioning on the **X** variables. |
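To give a feeling for the `uncertainty` component, the sketch below computes the average width of a set of bounds in base R. The bound matrices are made-up numbers, and taking the plain (unweighted) mean of upper minus lower over all cells is an assumption about how the average is formed; the names `av.u` and `av.cx` are hypothetical.

```r
# Hypothetical bounds for a 2 x 2 table Y vs. Z (filled column by column)
low.u  <- matrix(c(0.05, 0.25, 0.00, 0.00), nrow = 2)
up.u   <- matrix(c(0.40, 0.60, 0.35, 0.35), nrow = 2)
low.cx <- matrix(c(0.15, 0.25, 0.00, 0.10), nrow = 2)
up.cx  <- matrix(c(0.40, 0.50, 0.25, 0.35), nrow = 2)

# Average width of the intervals, without and with conditioning on X
av.u  <- mean(up.u - low.u)
av.cx <- mean(up.cx - low.cx)

c(av.u = av.u, av.cx = av.cx)
```

In this toy case conditioning on **X** narrows the average interval, which is the situation in which the **X** variables carry useful information about Y and Z.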

Marcello D'Orazio mdo.statmatch@gmail.com

D'Orazio, M., Di Zio, M. and Scanu, M. (2006) “Statistical Matching for Categorical Data: Displaying Uncertainty and Using Logical Constraints”, *Journal of Official Statistics*, 22, pp. 137–157.

D'Orazio, M., Di Zio, M. and Scanu, M. (2006). *Statistical Matching: Theory and Practice.* Wiley, Chichester.

`Fbwidths.by.x`, `harmonize.x`

    library(StatMatch)  # provides Frechet.bounds.cat() and comp.prop()

    data(quine, package="MASS") # loads quine from MASS
    str(quine)

    # split quine in two subsets
    suppressWarnings(RNGversion("3.5.0"))
    set.seed(7654)
    lab.A <- sample(nrow(quine), 70, replace=TRUE)
    quine.A <- quine[lab.A, 1:3]
    quine.B <- quine[-lab.A, 2:4]

    # compute the tables required by Frechet.bounds.cat()
    freq.xA <- xtabs(~Sex+Age, data=quine.A)
    freq.xB <- xtabs(~Sex+Age, data=quine.B)

    freq.xy <- xtabs(~Sex+Age+Eth, data=quine.A)
    freq.xz <- xtabs(~Sex+Age+Lrn, data=quine.B)

    # apply Frechet.bounds.cat()
    bounds.yz <- Frechet.bounds.cat(tab.x=freq.xA+freq.xB, tab.xy=freq.xy,
                                    tab.xz=freq.xz, print.f="data.frame")
    bounds.yz

    # harmonize distr. of Sex vs. Age during computations
    # in Frechet.bounds.cat()

    # compare marg. distribution of Xs in A and B vs. pooled estimate
    comp.prop(p1=margin.table(freq.xy, c(1,2)), p2=freq.xA+freq.xB,
              n1=nrow(quine.A), n2=nrow(quine.A)+nrow(quine.B), ref=TRUE)

    # p1 comes from sample B here, so n1 is the size of B
    comp.prop(p1=margin.table(freq.xz, c(1,2)), p2=freq.xA+freq.xB,
              n1=nrow(quine.B), n2=nrow(quine.A)+nrow(quine.B), ref=TRUE)

    bounds.yz <- Frechet.bounds.cat(tab.x=freq.xA+freq.xB, tab.xy=freq.xy,
                                    tab.xz=freq.xz, print.f="data.frame",
                                    align.margins=TRUE)
    bounds.yz
