| Fbounds.pred | R Documentation |
This function assesses the uncertainty in estimating the contingency table crossing y.rec (Y) and z.don (Z) when the two variables are observed in two different samples sharing a number of common predictors.
Fbounds.pred(data.rec, data.don,
             match.vars, y.rec, z.don, pred = "multinom",
             w.rec = NULL, w.don = NULL, type.pred = "random",
             out.pred = FALSE, ...)
data.rec |
dataframe including the Xs (predictors, listed in |
data.don |
dataframe including the Xs (predictors, listed in |
match.vars |
vector with the names of the X variables to be used as predictors (or the set from which the best predictors are selected with lasso) of respectively |
y.rec |
character indicating the name of Y target variable in |
z.don |
character indicating the name of Z target variable in |
pred |
character specifying the method used to obtain predictions of both Y and Z. Available methods include
|
w.rec |
name of the variable with the weights of the units in |
w.don |
name of the variable with the weights of the units in |
type.pred |
string specifying how to obtain the predictions of Y and Z. By default, the fitted models return conditional probabilities (or scores), then if |
out.pred |
Logical. If TRUE (default is FALSE) returns the input datasets with the estimated conditional probabilities (depending on |
... |
additional arguments, if needed. |
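As an illustration of the two prediction criteria used in the examples ("mostvoted" and "random"), the sketch below turns a matrix of estimated conditional probabilities into predicted classes. It is plain base R and not the package's internal code: the matrix prob and its values are hypothetical, and reading "mostvoted" as the class with the highest estimated probability and "random" as a draw with the estimated probabilities is an assumption.

```r
# Hypothetical conditional probabilities: one row per unit, one column per class
prob <- rbind(c(A = 0.7, N = 0.3),
              c(A = 0.4, N = 0.6),
              c(A = 0.5, N = 0.5))

# Assumed "mostvoted" criterion: class with the highest estimated probability
# (ties resolved in favour of the first class)
pred.mv <- colnames(prob)[max.col(prob, ties.method = "first")]

# Assumed "random" criterion: one class drawn per unit, using the
# estimated probabilities of that unit as drawing probabilities
set.seed(1)
pred.rnd <- apply(prob, 1, function(p) sample(names(p), size = 1, prob = p))

pred.mv
pred.rnd
```

Under this reading, the randomized predictions change with the seed, while the most-voted predictions do not.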
The function evaluates the uncertainty in estimating the contingency table crossing y.rec (Y) and z.don (Z) when the two variables are observed in two different samples that refer to the same target population and share a number of common predictors. Evaluating the uncertainty amounts to estimating bounds for each cell of the contingency table crossing Y and Z; the bounds can be unconditional (Frechet property) or conditional on the predictions of both Y and Z provided by the models fitted according to the pred argument. Conditioning on the predictions, rather than on the X variables directly, avoids many of the drawbacks of obtaining expectations of conditional bounds when conditioning on many X variables, and allows the inclusion of non-categorical predictors. The final estimation of the conditional bounds is provided by the function Frechet.bounds.cat.
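The difference between unconditional and conditional bounds can be sketched in base R with the standard Frechet inequalities. This is a minimal illustration, not the code of Frechet.bounds.cat: D stands for a categorical conditioning variable (here it plays the role of the crossed predictions of Y and Z), and all objects (p.d, p.y.d, p.z.d) and their values are hypothetical.

```r
# Hypothetical inputs: distribution of D and conditional
# distributions of Y and Z given D (rows = levels of D)
p.d   <- c(d1 = 0.5, d2 = 0.5)
p.y.d <- rbind(d1 = c(y1 = 0.8, y2 = 0.2), d2 = c(y1 = 0.2, y2 = 0.8))
p.z.d <- rbind(d1 = c(z1 = 0.7, z2 = 0.3), d2 = c(z1 = 0.3, z2 = 0.7))

# Marginal distributions of Y and Z implied by the inputs
p.y <- colSums(p.d * p.y.d)
p.z <- colSums(p.d * p.z.d)

# Lower and upper Frechet bounds for every cell of a Y x Z table
f.low <- function(a, b) pmax(0, a + b - 1)
f.up  <- function(a, b) pmin(a, b)

# Unconditional bounds: only the marginals of Y and Z are used
low.u <- outer(p.y, p.z, f.low)
up.u  <- outer(p.y, p.z, f.up)

# Conditional bounds: bounds within each level of D, averaged over D
low.cx <- up.cx <- 0 * low.u
for (d in names(p.d)) {
  low.cx <- low.cx + p.d[d] * outer(p.y.d[d, ], p.z.d[d, ], f.low)
  up.cx  <- up.cx  + p.d[d] * outer(p.y.d[d, ], p.z.d[d, ], f.up)
}

low.u; up.u      # unconditional bounds
low.cx; up.cx    # conditional bounds, never wider than the unconditional ones
```

With these illustrative numbers the interval for cell (y1, z1) shrinks from [0, 0.5] to [0.25, 0.45]; the better D predicts Y and Z, the narrower the conditional intervals become.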
a list with the following components:
up.rec returned only when out.pred = TRUE; a reduced version of data.rec containing the estimated conditional probabilities for both Y and Z (depending on the pred argument), the predicted classes of Y and Z (depending on the type.pred argument), the true observed class of Y, the predictors (argument match.vars) and, when w.rec is specified, the weights.
up.don returned only when out.pred = TRUE; a reduced version of data.don containing the estimated conditional probabilities for both Y and Z (depending on the pred argument), the predicted classes of Y and Z (depending on the type.pred argument), the true observed class of Z, the predictors (argument match.vars) and, when w.don is specified, the weights.
p.xx.ini the estimated relative frequencies in the table crossing the predictions of Y and Z; it is estimated after pooling the samples (weighted average of the estimates obtained on the separate samples);
p.xy.ini the estimated table crossing Y and the predictions of both Y and Z estimated from data.rec (weights are used if provided with the w.rec argument);
p.xz.ini the estimated table crossing Z and the predictions of both Y and Z estimated from data.don (weights are used if provided with the w.don argument);
accuracy the estimated accuracy in predicting respectively Y and Z with the chosen method (argument pred) and the available predictors (argument match.vars);
bounds a data.frame whose columns report the estimated unconditional and conditional bounds for each cell in the contingency table crossing y.rec (Y) and z.don (Z);
uncertainty the uncertainty associated with the input data, measured as the average width of the uncertainty bounds with and without conditioning on the predictions (for further details see Frechet.bounds.cat).
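The "average width" summary can be sketched as follows. The data frame and its column names (low.u, low.cx, up.cx, up.u) are hypothetical stand-ins for a table of cell bounds, and the values are illustrative only; the exact layout returned by the function may differ.

```r
# Hypothetical table of bounds, one row per cell of the Y x Z table
bnd <- data.frame(
  Y      = c("y1", "y1", "y2", "y2"),
  Z      = c("z1", "z2", "z1", "z2"),
  low.u  = c(0.00, 0.00, 0.00, 0.00),  # unconditional lower bounds
  low.cx = c(0.25, 0.05, 0.05, 0.25),  # conditional lower bounds
  up.cx  = c(0.45, 0.25, 0.25, 0.45),  # conditional upper bounds
  up.u   = c(0.50, 0.50, 0.50, 0.50)   # unconditional upper bounds
)

# Average width of the intervals without and with conditioning
av.u  <- mean(bnd$up.u  - bnd$low.u)
av.cx <- mean(bnd$up.cx - bnd$low.cx)

c(av.u = av.u, av.cx = av.cx)
```

A conditional average width well below the unconditional one suggests that the predictions carry useful information about the joint distribution of Y and Z.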
Marcello D'Orazio mdo.statmatch@gmail.com
D'Orazio, M. (2024). Is Statistical Matching feasible? Note, https://www.researchgate.net/publication/387699016_Is_statistical_matching_feasible
Frechet.bounds.cat.
data(quine, package="MASS") #loads quine from MASS
str(quine)
# split quine in two subsets
set.seed(223344)
lab.A <- sample(nrow(quine), 70, replace=TRUE)
quine.A <- quine[lab.A, 1:3]
quine.B <- quine[-lab.A, 2:4]
# multinomial model and predictions with most-voted criterion
fbp <- Fbounds.pred(data.rec = quine.A, data.don = quine.B,
                    match.vars = c("Sex", "Age"),
                    y.rec = "Eth", z.don = "Lrn",
                    pred = "multinom", type.pred = "mostvoted")
fbp$p.xx.ini # estimated cross-tab of predictions
fbp$bounds # estimated conditional and unconditional bounds
fbp$uncertainty # estimated uncertainty about Y*Z
# multinomial model and predictions with randomized criterion
fbp <- Fbounds.pred(data.rec = quine.A, data.don = quine.B,
                    match.vars = c("Sex", "Age"),
                    y.rec = "Eth", z.don = "Lrn",
                    pred = "multinom", type.pred = "random")
fbp$p.xx.ini # estimated cross-tab of predictions
fbp$bounds # estimated conditional and unconditional bounds
fbp$uncertainty # estimated uncertainty about Y*Z