Fbounds.pred | R Documentation |
This function assesses the uncertainty in estimating the contingency table crossing y.rec (Y) and z.don (Z) when the two variables are observed in two different samples sharing a number of common predictors.
Fbounds.pred(data.rec, data.don,
match.vars, y.rec, z.don, pred = "multinom",
w.rec = NULL, w.don = NULL, type.pred = "random",
out.pred = FALSE, ...)
data.rec |
dataframe including the Xs (predictors, listed in |
data.don |
dataframe including the Xs (predictors, listed in |
match.vars |
vector with the names of the X variables to be used as predictors (or the set from which the best predictors are selected with lasso) of respectively |
y.rec |
character indicating the name of Y target variable in |
z.don |
character indicating the name of Z target variable in |
pred |
character specifying the method used to obtain predictions of both Y and Z. Available methods include
|
w.rec |
name of the variable with the weights of the units in |
w.don |
name of the variable with the weights of the units in |
type.pred |
string specifying how to obtain the predictions of Y and Z. By default, the fitted models return conditional probabilities (or scores), then if |
out.pred |
Logical. If TRUE (default is FALSE) returns the input datasets with the estimated conditional probabilities (depending on |
... |
additional arguments, if needed. |
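The two values of type.pred used in the examples on this page ("mostvoted" and "random") can be illustrated with a minimal base R sketch. It assumes that "mostvoted" assigns the class with the highest estimated conditional probability and that "random" draws the class at random using those probabilities; the helper names pred_mostvoted and pred_random and the probability matrix are hypothetical and are not part of the package.

```r
# Toy conditional probabilities for 4 units and 3 classes (made-up numbers)
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.6, 0.3,
                  0.3, 0.3, 0.4,
                  0.5, 0.4, 0.1),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("a", "b", "c")))

# Hypothetical helper: assign the class with the highest probability
pred_mostvoted <- function(p) {
  colnames(p)[max.col(p, ties.method = "first")]
}

# Hypothetical helper: draw one class per unit with the unit's own probabilities
pred_random <- function(p) {
  apply(p, 1, function(pr) sample(colnames(p), size = 1, prob = pr))
}

set.seed(1)
mv <- pred_mostvoted(probs)  # deterministic given the probabilities
rn <- pred_random(probs)     # changes with the random seed
print(mv)
print(rn)
```

Under these assumptions the most-voted rule always returns the same classes for the same fitted probabilities, while the randomized rule can return different classes from one run to the next unless the seed is fixed.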
The function evaluates the uncertainty in estimating the contingency table crossing y.rec (Y) and z.don (Z) when the two variables are observed in two different samples that refer to the same target population and share a number of common predictors. Evaluating the uncertainty is equivalent to estimating the bounds for each cell of the contingency table crossing Y and Z; the bounds can be unconditional (Frechet property) or conditional on the predictions of both Y and Z provided by the models fitted according to the pred argument. Conditioning on the predictions avoids many of the drawbacks of obtaining expectations of conditional bounds when conditioning on many X variables, and allows the inclusion of non-categorical predictors. The final estimation of the conditional bounds is provided by the function Frechet.bounds.cat.
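The difference between unconditional and conditional bounds can be sketched in base R. The example below uses the standard Frechet inequalities, max(0, P(Y) + P(Z) - 1) <= P(Y, Z) <= min(P(Y), P(Z)), and their expectation over a single categorical variable X. Here X merely stands in for the crossed predictions of Y and Z; all numbers are made up and the code is an illustration, not the package's implementation.

```r
# Toy marginal distributions of Y and Z (made-up numbers)
p_y <- c(y1 = 0.6, y2 = 0.4)
p_z <- c(z1 = 0.7, z2 = 0.3)

# Lower Frechet bound for a pair of probabilities
frechet_low <- function(a, b) pmax(0, a + b - 1)

# Unconditional bounds for every cell (Y = j, Z = k)
low_u <- outer(p_y, p_z, frechet_low)
up_u  <- outer(p_y, p_z, pmin)

# Conditioning variable X with two categories, consistent with p_y and p_z
p_x   <- c(x1 = 0.5, x2 = 0.5)
p_y_x <- rbind(x1 = c(y1 = 0.9, y2 = 0.1),   # P(Y | X)
               x2 = c(y1 = 0.3, y2 = 0.7))
p_z_x <- rbind(x1 = c(z1 = 0.8, z2 = 0.2),   # P(Z | X)
               x2 = c(z1 = 0.6, z2 = 0.4))

# Conditional bounds: expectation over X of the bounds computed within each X
low_cx <- up_cx <- matrix(0, 2, 2, dimnames = list(names(p_y), names(p_z)))
for (i in seq_along(p_x)) {
  low_cx <- low_cx + p_x[i] * outer(p_y_x[i, ], p_z_x[i, ], frechet_low)
  up_cx  <- up_cx  + p_x[i] * outer(p_y_x[i, ], p_z_x[i, ], pmin)
}

print(low_u); print(low_cx); print(up_cx); print(up_u)
```

In this toy case every conditional interval lies inside the corresponding unconditional one, which is why informative predictors reduce the uncertainty about the table crossing Y and Z.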
a list with the following components:
up.rec
only when out.pred = TRUE; it corresponds to a smaller version of data.rec with the estimated conditional probabilities for both Y and Z (depending on the pred argument), the predicted classes of Y and Z (depending on the type.pred argument), the true observed class of Y, the predictors (argument match.vars) and the weights when w.rec is specified.
up.don
only when out.pred = TRUE; it corresponds to a smaller version of data.don with the estimated conditional probabilities for both Y and Z (depending on the pred argument), the predicted classes of Y and Z (depending on the type.pred argument), the true observed class of Z, the predictors (argument match.vars) and the weights when w.don is specified.
p.xx.ini
the estimated relative frequencies in the table crossing the predictions of Y and Z; it is estimated after pooling the samples (weighted average of the estimates obtained on the separate samples);
p.xy.ini
the estimated table crossing Y and the predictions of both Y and Z estimated from data.rec (weights are used if provided with the w.rec argument);
p.xz.ini
the estimated table crossing Z and the predictions of both Y and Z estimated from data.don (weights are used if provided with the w.don argument);
accuracy
the estimated accuracy in predicting respectively Y and Z with the chosen method (argument pred) and the available predictors (argument match.vars);
bounds
a data.frame whose columns report the estimated unconditional and conditional bounds for each cell in the contingency table crossing y.rec (Y) and z.don (Z);
uncertainty
the uncertainty associated with the input data, measured in terms of the average width of the uncertainty bounds with and without conditioning on the predictions (for further details see Frechet.bounds.cat).
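As a rough illustration of how an average-width measure summarizes a table of bounds, the base R sketch below averages the interval widths over the cells. The data frame and its column names (low_u, low_cx, up_cx, up_u) are hypothetical and made up for this example; the layout of the bounds and uncertainty components actually returned by the function may differ.

```r
# Hypothetical table of bounds for a 2 x 2 table crossing Y and Z (made-up numbers)
bounds <- data.frame(
  Y      = c("y1", "y1", "y2", "y2"),
  Z      = c("z1", "z2", "z1", "z2"),
  low_u  = c(0.30, 0.00, 0.10, 0.00),  # unconditional lower bounds
  low_cx = c(0.35, 0.05, 0.15, 0.05),  # conditional lower bounds
  up_cx  = c(0.55, 0.25, 0.35, 0.25),  # conditional upper bounds
  up_u   = c(0.60, 0.30, 0.40, 0.30)   # unconditional upper bounds
)

# Average width of the intervals without and with conditioning
av_width_u  <- mean(bounds$up_u  - bounds$low_u)
av_width_cx <- mean(bounds$up_cx - bounds$low_cx)

print(c(unconditional = av_width_u, conditional = av_width_cx))
```

A smaller average width after conditioning indicates that the predictions carry information about the joint distribution of Y and Z.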
Marcello D'Orazio mdo.statmatch@gmail.com
D'Orazio, M. (2024). Is Statistical Matching feasible? Note, https://www.researchgate.net/publication/387699016_Is_statistical_matching_feasible.
Frechet.bounds.cat
library(StatMatch) # provides Fbounds.pred
data(quine, package="MASS") # loads quine from MASS
str(quine)
# split quine in two subsets
set.seed(223344)
lab.A <- sample(nrow(quine), 70, replace=TRUE)
quine.A <- quine[lab.A, 1:3]
quine.B <- quine[-lab.A, 2:4]
# multinomial model and predictions with most-voted criterion
fbp <- Fbounds.pred(data.rec = quine.A, data.don = quine.B,
                    match.vars = c("Sex", "Age"),
                    y.rec = "Eth", z.don = "Lrn",
                    pred = "multinom", type.pred = "mostvoted")
fbp$p.xx.ini # estimated cross-tab of predictions
fbp$bounds # estimated conditional and unconditional bounds
fbp$uncertainty # estimated uncertainty about Y*Z
# multinomial model and predictions with randomized criterion
fbp <- Fbounds.pred(data.rec = quine.A, data.don = quine.B,
                    match.vars = c("Sex", "Age"),
                    y.rec = "Eth", z.don = "Lrn",
                    pred = "multinom", type.pred = "random")
fbp$p.xx.ini # estimated cross-tab of predictions
fbp$bounds # estimated conditional and unconditional bounds
fbp$uncertainty # estimated uncertainty about Y*Z