rho.bounds | R Documentation |
This function assesses the uncertainty in estimating Pearson's correlation coefficient between y.rec (Y) and z.don (Z) when the two variables are observed in two different samples that share a number of common predictors.
rho.bounds(data.rec, data.don,
match.vars, y.rec, z.don,
w.rec = NULL, w.don = NULL)
data.rec |
dataframe including the Xs (predictors, listed in |
data.don |
dataframe including the Xs (predictors, listed in |
match.vars |
vector with the names of the Xs variables to be used, jointly with |
y.rec |
character indicating the name of Y target variable in |
z.don |
character indicating the name of Z target variable in |
w.rec |
name of the variable with units' weights in |
w.don |
name of the variable with units' weights in |
This function evaluates the uncertainty in the estimation of Pearson's correlation coefficient between y.rec (Y) and z.don (Z) when the two variables are observed in two different samples that refer to the same target population and share a set of common predictors X (match.vars). Evaluating the uncertainty amounts to estimating the lower and upper bounds of the correlation coefficient between Y and Z, given the available data. The method uses the expressions proposed by Rodgers and DeVol (1982).
Note that the correlations between the X variables common to both samples (match.vars) are estimated after pooling the samples. Factor variables, if present in match.vars, are replaced by the corresponding dummies before estimating the correlations; this approach suffers from a number of critical problems related to the estimation of biserial correlation and the underlying assumption of a Gaussian distribution. The correlation matrix between Y and the Xs is estimated on data.rec, while the correlation matrix between Z and the Xs is estimated on data.don; this can in some cases give unreliable estimates due to problems with the samples (usually when they are not representative of the same target population).
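For orientation, the bounds can be written in a standard closed form. This documentation cites Rodgers and DeVol (1982) without reproducing their expressions, so the form below is an assumption about what is computed rather than a transcription of the function's code. The symbols are introduced here for illustration: R_XX is the correlation matrix of the X variables, and r_XY and r_XZ are the vectors of correlations of the Xs with Y and with Z. The bounds follow from requiring the full correlation matrix of (X, Y, Z) to be positive semidefinite.

```latex
\rho_{YZ}^{\mathrm{CIA}} = r_{XY}^{\top} R_{XX}^{-1} r_{XZ},
\qquad
h = \sqrt{\left(1 - r_{XY}^{\top} R_{XX}^{-1} r_{XY}\right)
          \left(1 - r_{XZ}^{\top} R_{XX}^{-1} r_{XZ}\right)},
\qquad
\rho_{YZ}^{\mathrm{CIA}} - h \;\le\; \rho_{YZ} \;\le\; \rho_{YZ}^{\mathrm{CIA}} + h .
```

The two quadratic forms under the square root are the squared multiple correlations of Y on X and of Z on X, so the interval narrows as the Xs explain more of either target variable. Its mid-point is the value implied by conditional independence of Y and Z given X.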
A vector with three values: the estimated lower bound for Pearson's correlation coefficient between y.rec (Y) and z.don (Z); the estimated upper bound; and the mid-point of the interval, which corresponds to the estimate of Pearson's correlation coefficient under the conditional independence assumption (i.e. the correlation between Y and Z is fully explained by the available X variables, match.vars).
Marcello D'Orazio mdo.statmatch@gmail.com
D'Orazio, M. (2024). Is Statistical Matching feasible? Note, https://www.researchgate.net/publication/387699016_Is_statistical_matching_feasible.
Rodgers, W.L. and DeVol, E.B. (1982). An evaluation of statistical matching. Report Submitted to the Income Survey Development Program, Dept. of Health and Human Services, Institute for Social Research, University of Michigan.
mixed.mtc
library(StatMatch) # package providing rho.bounds() and mixed.mtc()
set.seed(11335577)
pos <- sample(x = 1:150, size = 60, replace = FALSE)
ir.A <- iris[pos, c(1:3, 5)]
ir.B <- iris[-pos, c(1:2, 4:5)]
intersect(colnames(ir.A), colnames(ir.B)) # shared Xs
# Xs without Species (factor)
out.1 <- rho.bounds(data.rec = ir.A, data.don = ir.B,
                    match.vars = c("Sepal.Length", "Sepal.Width"),
                    y.rec = "Petal.Length", z.don = "Petal.Width")
out.1
# Xs with Species (factor)
out.2 <- rho.bounds(data.rec = ir.A, data.don = ir.B,
                    match.vars = c("Sepal.Length", "Sepal.Width", "Species"),
                    y.rec = "Petal.Length", z.don = "Petal.Width")
out.2
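To make the calculation behind the first example more concrete, the following base-R sketch computes bounds of the same kind by hand for numeric Xs only. It is an illustration under assumptions, not the package's implementation: `rho_bounds_sketch` is a hypothetical helper, it uses the standard positive-semidefiniteness bounds, and it ignores weights and factor variables, so its output need not match `rho.bounds()` exactly.

```r
# Hypothetical helper (not part of StatMatch): bounds on cor(Y, Z) given
# Rxx = correlation matrix of the Xs,
# rxy = correlations of the Xs with Y, rxz = correlations of the Xs with Z.
rho_bounds_sketch <- function(Rxx, rxy, rxz) {
  Rinv <- solve(Rxx)
  mid  <- drop(t(rxy) %*% Rinv %*% rxz)   # value under conditional independence
  r2y  <- drop(t(rxy) %*% Rinv %*% rxy)   # squared multiple correlation, Y on X
  r2z  <- drop(t(rxz) %*% Rinv %*% rxz)   # squared multiple correlation, Z on X
  # Guard against slightly negative terms caused by inconsistent estimates
  half <- sqrt(max(0, 1 - r2y) * max(0, 1 - r2z))
  c(low = mid - half, up = mid + half, cia = mid)
}

set.seed(11335577)
pos <- sample(x = 1:150, size = 60, replace = FALSE)
A <- iris[pos, ]    # plays the role of data.rec (Y = Petal.Length)
B <- iris[-pos, ]   # plays the role of data.don (Z = Petal.Width)
xs <- c("Sepal.Length", "Sepal.Width")

Rxx <- cor(rbind(A[, xs], B[, xs]))   # Xs: estimated on the pooled samples
rxy <- cor(A[, xs], A$Petal.Length)   # Y with Xs: recipient sample only
rxz <- cor(B[, xs], B$Petal.Width)    # Z with Xs: donor sample only

out.sketch <- rho_bounds_sketch(Rxx, rxy, rxz)
out.sketch
```

Estimating each block of correlations on a different sample mirrors the description above, which is also why the guard with `max(0, ...)` is included: the three estimates are not guaranteed to be mutually consistent.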