canodist | R Documentation |
This function calculates the empirical distribution of the pivotal random variable that can be used to perform inferential procedures for the regression of one subset of variables on the other based on the released Single Synthetic data generated under Plug-in Sampling, assuming that the original dataset is normally distributed.
canodist(part, nsample, pvariates, iterations)
part |
Number of variables in the first subset. |
nsample |
Sample size. |
pvariates |
Number of variables. |
iterations |
Number of iterations for simulating values from the distribution and finding the quantiles. Default is |
We define
T_4^\star|\boldsymbol{\Delta} =
\frac{|(\boldsymbol{S}^{\star}_{12}(\boldsymbol{S}^{\star}_{22})^{-1}-\boldsymbol{\Delta})
\boldsymbol{S}^{\star}_{22}
(\boldsymbol{S}^{\star}_{12}(\boldsymbol{S}^{\star}_{22})^{-1}-\boldsymbol{\Delta})^\top|}
{|\boldsymbol{S}^{\star}_{11.2}|},
where \boldsymbol{S}^\star = \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^{\top},
v_i is the ith observation of the synthetic dataset, \bar{v} is the mean of the synthetic observations, and
\boldsymbol{S}^{\star}_{11.2} = \boldsymbol{S}^{\star}_{11} - \boldsymbol{S}^{\star}_{12}(\boldsymbol{S}^{\star}_{22})^{-1}\boldsymbol{S}^{\star}_{21},
with \boldsymbol{S}^\star partitioned as
\boldsymbol{S}^{\star}=\left[\begin{array}{ll}
\boldsymbol{S}^{\star}_{11} & \boldsymbol{S}^{\star}_{12}\\
\boldsymbol{S}^{\star}_{21} & \boldsymbol{S}^{\star}_{22}
\end{array}\right],
where the block \boldsymbol{S}^{\star}_{11} corresponds to the first part variables.
For \boldsymbol{\Delta} = \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1},
where \boldsymbol{\Sigma} is partitioned in the same way as \boldsymbol{S}^{\star},
the distribution of T_4^\star is stochastically equivalent to
\frac{|\boldsymbol{\Omega}_{12}\boldsymbol{\Omega}_{22}^{-1}\boldsymbol{\Omega}_{21}|}
{|\boldsymbol{\Omega}_{11}-\boldsymbol{\Omega}_{12}\boldsymbol{\Omega}_{22}^{-1}\boldsymbol{\Omega}_{21}|},
where \boldsymbol{\Omega} \sim \mathcal{W}_p(n-1, \frac{\boldsymbol{W}}{n-1}),
\boldsymbol{W} \sim \mathcal{W}_p(n-1, \mathbf{I}_p),
and \boldsymbol{\Omega} is partitioned in the same way as \boldsymbol{S}^{\star}.
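The stochastic representation above can be simulated directly. The following is a minimal sketch, assuming base R's stats::rWishart for the two Wishart draws; the helper name sim_t4 is hypothetical and this is not necessarily how canodist is implemented internally.

```r
# Hypothetical helper (not part of the package): draws from the
# stochastic representation of T_4^star under Delta = Sigma_12 Sigma_22^{-1}.
sim_t4 <- function(part, nsample, pvariates, iterations = 1000) {
  stopifnot(part >= 1, part < pvariates, nsample > pvariates)
  dof <- nsample - 1
  i1 <- seq_len(part)            # indices of the first subset
  i2 <- (part + 1):pvariates     # indices of the second subset
  vapply(seq_len(iterations), function(k) {
    # W ~ W_p(n - 1, I_p)
    W <- rWishart(1, dof, diag(pvariates))[, , 1]
    # Omega ~ W_p(n - 1, W / (n - 1))
    Omega <- rWishart(1, dof, W / dof)[, , 1]
    O11 <- Omega[i1, i1, drop = FALSE]
    O12 <- Omega[i1, i2, drop = FALSE]
    O21 <- Omega[i2, i1, drop = FALSE]
    O22 <- Omega[i2, i2, drop = FALSE]
    # Q = Omega_12 Omega_22^{-1} Omega_21
    Q <- O12 %*% solve(O22, O21)
    det(Q) / det(O11 - Q)
  }, numeric(1))
}

set.seed(123)
t4_sim <- sim_t4(part = 2, nsample = 100, pvariates = 4, iterations = 200)
quantile(t4_sim, 0.95)
```

Note that the numerator matrix has rank at most min(part, pvariates - part), so the statistic is identically zero when part exceeds pvariates - part.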
To test \mathcal{H}_0: \boldsymbol{\Delta} = \boldsymbol{\Delta}_0, compute the observed value of T_4^\star, denoted \widetilde{T_4^\star}, from the synthetic data with \boldsymbol{\Delta} = \boldsymbol{\Delta}_0, and reject the null hypothesis at significance level \alpha if
\widetilde{T_{4}^\star} > t^\star_{4,1-\alpha},
where t^\star_{4,\gamma} is the \gamma-quantile (the 100\gamma th percentile) of T_4^\star.
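The rejection rule above can be wrapped in a small helper. This is only an illustrative sketch: the function name t4_decision and the toy vector toy_sim are hypothetical stand-ins, and in practice the simulated values would come from canodist.

```r
# Hypothetical helper: applies the rule "reject H0 if T4_obs > t*_{4, 1 - alpha}"
# given the observed statistic and a vector of simulated values.
t4_decision <- function(t_obs, t_sim, alpha = 0.05) {
  stopifnot(is.numeric(t_sim), length(t_sim) > 0, alpha > 0, alpha < 1)
  # empirical (1 - alpha)-quantile of the simulated distribution
  critical <- unname(quantile(t_sim, probs = 1 - alpha))
  list(critical = critical, reject = t_obs > critical)
}

# Toy stand-in for simulated values, used only to show the call
toy_sim <- (1:1000) / 1000
t4_decision(t_obs = 0.5, t_sim = toy_sim, alpha = 0.05)
```

Returning the critical value alongside the decision makes it easy to report both, as the example at the end of this page does with print(q95).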
A vector of length iterations containing the simulated values of the empirical distribution.
Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
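# canodist(), simSynthData() and partition() are provided by the PSinference package
library(PSinference)
library(MASS)  # for mvrnorm()
# generate original data
n_sample = 100
p = 4
mu <- c(1,2,3,4)
Sigma = matrix(c(1, 0.5, 0.1, 0.7,
0.5, 2, 0.4, 0.9,
0.1, 0.4, 3, 0.2,
0.7, 0.9, 0.2, 4), nrow = 4, ncol = 4, byrow = TRUE)
df = mvrnorm(n_sample, mu = mu, Sigma = Sigma)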
# generate synthetic data
df_s = simSynthData(df)
# Partition Sigma and Sstar into blocks
part = 2
Sigma_12 = partition(Sigma,nrows = part, ncol = part)[[2]]
Sigma_22 = partition(Sigma,nrows = part, ncol = part)[[4]]
Delta0 = Sigma_12 %*% solve(Sigma_22)
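# cov() returns S*/(n - 1); T4 is a ratio of determinants of equal order,
# so it is unchanged by this rescaling
Sstar = cov(df_s)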
Sstar_11 = partition(Sstar,nrows = part, ncol = part)[[1]]
Sstar_12 = partition(Sstar,nrows = part, ncol = part)[[2]]
Sstar_21 = partition(Sstar,nrows = part, ncol = part)[[3]]
Sstar_22 = partition(Sstar,nrows = part, ncol = part)[[4]]
DeltaEst = Sstar_12 %*% solve(Sstar_22)
Sstar11_2 = Sstar_11 - Sstar_12 %*% solve(Sstar_22) %*% Sstar_21
T4_obs = det((DeltaEst-Delta0)%*%Sstar_22%*%t(DeltaEst-Delta0))/det(Sstar11_2)
T4 <- canodist(part = part, nsample = n_sample, pvariates = p, iterations = 10000)
q95 <- quantile(T4, 0.95)
T4_obs > q95 # FALSE means there is no statistical evidence to reject Delta = Delta0
print(T4_obs)
print(q95)
# When the observed value is smaller than the 95% quantile,
# there is no statistical evidence to reject the null hypothesis Delta = Delta0.
#
# Note that the value is very close to zero