Inddist | R Documentation |
This function computes the empirical distribution of the pivotal random variable used to test the independence of two subsets of variables, based on released singly imputed synthetic data generated under Plug-in Sampling and assuming that the original data are normally distributed.
Inddist(part, nsample, pvariates, iterations)
part | Number of variables in the first subset.
nsample | Sample size.
pvariates | Number of variables.
iterations | Number of iterations for simulating values from the distribution and finding the quantiles. Default is
We define

$$T_3^\star = \frac{|\boldsymbol{S}^{\star}|}{|\boldsymbol{S}^{\star}_{11}||\boldsymbol{S}^{\star}_{22}|}$$

where $\boldsymbol{S}^\star = \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^{\top}$, $v_i$ is the $i$th observation of the synthetic dataset, and $\boldsymbol{S}^\star$ is partitioned as

$$\boldsymbol{S}^{\star}=\left[\begin{array}{ll}
\boldsymbol{S}^{\star}_{11} & \boldsymbol{S}^{\star}_{12}\\
\boldsymbol{S}^{\star}_{21} & \boldsymbol{S}^{\star}_{22}
\end{array}\right].$$
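As a minimal sketch of the definition, the statistic can be computed from a data matrix in base R. The helper name `t3_statistic` and the simulated stand-in data are assumptions made for illustration; they are not part of the documented package.

```r
# Hypothetical helper (not part of the package): computes T3* from a data
# matrix whose rows are observations, with the first `part` columns forming
# the first subset of variables.
t3_statistic <- function(v, part) {
  v <- as.matrix(v)
  p <- ncol(v)
  stopifnot(part >= 1, part < p)
  centered <- sweep(v, 2, colMeans(v))  # subtract the column means: v_i - v_bar
  S <- crossprod(centered)              # sum_i (v_i - v_bar)(v_i - v_bar)^T
  idx1 <- seq_len(part)                 # indices of the first subset
  idx2 <- (part + 1):p                  # indices of the second subset
  det(S) / (det(S[idx1, idx1, drop = FALSE]) * det(S[idx2, idx2, drop = FALSE]))
}

set.seed(1)
# Stand-in for a synthetic dataset: 100 observations of 4 independent normals
v <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)
t3 <- t3_statistic(v, part = 2)
```

Because the ratio is invariant to rescaling the matrix, using `cov(v)` in place of the sum-of-squares matrix gives the same value.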
Under the assumption that $\boldsymbol{\Sigma}_{12} = \boldsymbol{0}$, the distribution of $T_3^\star$ is stochastically equivalent to

$$\frac{|\boldsymbol{\Omega}|}{|\boldsymbol{\Omega}_{11}||\boldsymbol{\Omega}_{22}|}$$

where $\boldsymbol{\Omega} \sim \mathcal{W}_p(n-1, \frac{\boldsymbol{W}}{n-1})$, $\boldsymbol{W} \sim \mathcal{W}_p(n-1, \mathbf{I}_p)$, and $\boldsymbol{\Omega}$ is partitioned in the same way as $\boldsymbol{S}^{\star}$.
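The two-stage Wishart representation above suggests a direct Monte Carlo scheme. The sketch below uses `stats::rWishart` and a hypothetical function `simulate_t3`; it illustrates the representation under the assumption that `nsample` plays the role of n and `pvariates` the role of p, and it is not necessarily how `Inddist` is implemented internally.

```r
# Hypothetical simulator (an illustration, not the package's code):
# draws `iterations` values of |Omega| / (|Omega_11| |Omega_22|).
simulate_t3 <- function(part, nsample, pvariates, iterations) {
  stopifnot(part >= 1, part < pvariates, nsample - 1 >= pvariates)
  df <- nsample - 1
  idx1 <- seq_len(part)
  idx2 <- (part + 1):pvariates
  vapply(seq_len(iterations), function(i) {
    # Stage 1: W ~ W_p(n - 1, I_p)
    W <- rWishart(1, df, diag(pvariates))[, , 1]
    # Stage 2: Omega | W ~ W_p(n - 1, W / (n - 1))
    Omega <- rWishart(1, df, W / df)[, , 1]
    det(Omega) /
      (det(Omega[idx1, idx1, drop = FALSE]) * det(Omega[idx2, idx2, drop = FALSE]))
  }, numeric(1))
}

set.seed(123)
sims <- simulate_t3(part = 2, nsample = 100, pvariates = 4, iterations = 500)
```

Each simulated value lies in (0, 1], since the determinant of a positive definite matrix never exceeds the product of the determinants of its diagonal blocks.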
To test $\mathcal{H}_0: \boldsymbol{\Sigma}_{12} = \boldsymbol{0}$, compute the value of $T_{3}^\star$ from the observed data, denoted $\widetilde{T_{3}^\star}$, and reject the null hypothesis at significance level $\alpha$ if $\widetilde{T_{3}^\star}<t^\star_{3,\alpha}$, where $t^\star_{3,\gamma}$ is the $\gamma$th percentile of $T_3^\star$.
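The decision rule can be sketched as a small helper that compares an observed statistic with the empirical lower quantile of simulated values. The function name `reject_independence` and the evenly spaced placeholder values are assumptions used only to illustrate the rule; in practice the simulated values would come from `Inddist`.

```r
# Hypothetical helper: lower-tailed test based on simulated values of T3*.
reject_independence <- function(t3_obs, t3_sim, alpha = 0.05) {
  # Empirical alpha-quantile of the simulated distribution (critical value)
  crit <- unname(quantile(t3_sim, probs = alpha))
  list(critical_value = crit, reject = t3_obs < crit)
}

# Placeholder values standing in for a simulated distribution (not real output)
t3_sim <- seq(0.80, 0.99, length.out = 200)

res_low  <- reject_independence(0.50, t3_sim)  # observed value far in the lower tail
res_high <- reject_independence(0.95, t3_sim)  # observed value well above the quantile
```

Small values of the statistic indicate dependence between the two subsets, which is why only the lower tail is used.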
A vector of length iterations containing the simulated values of the empirical distribution.
Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
# Generate original data with two independent subsets of variables
library(MASS)         # provides mvrnorm()
library(PSinference)  # provides simSynthData(), partition() and Inddist()
n_sample = 100
p = 4
mu <- c(1,2,3,4)
Sigma = matrix(c(1,   0.5, 0,   0,
                 0.5, 2,   0,   0,
                 0,   0,   3,   0.2,
                 0,   0,   0.2, 4), nrow = 4, ncol = 4, byrow = TRUE)
df = mvrnorm(n_sample, mu = mu, Sigma = Sigma)
# generate synthetic data
df_s = simSynthData(df)
# Decompose Sstar into its four blocks
part = 2
Sstar = cov(df_s)
blocks = partition(Sstar, nrows = part, ncol = part)
Sstar_11 = blocks[[1]]
Sstar_12 = blocks[[2]]
Sstar_21 = blocks[[3]]
Sstar_22 = blocks[[4]]
#Compute observed T3_star
T3_obs = det(Sstar)/(det(Sstar_11)*det(Sstar_22))
alpha = 0.05
# Collect the quantile from the distribution assuming independence between the two subsets
T3 <- Inddist(part = part, nsample = n_sample, pvariates = p, iterations = 10000)
q5 <- quantile(T3, alpha)
T3_obs < q5 # FALSE means there is no statistical evidence to reject independence
print(T3_obs)
print(q5)
# Note that the value of the observed T3_obs is close to one as expected