R/simSynthData.R

#' Plug-in Sampling Single Synthetic Dataset Generation
#'
#' Generates a single synthetic version of the original data via plug-in sampling.
#'
#' Let \eqn{\mathbf{X}=\left(\mathbf{x}_1, \dots, \mathbf{x}_n\right)} be the original data, assumed to be an i.i.d. sample from a \eqn{p}-variate normal distribution.
#' We compute \eqn{\bar{\mathbf{x}}} as the sample mean and \eqn{\hat{\boldsymbol{\Sigma}}=\mathbf{S}/(n-1)} as the sample covariance matrix,
#' where \eqn{\mathbf{S}=\sum_{i=1}^n(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^\top} is the sample Wishart matrix.
#' We then generate \eqn{\mathbf{V}=\left(\mathbf{v}_1, \dots, \mathbf{v}_n\right)} by drawing
#'
#' \deqn{\mathbf{v}_i\stackrel{i.i.d.}{\sim}N_p(\bar{\mathbf{x}},\hat{\boldsymbol{\Sigma}}).}
#'
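#' @param X numeric matrix or data frame holding the original data, with observations in rows and variables in columns
#' @param n_imp number of synthetic observations to generate; defaults to the number of rows of \code{X}
#'
#' @return a matrix whose rows are the synthetic observations \eqn{\mathbf{v}_i}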
#'
#' @references
#'  Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
#'
#' @importFrom MASS mvrnorm
#' @importFrom stats var
#' @examples
#' library(MASS)
#' n_sample <- 1000
#' mu <- c(0, 0, 0, 0)
#' Sigma <- diag(1, 4, 4)
#' # Create original simulated dataset
#' df_o <- mvrnorm(n_sample, mu, Sigma)
#' # Create singly imputed synthetic dataset
#' df_s <- simSynthData(df_o)
#' # Estimators from the synthetic data
#' mean_s <- colMeans(df_s)
#' # Careful: t(df_s) is p x n, so subtracting the length-p vector mean_s
#' # recycles it down each column and centers every observation. Subtracting
#' # mean_s from the n x p matrix df_s directly would not center the columns.
#' S_s <- (t(df_s) - mean_s) %*% t(t(df_s) - mean_s)
#' print(mean_s)
#' print(S_s / (nrow(df_s) - 1))
#' @export

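simSynthData <- function(X, n_imp = dim(X)[1]){
  X <- as.matrix(X)
  n <- nrow(X)
  mean_X <- colMeans(X)
  # n x p matrix whose every row is the sample mean; its size must match X,
  # not the number of synthetic draws n_imp
  MEAN_X <- matrix(mean_X, n, ncol(X), byrow = TRUE)
  # Sample Wishart matrix S = sum_i (x_i - xbar)(x_i - xbar)^T
  S_X <- t(X - MEAN_X) %*% (X - MEAN_X)
  # Draw n_imp rows from N_p(xbar, S / (n - 1)), n being the original sample size
  V <- MASS::mvrnorm(n_imp, mean_X, S_X / (n - 1))
  return(V)
}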
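
The draw step can also be written without MASS. The sketch below is an illustration under stated assumptions, not the PSinference implementation: the helper name plugin_draw_chol is hypothetical, and it factors Sigma_hat = S/(n - 1) as t(R) %*% R with base R's chol(), whereas MASS::mvrnorm uses an eigendecomposition. It therefore assumes Sigma_hat is positive definite (at least p + 1 observations, none collinear). Each synthetic row is v_i = xbar + t(R) %*% z_i with z_i ~ N_p(0, I), which has the plug-in distribution N_p(xbar, Sigma_hat).

```r
# Hypothetical helper, not part of PSinference: plug-in sampling that uses
# a Cholesky factor of Sigma_hat instead of MASS::mvrnorm.
plugin_draw_chol <- function(X, n_imp = nrow(X)) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  x_bar <- colMeans(X)
  # Subtract x_bar from every row of X
  Xc <- sweep(X, 2, x_bar)
  # Sample Wishart matrix S = t(Xc) %*% Xc and plug-in covariance S / (n - 1)
  Sigma_hat <- crossprod(Xc) / (n - 1)
  # Upper-triangular R with t(R) %*% R == Sigma_hat (errors if not pos. def.)
  R <- chol(Sigma_hat)
  # Rows of Z are independent N_p(0, I) vectors
  Z <- matrix(rnorm(n_imp * p), nrow = n_imp, ncol = p)
  # Row i of Z %*% R is z_i^T R, whose covariance is t(R) %*% R = Sigma_hat;
  # adding x_bar to each row gives v_i ~ N_p(x_bar, Sigma_hat)
  sweep(Z %*% R, 2, x_bar, "+")
}

# Usage on simulated data
set.seed(1)
X <- matrix(rnorm(500 * 3), ncol = 3)
V <- plugin_draw_chol(X)
dim(V)  # c(500, 3)
```

Because the Cholesky and eigendecomposition factors differ, the two approaches give different draws for the same seed, but both sample from N_p(xbar, Sigma_hat).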

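The function body and the @examples block center the data in different ways. The short check below runs on simulated data assumed purely for illustration. It confirms that both centerings produce the same sample Wishart matrix S, and that S/(n - 1) agrees with stats::cov, which also divides by n - 1. It also shows why subtracting the mean vector from the n x p matrix directly does not center its columns.

```r
set.seed(123)
X <- matrix(rnorm(100 * 4), ncol = 4)  # simulated n x p data, n = 100, p = 4
n <- nrow(X)
x_bar <- colMeans(X)

# Centering as in simSynthData: a matrix whose every row is x_bar
MEAN_X <- matrix(x_bar, n, ncol(X), byrow = TRUE)
S <- t(X - MEAN_X) %*% (X - MEAN_X)

# Centering as in the @examples: t(X) is p x n, so x_bar recycles down
# each column of length p and is subtracted from every observation
S_t <- (t(X) - x_bar) %*% t(t(X) - x_bar)

all.equal(S, S_t)               # TRUE: both are the sample Wishart matrix
all.equal(S / (n - 1), cov(X))  # TRUE: cov() also divides by n - 1

# Pitfall: X - x_bar recycles x_bar down the n x p matrix in column-major
# order, so the columns of the result are generally not centered at zero
colMeans(X - x_bar)
```
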
PSinference documentation built on April 4, 2025, 2:08 a.m.