unmask: First core function used by End-User


Description

Uses the sample-moment-based density approximant method to estimate the density function of a univariate distribution from noise-multiplied data.

Usage

unmask(maskedVectorToBeUnmasked, noisefile)

Arguments

maskedVectorToBeUnmasked

Masked data, as generated by the R function mask.

noisefile

Noise file produced by the R function mask, containing a sample of the noise used to mask maskedVectorToBeUnmasked.

Details

unmask is fully described in Lin and Fielding (2015). The theory supporting unmask can be found in Lin (2014). unmask implements the sample-moment-based density approximant method to estimate the smoothed density function of the original data from their masked data maskedVectorToBeUnmasked. The output of unmask is a set of sample data drawn from the estimated smoothed density function. The size of the output is the same as that of the original data, which were masked by the multiplicative noise to yield maskedVectorToBeUnmasked.
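As a rough illustration of why sample moments are useful with multiplicative noise: if each masked value is the product of an original value and a noise value drawn independently of it, then every moment of the masked data is the product of the corresponding moments of the original data and of the noise, so the moments of the original data can be estimated as a ratio. The base-R sketch below checks this numerically. The independence assumption, the variable names and the normal distributions are chosen for illustration only; the sketch does not call unmask and is not a description of the package's internal code.

```r
# Illustrative sketch in base R; names and distributions are assumptions.
set.seed(1)
n <- 100000
y     <- rnorm(n, mean = 40, sd = 5)   # stand-in for the original data Y
noise <- rnorm(n, mean = 90, sd = 5)   # stand-in for the noise C, independent of Y
ystar <- y * noise                     # masked data Y* = Y * C

k <- 1:4
# Moments of Y estimated from the masked data and the noise sample only.
moments_from_masked <- sapply(k, function(j) mean(ystar^j) / mean(noise^j))
# Moments of Y computed directly (not available to the data user in practice).
moments_direct <- sapply(k, function(j) mean(y^j))

# Relative error of the recovered moments; it shrinks as n grows.
rel_err <- abs(moments_from_masked / moments_direct - 1)
print(cbind(order = k, recovered = moments_from_masked,
            direct = moments_direct, rel_err = rel_err))
```

The first two noise moments used in this ratio correspond in spirit to the outMeanOfNoise and outMeanOfSquaredNoise elements returned by unmask.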

Value

Returns a list with four elements.

unmaskedVariable

vector of unmasked data

outMeanOfNoise

sample mean of the noise

outMeanOfSquaredNoise

sample mean of the squared noise

prob

vector giving the estimated mass function; returned if the original data are categorical

Author(s)

Yan-Xia Lin

References

Lin, Yan-Xia (2014). Density approximant based on noise multiplied data. In J. Domingo-Ferrer (Ed.), Privacy in Statistical Databases 2014, LNCS 8744, Springer International Publishing Switzerland, pp. 89-104.

Lin, Yan-Xia and Fielding, Mark James (2015). MaskDensity14: An R Package for the Density Approximant of a Univariate Based on Noise Multiplied Data. SoftwareX 3-4, 37-43. doi:10.1016/j.softx.2015.11.002

Examples


#Example 1:
library(MaskJointDensity)  # package that provides mask and unmask
set.seed(123)
n <- 10000

y <- rmulti(n=n, mean=c(30, 50), sd=c(4,2), p=c(0.3, 0.7))
      # y is a sample drawn from Y.
noise <- rmulti(n=n, mean=c(80, 100), sd=c(5,3), p=c(0.6, 0.4))
      # noise is a sample drawn from C.


a1<-runif(1, min=min(y)-2,max=min(y))
b1<-runif(1, min=max(y), max=max(y)+2)
ymask<-mask(vectorToBeMasked=y, noisefile=file.path(tempdir(),"noise.bin"), noise,
            lowerBoundAsGivenByProvider=a1, upperBoundAsGivenByProvider=b1)
write(ymask$ystar, file.path(tempdir(),"ystar.dat"))
      # The two calls above create noise.bin and the masked data file ystar.dat.
      # The two files can be issued to the public.

      # After receiving the two files ystar.dat and noise.bin,
      # the data user can use the following code to obtain
      # synthetic data for the original data.

ystar <- scan(file.path(tempdir(),"ystar.dat"))
y1 <- unmask(maskedVectorToBeUnmasked=ystar, noisefile=file.path(tempdir(),"noise.bin"))
synthetic <- y1$unmaskedVariable
   # y1$unmaskedVariable gives the synthetic data for the
   # original data y. The size of the synthetic data is the
   # same as that of y. (The name synthetic is used rather than
   # sample to avoid masking the base R function sample.)
plot(density(synthetic), main="density(ymask)", xlab="y")
   # plot of the approximant of the density $f_Y$

#Example 2:

set.seed(124)
n<-2000
a<-170
b<-80
y<-rbinom(n, 1, 0.1)+1
noise<-(a+b)/2+ sqrt(1+(a-b)^2/4)*rnorm(n, 0,1)
noise <- abs(noise)  # reflect negative draws so that no noise value is negative

ymask<-mask(vectorToBeMasked=factor(y), noisefile=file.path(tempdir(),"noise.bin"), noise,
            lowerBoundAsGivenByProvider=0, upperBoundAsGivenByProvider=3)
      # using factor(y) because y is a categorical variable
write(ymask$ystar, file.path(tempdir(),"ystar.dat"))

ystar<-scan(file.path(tempdir(),"ystar.dat"))
y1 <- unmask(maskedVectorToBeUnmasked=ystar, noisefile=file.path(tempdir(),"noise.bin"))
unmaskY<-y1$unmaskedVariable  # synthetic data
mass_function<-y1$prob  # estimated mass function
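
A data provider who still holds the original categorical data may want to compare its category proportions with those of the synthetic data. The base-R sketch below shows one way to do that. The helper compare_mass is hypothetical and not part of MaskJointDensity, and the synthetic vector is a simulated stand-in so that the sketch runs without the package; in practice it would be replaced by y1$unmaskedVariable, assuming the synthetic values fall on the same category labels as the original.

```r
# Hypothetical helper (not part of MaskJointDensity): category proportions
# of two vectors, computed over the union of their observed values.
compare_mass <- function(original, synthetic) {
  lv <- sort(unique(c(original, synthetic)))
  mass <- function(x) vapply(lv, function(v) mean(x == v), numeric(1))
  out <- rbind(original = mass(original), synthetic = mass(synthetic))
  colnames(out) <- lv
  out
}

set.seed(124)
n <- 2000
y <- rbinom(n, 1, 0.1) + 1                   # same construction as in Example 2
# Stand-in for y1$unmaskedVariable, simulated so the sketch is self-contained.
synthetic_standin <- rbinom(n, 1, 0.1) + 1

cmp <- compare_mass(y, synthetic_standin)
print(round(cmp, 3))
```

Each row of the result sums to 1, and the two rows can be read side by side, one column per category.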

MaskJointDensity documentation built on May 2, 2019, 8:28 a.m.