pred.y: Prediction of y variables


Description

Provides predictions of y variables according to a Gaussian contamination model.

Usage

    pred.y (y, x=NULL, B, sigma, lambda, w, model="LN", t.outl=0.5)

Arguments

y

matrix or data frame containing the response variables

x

optional matrix or data frame containing the error-free covariates

B

matrix of regression coefficients

sigma

covariance matrix

lambda

variance inflation factor

w

proportion of erroneous data

model

data distribution: "LN" = lognormal (default), "N" = normal

t.outl

threshold value for posterior probabilities of identifying outliers (default=0.5)

Details

This function provides expected values of a set of variables (y1.p, y2.p, ...) according to a mixture of two regression models with Gaussian residuals (see ml.est). If no covariates (x variables) are available, a two-component Gaussian mixture is used. Expected values (predictions) are computed on the basis of a set of parameters of appropriate dimensions (B, sigma, lambda, w) and, if available, a matrix (or data frame) containing the error-free x variables.
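As a reading aid, the mixture described above can be written out explicitly. This is a sketch under stated assumptions, not a formula taken from the package: it assumes both components share the same regression mean, that lambda multiplies the covariance of the contaminated component (as "variance inflation factor" suggests), and that B is applied to the covariate vector with a leading 1 for the intercept. With model="LN" the same form would apply to the log-transformed data.

```latex
% Assumed form of the two-component contamination model (notation is illustrative):
%   \tilde{x} = (1, x^\top)^\top, \quad \mu(x) = B^\top \tilde{x}
f(y \mid x) \;=\; (1 - w)\, N\!\big(y;\ \mu(x),\ \Sigma\big)
            \;+\; w\, N\!\big(y;\ \mu(x),\ \lambda \Sigma\big),
\qquad \lambda > 1,\quad 0 < w < 1 .
```

Without covariates, the mean reduces to a constant vector and the expression is a plain two-component Gaussian mixture.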

Missing values in the x variables are not allowed. Missing values in the y variables are allowed: robust predictions are also provided for y variables that are not observed. A vector of missingness patterns (pattern) indicates which items are observed and which are missing.
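The idea of a missingness pattern can be illustrated in base R. The sketch below is not SeleMix code: the toy data frame and the encoding of one 0/1 string per row are assumptions made for illustration, and the exact encoding used by pred.y for several y variables is not stated on this page.

```r
# Toy response data with missing items (hypothetical values)
y <- data.frame(Y1 = c(10, NA, 7), Y2 = c(NA, NA, 3))

# 1 = present value, 0 = missing, one indicator per y variable
observed <- 1L * !is.na(as.matrix(y))

# Assumed encoding: concatenate the indicators of each row into one string
pattern <- apply(observed, 1, paste, collapse = "")
print(pattern)
```

Rows whose pattern contains a 0 are those for which at least one y variable would have to be predicted rather than observed.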

For each unit in the data set, the posterior probability of being erroneous (tau) is computed, and a flag (outlier) is set to 1 if tau is greater than the user-specified threshold (t.outl) and to 0 otherwise.
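The thresholding rule can be sketched in base R for a single response without covariates. The helper post_tau is hypothetical and not part of SeleMix; it assumes a normal model in which the contaminated component has the same mean as the clean one and a variance inflated by lambda. The parameter values are borrowed from the example on this page purely for illustration.

```r
# Hypothetical helper (not a SeleMix function): posterior probability that an
# observation comes from the contaminated component of a two-component
# Gaussian mixture with common mean and variances sigma2 and lambda * sigma2.
post_tau <- function(y, mu, sigma2, lambda, w) {
  d_clean <- (1 - w) * dnorm(y, mean = mu, sd = sqrt(sigma2))
  d_cont  <- w * dnorm(y, mean = mu, sd = sqrt(lambda * sigma2))
  d_cont / (d_clean + d_cont)
}

y_obs  <- c(0.1, -0.5, 6.0)   # toy observations
t.outl <- 0.5                 # threshold on the posterior probability

tau     <- post_tau(y_obs, mu = 0, sigma2 = 1.25, lambda = 15.5, w = 0.0479)
outlier <- as.integer(tau > t.outl)

print(data.frame(y = y_obs, tau = round(tau, 4), outlier = outlier))
```

Raising t.outl makes the flag more conservative, since fewer units reach the required posterior probability.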

Value

pred.y returns a data frame containing the following columns:

y1.p,y2.p,...

predicted values for y variables

tau

posterior probabilities of being contaminated

outlier

1 if the observation is classified as an outlier, 0 otherwise

pattern

non-response patterns for y variables: 0 = missing, 1 = present value

Author(s)

M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>

References

Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing", European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010

Examples

# Prediction with one contaminated variable and one covariate
  library(SeleMix)
  data(ex1.data)
# Parameters estimated by applying ml.est to ex1.data
  B1 <- as.matrix(c(-0.152, 1.215))
  sigma1 <- as.matrix(1.25)
  lambda1 <- 15.5
  w1 <- 0.0479

# Variable prediction
  ypred <- pred.y (y=ex1.data[,"Y1"],  x=ex1.data[,"X1"], B=B1,
          sigma=sigma1, lambda=lambda1, w=w1, model="LN", t.outl=0.5)
# Plot ypred vs Y1
  sel.pairs(cbind(ypred[,1,drop=FALSE],ex1.data[,"Y1",drop=FALSE]),
            outl=ypred[,"outlier"])

SeleMix documentation built on Nov. 29, 2020, 9:09 a.m.
