pred.y: Prediction of y variables
In SeleMix: Selective Editing via Mixture Models

Description

Provides predictions of the y variables according to a Gaussian contamination model.

Usage

    pred.y (y, x=NULL, B, sigma, lambda, w, model="LN", t.outl=0.5)

Arguments

`y`	matrix or data frame containing the response variables
`x`	optional matrix or data frame containing the error-free covariates
`B`	matrix of regression coefficients
`sigma`	covariance matrix
`lambda`	variance inflation factor
`w`	proportion of erroneous data
`model`	data distribution: LN = lognormal (default), N = normal
`t.outl`	threshold on the posterior probability of being erroneous, above which a unit is flagged as an outlier (default = 0.5)

Details

This function provides expected values of a set of variables (y1.p, y2.p, ...) according to a mixture of two regression models with Gaussian residuals (see ml.est). If no covariates (x variables) are available, a two-component Gaussian mixture is used. Expected values (predictions) are computed on the basis of a set of parameters of appropriate dimensions (B, sigma, lambda, w) and, if available, a matrix (or data frame) containing the error-free x variables.
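
As a rough guide to the notation, the mixture described above can be written as follows. This is a sketch that assumes the contamination model of the cited reference, with data on the working scale (the log scale when model="LN"); the symbols \mu(x) and N(\cdot) are notation introduced here and do not appear in the package.

```latex
f(y \mid x) \;=\; (1 - w)\, N\bigl(y;\ \mu(x),\ \Sigma\bigr) \;+\; w\, N\bigl(y;\ \mu(x),\ \lambda \Sigma\bigr),
\qquad \lambda > 1
```

Here \mu(x) is the regression mean determined by B (a constant mean when no covariates are supplied), \Sigma corresponds to sigma, \lambda to lambda, and w to the proportion of erroneous data.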

Missing values in the x variables are not allowed. However, robust predictions of the y variables are also provided when these variables are not observed. A vector of missingness patterns (pattern) indicates which items are observed and which are missing.

For each unit in the data set, the posterior probability of being erroneous (tau) is computed, and a flag (outlier) is set to 1 if tau is greater than the user-specified threshold (t.outl) and to 0 otherwise.
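
The flagging rule can be illustrated with a minimal sketch for a single response variable. It is not the package's implementation: `tau_sketch` is a hypothetical helper, and it assumes that the contaminated component has the same mean as the clean one with variance inflated by lambda, that B holds an intercept followed by a slope, and that the data are already on the working scale (logs for model="LN").

```r
# Hypothetical helper, not part of SeleMix: posterior probability of
# contamination for one response variable on the working scale.
tau_sketch <- function(y, x, B, sigma, lambda, w) {
  # Regression mean; assumes B = (intercept, slope)
  mu <- as.numeric(cbind(1, x) %*% B)
  sd_clean <- sqrt(as.numeric(sigma))
  # Densities under the clean and the variance-inflated component
  f_clean <- dnorm(y, mean = mu, sd = sd_clean)
  f_cont  <- dnorm(y, mean = mu, sd = sqrt(lambda) * sd_clean)
  # Bayes' rule for membership in the contaminated component
  w * f_cont / ((1 - w) * f_clean + w * f_cont)
}

# Toy data (invented for illustration), already log-transformed
x <- log(c(10, 20, 40, 80))
y <- log(c(12, 25, 5000, 95))

tau <- tau_sketch(y, x, B = c(-0.152, 1.215), sigma = 1.25,
                  lambda = 15.5, w = 0.0479)
outlier <- as.integer(tau > 0.5)   # same rule as t.outl = 0.5
```

Raising the threshold flags fewer units, since only observations with a higher posterior probability of contamination exceed it.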

Value

pred.y returns a data frame containing the following columns:

`y1.p`, `y2.p`, `...`	predicted values for y variables
`tau`	posterior probabilities of being contaminated
`outlier`	1 if the observation is classified as an outlier, 0 otherwise
`pattern`	non-response patterns for y variables: 0 = missing, 1 = present value

Author(s)

M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>

References

Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing", European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010

Examples


# Prediction with one contaminated variable and one covariate
  library(SeleMix)
  data(ex1.data)
# Parameters estimated by applying ml.est to ex1.data
  B1 <- as.matrix(c(-0.152, 1.215))
  sigma1 <- as.matrix(1.25)
  lambda1 <- 15.5
  w1 <- 0.0479

# Variable prediction
  ypred <- pred.y (y=ex1.data[,"Y1"],  x=ex1.data[,"X1"], B=B1,
          sigma=sigma1, lambda=lambda1, w=w1, model="LN", t.outl=0.5)
# Plot ypred vs Y1
  sel.pairs(cbind(ypred[,1,drop=FALSE],ex1.data[,"Y1",drop=FALSE]),
            outl=ypred[,"outlier"])

SeleMix documentation built on April 4, 2025, 12:38 a.m.