Description Usage Arguments Details Value Author(s) References Examples

Provides ML estimates of a Gaussian contamination model.

1 2 3 |

`y` |
matrix or data frame containing the response variables |

`x` |
optional matrix or data frame containing the error free covariates |

`model` |
data distribution: LN = lognormal(default), N=normal |

`lambda` |
starting value for the variance inflation factor (default=3) |

`w` |
starting value for the proportion of contaminated data (default=0.05) |

`lambda.fix` |
logical. TRUE if lambda is known |

`w.fix` |
logical. TRUE if w is known |

`eps` |
epsilon : tolerance parameter for the log-likelihood convergence (default=1e-7) |

`max.iter` |
maximum number of EM iterations (default=500) |

`t.outl` |
threshold value for posterior probabilities of identifying outliers (default=0.5) |

`graph` |
logical. TRUE to display graphics (default=FALSE) |

This function provides the parameter estimates of a contamination model where a set of `y`

variables is assumed to depend on a (possibly empty) set of covariates (`x`

variables)
through a mixture of two linear regressions with Gaussian residuals. The covariance matrices
of the two mixture components are assumed to be proportional (the proportionality constant being
`lambda`

). In case of no `x`

variables a mixture of two Gaussian distribution is estimated.
BIC and AIC scores (`bic.aic`

) are returned corresponding to both standard Gaussian model
and contamination model in order to help the user to avoid possible over-parametrisation.

According to the estimated model parameters, a matrix of predictions of ‘true’ `y`

values
(`ypred`

) is computed. To each unit in the dataset, a flag (`outlier`

) is assigned taking
value 0 or 1 depending on whether the posterior probability of being erroneous (`tau`

) is
greater than the user specified threshold (`t.outl`

).

The model is estimated using complete observations. Missing values in the `x`

variables are
not allowed. However, `y`

variables can be partly observed. Robust predictions of `y`

variables
are provided even when they are not observed.
A vector of missing pattern (`pattern`

) indicates which item is observed and which is missing.

In case the option ‘model = LN’ is specified, each zero value is changed in 1e-7 and a warning is returned.

In order to graphically monitor EM algorithm, a scatter plot is showed where outliers
are depicted as long as they are identified. The trajectory of the `lambda`

parameter
is also showed until convergence.

`ml.est`

returns a list containing the following components:

`ypred ` |
matrix of predicted values for y variables |

`B ` |
matrix of estimated regression coefficients |

`sigma ` |
estimated covariance matrix |

`lambda ` |
estimated variance inflation factor |

`w ` |
estimated proportion of erroneous data |

`tau ` |
vector of posterior probabilities of being contaminated |

`outlier ` |
1 if the observation is classified as an outlier, 0 otherwise |

`n.outlier ` |
total of outlier observations |

`pattern ` |
vector of non-response patterns for y variables: 0 = missing, 1 = present value |

`is.conv ` |
logical value: TRUE if the EM algorithm has converged |

`n.iter ` |
number of iterations of EM algorithm |

`sing ` |
if TRUE iteration are stopped because there is an almost perfect fit |

`bic.aic ` |
Bayesian Information Criterion and Akaike Information Criterion for contaminated and non contaminated Gaussian models |

M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>

Di Zio, M., Guarnera, U. (2013) *"A Contamination Model for Selective Editing"*,
Journal of Official Statistics. Volume 29, Issue 4, Pages 539-555
(https://doi.org/10.2478/jos-2013-0039).

Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) *"Use of Contamination Models for Selective Editing"*,
European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010

1 2 3 4 5 6 7 8 9 10 11 12 | ```
# Parameter estimation with one contaminated variable and one covariate
data(ex1.data)
ml.par <- ml.est(y=ex1.data[,"Y1"], x=ex1.data[,"X1"], graph=TRUE)
str(ml.par)
sum(ml.par$outlier) # number of outliers
# Parameter estimation with two contaminated variables and no covariates
## Not run:
data(ex2.data)
par.joint <- ml.est(y=ex2.data, x=NULL, graph=TRUE)
sum(par.joint$outlier) # number of outliers
## End(Not run)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.