fulltle-methods: Methods for Function fulltle in Package 'MAINT.Data'

fulltle-methodsR Documentation

Methods for Function fulltle in Package ‘MAINT.Data’

Description

Performs maximum trimmed likelihood estimation by an exact algorithm (full enumeratiom of all k-trimmed subsets)

Usage

fulltle(Sdt, CovCase = 1:4, SelCrit = c("BIC", "AIC"), alpha =
                 0.75, use.correction = TRUE, getalpha = "TwoStep",
                 rawMD2Dist = c("ChiSq", "HardRockeAsF",
                 "HardRockeAdjF"), MD2Dist = c("ChiSq",
                 "CerioliBetaF"), eta = 0.025, multiCmpCor = c("never",
                 "always", "iterstep"), outlin = c("MidPandLogR",
                 "MidP", "LogR"), reweighted = TRUE, k2max=1e6, 
                 force = FALSE, ...)

Arguments

Sdt

An IData object representing interval-valued units.

CovCase

Configuration of the variance-covariance matrix: a set of integers between 1 and 4.

SelCrit

The model selection criterion.

alpha

Numeric parameter controlling the size of the subsets over which the trimmed likelihood is maximized; roughly alpha*nrow(Sdt) observations are used for computing the trimmed likelihood. Note that when argument ‘getalpha’ is set to “TwoStep” the final value of ‘alpha’ is estimated by a two-step procedure and the value of argument ‘alpha’ is only used to specify the size of the samples used in the first step. Allowed values are between 0.5 and 1.

use.correction

whether to use finite sample correction factors; defaults to TRUE.

getalpha

Argument specifying if the ‘alpha’ parameter (roughly the percentage of the sample used for computing the trimmed likelihood) should be estimated from the data, or if the value of the argument ‘alpha’ should be used instead. When set to “TwoStep”, ‘alpha’ is estimated by a two-step procedure with the value of argument ‘alpha’ specifying the size of the samples used in the first step. Otherwise, the value of argument ‘alpha’ is used directly.

rawMD2Dist

The assumed reference distribution of the raw MCD squared distances, which is used to find to cutoffs defining the observations kept in one-step reweighted MCD estimates. Alternatives are ‘ChiSq’, ‘HardRockeAsF’ and ‘HardRockeAdjF’, respectivelly for the usual Chi-square, and the asymptotic and adjusted scaled F distributions proposed by Hardin and Rocke (2005).

MD2Dist

The assumed reference distributions used to find cutoffs defining the observations assumed as outliers. Alternatives are “ChiSq” and “CerioliBetaF” respectivelly for the usual Chi-square, and the Beta and F distributions proposed by Cerioli (2010).

eta

Nominal size of the null hypothesis that a given observation is not an outlier. Defines the raw MCD Mahalanobis distances cutoff used to choose the observations kept in the reweightening step.

multiCmpCor

Whether a multicomparison correction of the nominal size (eta) for the outliers tests should be performed. Alternatives are: ‘never’ – ignoring the multicomparisons and testing all entities at the ‘eta’ nominal level. ‘always’ – testing all n entitites at 1.- (1.-‘eta’^(1/n)); and ‘iterstep’ – use the iterated rule proposed by Cerioli (2010), i.e., make an initial set of tests using the nominal size 1.- (1-‘eta’^(1/n)), and if no outliers are detected stop. Otherwise, make a second step testing for outliers at the ‘eta’ nominal level.

outlin

The type of outliers to be consideres. “MidPandLogR” if outliers may be present in both MidPpoints and LogRanges, “MidP” if outliers are only present in MidPpoints, or “LogR” if outliers are only present in LogRanges.

reweighted

should a (Re)weighted estimate of the covariance matrix be used in the computation of the trimmed likelihood or just a “raw” covariance estimate; default is (Re)weighting.

k2max

Maximal allowed l2-norm condition number for correlation matrices. Correlation matrices with condition number above k2max are considered to be numerically singular, leading to degenerate results.

force

A boolean flag indicating whether, for moderate or large data sets the algorithm should proceed anyway, regardless of an expected long excution time, due to exponential explosions in the number of different subsets that need to be avaluated by fulltle.

...

Further arguments to be passed to internal functions of ‘fulltle’.

Value

An object of class IdtE with the fulltle estimates, the value of the comparison criterion used to select the covariance configurations and the robust squared Mahalanobis distances.

References

Brito, P., Duarte Silva, A. P. (2012), Modelling Interval Data with Normal and Skew-Normal Distributions. Journal of Applied Statistics 39(1), 3–20.

Cerioli, A. (2010), Multivariate Outlier Detection with High-Breakdown Estimators. Journal of the American Statistical Association 105 (489), 147–156.

Duarte Silva, A.P., Filzmoser, P. and Brito, P. (2017), Outlier detection in interval data. Advances in Data Analysis and Classification, 1–38.

Hadi, A. S. and Luceno, A. (1997), Maximum trimmed likelihood estimators: a unified approach, examples, and algorithms. Computational Statistics and Data Analysis 25(3), 251–272.

Hardin, J. and Rocke, A. (2005), The Distribution of Robust Distances. Journal of Computational and Graphical Statistics 14, 910–927.

See Also

fasttle, getIdtOutl

Examples



## Not run: 

# Create an Interval-Data object containing the intervals for characteristics 
# of 27 cars models.

CarsIdt <- IData(Cars[1:8],VarNames=c("Price","EngineCapacity","TopSpeed","Acceleration"))

# Estimate parameters by the full trimmed maximum likelihood estimator, 
# using a two-step procedure to select the trimming parameter, a reweighed 
# MCD estimate, and the classical 97.5% chi-square quantile cut-offs.

CarsTE1 <- fulltle(CarsIdt)
cat("Cars data -- normal maximum trimmed likelihood estimation results:\n")
print(CarsTE1)
		
# Estimate parameters by the full trimmed maximum likelihood estimator, using
# the triming parameter that maximizes breakdown, and a reweighed MCD estimate
# based on the 97.5% quantiles of Hardin and Rocke adjusted F distributions.
		
CarsTE2 <- fulltle(CarsIdt,alpha=0.5,getalpha=FALSE,rawMD2Dist="HardRockeAdjF")
cat("Cars data -- normal maximum trimmed likelihood estimation results:\n")
print(CarsTE2)
		
# Estimate parameters by the full trimmed maximum likelihood estimator, using 
# a two-step procedure to select the trimming parameter, and a reweighed MCD estimate 
# based on Hardin and Rocke adjusted F distributions, 95% quantiles, and 
# the Cerioli Beta and F distributions together with his iterated procedure 
# to identify outliers in the first step.
		
CarsTE3 <- fulltle(CarsIdt,rawMD2Dist="HardRockeAdjF",eta=0.05,MD2Dist="CerioliBetaF",
multiCmpCor="iterstep")
cat("Cars data -- normal maximum trimmed likelihood estimation results:\n")
print(CarsTE3)


## End(Not run)


MAINT.Data documentation built on April 4, 2023, 9:09 a.m.