outference_seq: Fit a linear model with outliers detected SEQUENTIALLY


Description

This function detects outliers sequentially using Cook's distance, and fits a linear regression model with the outliers removed. The object returned by this function can be used for valid inference corrected for outlier removal through generic functions such as summary, confint, and predict.

Usage

outference_seq(formula, data, sigma = NULL, numOfOutlier)

Arguments

formula

an object of class "formula", the same syntax as in lm.

data

an optional data frame, list or environment containing the variables in the model, the same syntax as in lm.

sigma

the noise level. Must be one of NULL, "estimate", or a positive scalar value. If sigma = NULL, the inference assumes the noise level is unknown; if sigma = "estimate", the inference is based on an estimated noise level.

numOfOutlier

the number of outliers to be detected.
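As a hedged illustration of how these arguments fit together, the sketch below calls the function on simulated data. The data set, the variable names x and y, and the choice of two planted outliers are assumptions made only for this example. The call is guarded so that the script still runs when the outference package is not installed.

```r
# Simulated data with two planted outliers (illustrative only).
set.seed(1)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)
dat$y[c(3, 17)] <- dat$y[c(3, 17)] + 8

# Guarded so the script still runs if the package is not installed.
if (requireNamespace("outference", quietly = TRUE)) {
  fit <- outference::outference_seq(y ~ x, data = dat,
                                    sigma = "estimate", numOfOutlier = 2)
  print(fit$outlier.det)  # indexes of the detected outliers
  print(coef(fit))        # coefficients, via the class's coef method
  print(confint(fit))     # intervals corrected for outlier removal
}
```

Passing sigma = NULL instead would treat the noise level as unknown, as described above.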

Details

This function uses the same syntax as lm for the formula and data arguments. Users can access the original "lm" objects through $fit.full and $fit.rm. Common generic functions for lm, including coef, confint, plot, predict, and summary, are re-written so that they can be used to extract useful features of the object returned by this function.

The i-th observation is considered an outlier when its Cook's distance ranks among the top k, where k is the user-specified number of outliers to be detected. The outlier detection event can be characterized as a set of quadratic constraints in the response y:

\bigcap_{i \in I} \{ y^T Q_i y \ge 0 \},

where I is a finite index set, and the constraint returned by this function is the list of Q_i matrices.
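The top-k rule and its quadratic form can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it ranks Cook's distances from a single full-data fit, and the helper make_Q and the simulated data are hypothetical. It relies on the standard identity D_i = c_i e_i^2 / (p s^2), with c_i = h_i / (1 - h_i)^2 and e = (I - H) y, so that D_i >= D_j is equivalent to y^T Q y >= 0 for Q = (I - H)(c_i E_ii - c_j E_jj)(I - H).

```r
set.seed(1)
n <- 30
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[5] <- y[5] + 10            # plant one gross outlier
k <- 2                       # number of outliers to flag

# Rank Cook's distances from the full fit and keep the top k.
fit_full <- lm(y ~ x)
cook <- cooks.distance(fit_full)
detected <- order(cook, decreasing = TRUE)[seq_len(k)]

# Refit with the flagged observations removed.
fit_rm <- lm(y ~ x, subset = -detected)

# Pieces of the quadratic form: hat matrix H, residual maker R = I - H,
# and leverage weights c_i = h_i / (1 - h_i)^2.
X <- model.matrix(fit_full)
H <- X %*% solve(crossprod(X), t(X))
R <- diag(n) - H
h <- diag(H)
cc <- h / (1 - h)^2

# Hypothetical helper: Q such that y^T Q y >= 0 iff D_i >= D_j.
make_Q <- function(i, j) {
  D <- matrix(0, n, n)
  D[i, i] <- cc[i]
  D[j, j] <- -cc[j]
  R %*% D %*% R
}

# Every detected i should dominate every non-detected j.
others <- setdiff(seq_len(n), detected)
quad <- sapply(detected, function(i) {
  sapply(others, function(j) drop(crossprod(y, make_Q(i, j) %*% y)))
})
all(quad >= 0)
```

Each pair (i, j) of a detected and a non-detected observation contributes one n by n matrix, which matches the description of the constraint as a list of Q_i matrices.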

Value

This function returns an object of class c("outference_seq", "outference").

The function summary is used to obtain and print a summary (including p-values) of the results. The generic functions coef, confint, plot, predict are used to extract useful features of the object returned by this function.

An object of class c("outference_seq", "outference") is a list containing the following components:

fit.full

an "lm" object representing the fit using the full data (no outliers are removed).

fit.rm

an "lm" object representing the fit using the data after outlier removal.

method

"cook".

cutoff

NULL.

numOfOutlier

the number of outliers to be detected.

outlier.det

indexes of detected outliers.

magnitude

the vector of the Cook's distance for all observations.

constraint

the constraint in the response that characterizes the outlier detection event. A list of n by n matrices.

sigma

the noise level used in the fit.

call

the function call.

Author(s)

Shuxiao Chen <sc2667@cornell.edu>

References

S. Chen and J. Bien. “Valid Inference Corrected for Outlier Removal”. arXiv preprint arXiv:1711.10635 (2017).

See Also

summary.outference for summaries;

coef.outference for extracting coefficients;

confint.outference for confidence intervals of regression coefficients;

plot.outference for plotting the outlying measure;

predict.outference for making predictions.


shuxiaoc/outference documentation built on July 8, 2019, 8:30 p.m.