
This function detects outliers using a user-specified method, and fits a linear regression model with outliers removed. The object returned by this function can be used for valid inference corrected for outlier removal through generic functions like `summary`, `confint`, and `predict`.


`formula, ` |
an object of class |

`data, ` |
an optional data frame, list or environment containing the variables in the model, the same
syntax as in |

`method, ` |
the outlier detection method, must be one of |

`cutoff, ` |
the cutoff of the outlier detection method. If |

`sigma, ` |
the noise level. Must be one of |

`x, ` |
an object of class |

`digits, ` |
the number of significant digits to use when printing. |

`..., ` |
other arguments. |

This function uses the same syntax as `lm` for the `formula` and `data` arguments. Users can access the original `"lm"` objects through `$fit.full` and `$fit.rm`. Common generic functions for `lm`, including `coef`, `confint`, `plot`, `predict` and `summary`, are re-written so that they can be used to extract useful features of the object returned by this function.

Currently, this function supports three outlier detection methods. For `"cook"`, the $i$-th observation is considered an outlier when its Cook's distance is greater than `cutoff/n`, where `n` is the number of observations. For `"dffits"`, the $i$-th observation is considered an outlier when the square of its DFFITS measure is greater than `cutoff*p/(n-p)`, where `p` is the number of variables (including the intercept). The rule-of-thumb value of `cutoff` for both methods is 4, which is the default value when one sets `cutoff = NULL`. The outlier detection event of both methods can be characterized as a set of quadratic constraints in the response $y$:

$$\bigcap_{i \in [n]} \{y^T Q_i y \ge 0\},$$

and the constraint returned by this function is the list of $Q_i$ matrices.
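
To make the two thresholds concrete, the following sketch applies the `"cook"` and `"dffits"` rules described above using only base R (`cooks.distance` and `dffits` from the `stats` package) on the `stackloss` data. It is an illustration of the decision rules, not the package's internal code; the variable names `fit_lm`, `outliers_cook` and `outliers_dffits` are made up for this example.

```r
## Illustration only: apply the "cook" and "dffits" rules with base R,
## independently of the outference package.
fit_lm <- lm(stack.loss ~ ., data = stackloss)

n <- nrow(stackloss)        # number of observations
p <- length(coef(fit_lm))   # number of variables, including the intercept
cutoff <- 4                 # rule-of-thumb value

## "cook": flag observation i when its Cook's distance exceeds cutoff/n
cook_d <- cooks.distance(fit_lm)
outliers_cook <- which(cook_d > cutoff / n)

## "dffits": flag observation i when its squared DFFITS exceeds cutoff*p/(n-p)
dffits_sq <- dffits(fit_lm)^2
outliers_dffits <- which(dffits_sq > cutoff * p / (n - p))

outliers_cook
outliers_dffits
```

Both rules compare a per-observation influence measure with a threshold that depends on `cutoff`, so lowering `cutoff` flags more observations.
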
For `"lasso"`, we assume the *mean-shift model* $y = X\beta + u + \varepsilon$, where $u$ is the "outlying coefficients" and $\varepsilon \sim N(0, \sigma^2 I)$ is the noise. We solve the following program:

$$(\hat\beta, \hat u) = \arg\min_{\beta,\, u} \|y - X\beta - u\|_2^2 + \mathrm{cutoff} \cdot \|u\|_1.$$

The $i$-th observation is considered as an outlier when $\hat u_i$ differs from $0$. The default cutoff for `"lasso"` is $0.75 \cdot E[\|X^T \varepsilon\|_\infty]/n$, which is a less conservative choice than the prediction-optimal cutoff $2 \cdot E[\|X^T \varepsilon\|_\infty]/n$. This cutoff is computed by Monte Carlo simulation, and $\sigma$ is replaced by an estimate when the true noise level is unknown. The outlier detection event of `"lasso"` can be characterized as a set of affine constraints in the response $y$:

$$A y \ge b,$$

where the "$\ge$" is interpreted as element-wise. The constraint returned by this function is then a list of `(A, b)`.
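
As a rough illustration of the mean-shift program above, the sketch below minimizes the objective exactly as written by alternating a least-squares step in the coefficients with a soft-thresholding step in the shifts. This is an assumption-laden toy, not the solver used by the package: the helper names `soft_threshold` and `mean_shift_lasso` are hypothetical, the data are made up, and the cutoff is picked arbitrarily rather than by the Monte Carlo rule.

```r
## Sketch only: alternating minimization for
##   min_{beta, u} ||y - X beta - u||_2^2 + cutoff * ||u||_1.
## `soft_threshold` and `mean_shift_lasso` are hypothetical helper names.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

mean_shift_lasso <- function(X, y, cutoff, n_iter = 200) {
  qrX <- qr(X)
  u <- rep(0, length(y))
  for (k in seq_len(n_iter)) {
    ## beta-step: least squares of (y - u) on X; the residual of y itself
    ## is then y minus the fitted values of (y - u)
    res <- y - qr.fitted(qrX, y - u)
    ## u-step: coordinate-wise minimizer of (res_i - u_i)^2 + cutoff * |u_i|
    u <- soft_threshold(res, cutoff / 2)
  }
  list(beta = qr.coef(qrX, y - u), u = u, outliers = which(u != 0))
}

## Made-up toy data: a noisy line with one shifted response at position 10
x <- 1:20
y <- 2 + 3 * x + rep(c(0.5, -0.5), 10)
y[10] <- y[10] + 30
X <- cbind(1, x)

fit_ms <- mean_shift_lasso(X, y, cutoff = 10)  # cutoff chosen arbitrarily
fit_ms$outliers
```

With these made-up numbers only position 10 ends up with a nonzero shift, which matches the rule that an observation is flagged when its estimated shift differs from zero.
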

This function returns an object of `class` `"outference"`.

The function `summary` is used to obtain and print a summary (including p-values) of the results. The generic functions `coef`, `confint`, `plot`, and `predict` are used to extract useful features of the object returned by this function.

An object of class `"outference"` is a list containing the following components:

`fit.full, ` |
an |

`fit.rm, ` |
an |

`method, ` |
the method used for outlier detection. |

`cutoff, ` |
the cutoff of the method. |

`outlier.det, ` |
indexes of detected outliers. |

`magnitude, ` |
a measure of "outlying-ness". For |

`constraint, ` |
the constraint in the response that characterizes the outlier detection event.
For |

`sigma, ` |
the noise level used in the fit. |

`call, ` |
the function call. |

Shuxiao Chen

Lee, Jason D., et al. "Exact post-selection inference, with application to the lasso." The Annals of Statistics 44.3 (2016): 907-927.

S. Chen and J. Bien. "Valid Inference Corrected for Outlier Removal." arXiv preprint arXiv:1711.10635 (2017).

`summary.outference` for summaries;
`coef.outference` for extracting coefficients;
`confint.outference` for confidence intervals of regression coefficients;
`plot.outference` for plotting the outlying measure;
`predict.outference` for making predictions.

```
library(outference) # provides outference()
## Brownlee’s Stack Loss Plant Data
data("stackloss")
head(stackloss) # look at the dataset
## fit the model
## detect outliers using Cook's distance with cutoff = 4
fit <- outference(stack.loss ~ ., data = stackloss, method = "cook", cutoff = 4)
plot(fit) # plot the Cook's distance of each observation
## observation 21 is considered an outlier with cutoff = 4
summary(fit$fit.full) # look at the fit with all the data
summary(fit$fit.rm) # look at the fit with observation 21 deleted
summary(fit) # extract the corrected p-values after outlier removal
```
