fsrbase    R Documentation

fsrbase: an automatic outlier detection procedure in linear regression

Description

An automatic outlier detection procedure in linear regression

Usage


fsrbase(x, ...) 

## S3 method for class 'formula'
fsrbase(formula, data, subset, weights, na.action,
       model = TRUE, x.ret = FALSE, y.ret = FALSE,
       contrasts = NULL, offset, ...)

## Default S3 method:
fsrbase(x, y, bsb, intercept = TRUE, 
        monitoring = FALSE, control, trace = FALSE,
        ...) 

Arguments

formula

a formula of the form y ~ x1 + x2 + ....

data

data frame from which variables specified in formula are to be taken.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

weights

an optional vector of weights to be used in the fitting process. NOT USED YET.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The “factory-fresh” default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

model, x.ret, y.ret

logicals indicating if the model frame, the model matrix and the response are to be returned, respectively.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. An offset term can be included in the formula instead or as well, and if both are specified their sum is used.
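
The summing rule for offsets can be checked with base R's model.frame() and model.offset(), which formula interfaces such as that of lm() rely on. The following is a minimal sketch, assuming fsrbase builds its model frame in the same way; the data frame d and its columns are made up for illustration.

```r
# Hypothetical data: y is the response, x a regressor,
# o1 and o2 two a priori known offsets.
d <- data.frame(y  = c(1.2, 2.3, 3.1, 4.8),
                x  = c(0.5, 1.0, 1.5, 2.0),
                o1 = c(1, 1, 1, 1),
                o2 = c(0.1, 0.2, 0.3, 0.4))

# One offset in the formula, one through the 'offset' argument.
mf <- model.frame(y ~ x + offset(o1), data = d, offset = o2)

# model.offset() adds up every offset found in the model frame.
total_offset <- model.offset(mf)
print(total_offset)
```

Here total_offset equals d$o1 + d$o2 element by element.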

x

Predictor variables. Matrix. Matrix of explanatory variables (also called 'regressors') of dimension n x (p-1), where p denotes the number of explanatory variables including the intercept. Rows of x represent observations and columns represent variables. By default the model contains a constant term, so do not include a column of 1s in x; to fit a model without a constant term, use intercept=FALSE. Missing values (NA's) and infinite values (Inf's) are allowed: observations (rows) containing them are automatically excluded from the computations.

y

Response variable. Vector. Response variable, specified as a vector of length n, where n is the number of observations. Each entry in y is the response for the corresponding row of x. Missing values (NA's) and infinite values (Inf's) are allowed: observations (rows) containing them are automatically excluded from the computations.
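
The exclusion of rows with missing or infinite values can be pictured with a few lines of base R. This is only a sketch of the rule described above, not the package's internal code; the objects X, y, keep, X_used and y_used are made up for illustration.

```r
# Hypothetical data with one missing and one infinite value in X
# and one missing response.
X <- matrix(c(1, 2, NA, 4, 5,
              2, 1, 3, Inf, 0), ncol = 2)
y <- c(1.0, 2.1, 2.9, 4.2, NA)

# A row is usable only if y and every entry of that row of X are finite
# (is.finite() is FALSE for NA, NaN, Inf and -Inf).
keep <- is.finite(y) & apply(X, 1, function(r) all(is.finite(r)))

X_used <- X[keep, , drop = FALSE]
y_used <- y[keep]
print(which(!keep))   # indices of the rows that are dropped
```

In this toy data set rows 3, 4 and 5 are dropped and two complete observations remain.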

bsb

Initial subset - vector of indices. If bsb=0 (default), the procedure starts with p randomly chosen units. If bsb is not 0, the search starts from the units listed in bsb, so the initial subset size is m0=length(bsb).
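
As an illustration of the two ways of starting the search, the sketch below builds a random initial subset of size p and a user-chosen one in base R. The names bsb_random and bsb_user and all values are hypothetical, and the random draw only mimics the default behaviour described above; it is not the package's own sampling code.

```r
set.seed(1)   # reproducible draw, for illustration only
n <- 20       # number of observations
p <- 3        # number of explanatory variables, intercept included

# Random start: p distinct row indices, as described for the default.
bsb_random <- sample(n, p)

# User-chosen start: a vector of distinct row indices between 1 and n.
bsb_user <- c(2, 5, 7, 11)
m0 <- length(bsb_user)   # size of the initial subset

# The call would then look like this (requires fsdaR and its dependencies):
# out <- fsrbase(X, y, bsb = bsb_user)
```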

intercept

Indicator for constant term. Scalar. If intercept=TRUE, a model with constant term will be fitted (default), else, no constant term will be included.
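
What the constant term means for the dimensions can be sketched in base R as follows. The matrix X and the names design_with and design_without are made up for illustration; the sketch shows the usual construction of a design matrix and is not taken from the package's code.

```r
# Hypothetical regressors: n = 4 observations, 2 explanatory variables.
X <- matrix(c(0.5, 1.0, 1.5, 2.0,
              3.0, 1.0, 4.0, 1.0), ncol = 2)

# With a constant term the design matrix gains a leading column of 1s,
# so p = ncol(X) + 1; without it the design matrix is X itself.
design_with    <- cbind(1, X)
design_without <- X

p_with    <- ncol(design_with)
p_without <- ncol(design_without)
print(c(p_with, p_without))
```

This is why x should be supplied without a column of 1s when intercept=TRUE.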

monitoring

Whether to monitor several quantities at each step of the forward search. Default is monitoring=FALSE.

control

A control object (S3) containing estimation options, as returned by FSR_control; see the help page of that function. If the control object is supplied, its parameters are used. If parameters are also passed directly in the call, they override the corresponding elements of the control object.
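
The precedence rule (arguments in the call win over the control object) can be pictured with base R's modifyList(). This is only an illustration under stated assumptions: the control object is represented by a plain list, the option names plot, nsamp and h are the ones used in the examples on this page, the values are made up, and the merging inside the package may be implemented differently.

```r
# Stand-in for a control object: a plain list of options.
control <- list(plot = FALSE, nsamp = 1000, h = 50)

# Options given directly in the call.
call_args <- list(nsamp = 1500)

# Call-level options replace matching control entries; the rest is kept.
effective <- modifyList(control, call_args)
str(effective)
```

In this sketch nsamp ends up as 1500, while plot and h keep the values from the control list.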

trace

Whether to print intermediate results. Default is trace=FALSE.

...

Potential further optional arguments, see the help of the function FSR_control.

Value

Depending on the input parameter monitoring, one of the following objects will be returned:

  1. fsr.object

  2. fsreda.object

Author(s)

FSDA team

References

Riani, M., Atkinson, A.C., Cerioli, A. (2009). Finding an unknown number of multivariate outliers. Journal of the Royal Statistical Society Series B, Vol. 71, pp. 201-221.

Examples

    ## Not run: 

    library(fsdaR)
    library(robustbase)   # provides the hbk data set and ltsReg()
    data(hbk, package="robustbase")

    n <- 200
    p <- 3
    
    ## Simulated regressors and response (no outliers)
    X <- matrix(data=rnorm(n*p), nrow=n, ncol=p)
    y <- matrix(data=rnorm(n*1), nrow=n, ncol=1)
    (out <- fsrbase(X, y))

    ## Now we use the formula interface:
    (out1 <- fsrbase(y~X, control=FSR_control(plot=FALSE)))

    ## Or use the variables in a data frame
    (out2 <- fsrbase(Y~., data=hbk, control=FSR_control(plot=FALSE)))

    ## Let us compare to the LTS solution
    (out3 <- ltsReg(Y~., data=hbk))
    
    ## Now compute the model without intercept
    (out4 <- fsrbase(Y~.-1, data=hbk, control=FSR_control(plot=FALSE)))
    
    ## And compare again with the LTS solution
    (out5 <- ltsReg(Y~.-1, data=hbk))

    ## Set optional arguments through the control object
    (out6 <- fsrbase(Y~.-1, data=hbk, control=FSR_control(plot=FALSE, nsamp=1500, h=50)))
    
## End(Not run)

fsdaR documentation built on March 31, 2023, 8:18 p.m.