SUE.lm: The Subsampling Method for Linear Regression


View source: R/SUE.lm.R

Description

This function computes the subsampling method estimators for linear regression.

Usage

SUE.lm(formula, data = list(), k, ns, r, constant = 0.25, consistency.check = TRUE)

Arguments

formula

An object that specifies the variables used in the regression. A formula object has the form y ~ x1 + x2 + ... + xp, where y is the name of the dependent variable and x1, ..., xp are the names of the explanatory variables.

data

This argument is used only if the variables belong to a data frame, in which case data is the name of the data frame.

k

It is the total number of subsamples to be generated.

ns

It is the subsample size.

r

It is the number of subsamples to be combined. The function parameters is designed specifically to compute these three parameters (k, ns and r) of the subsampling method.

constant

A predetermined parameter that controls the allowed distance between two estimated values. It takes effect only when consistency.check = TRUE. The default value is 0.25, but users can try other values to get a better result. If the value is too small, the function will fail the consistency check easily, which means the program has to be run many more times; if the value is too large, the result may not be reliable. It is up to the user to balance these two effects.

consistency.check

Determines whether the consistency check is carried out. The default value is TRUE. We highly recommend always checking the consistency of the result after computing, since it can considerably increase the reliability of the subsampling method.
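As a small illustration of the formula and data arguments described above, the following base R sketch builds a formula for the stackloss data and inspects the variables it refers to. It uses only base R and the built-in stackloss data set; the object name f is chosen here for illustration.

```r
# Build a formula: response on the left of ~, explanatory variables on the right.
data(stackloss)
f <- stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.

# Confirm that f is a formula object and list the variables it names.
class(f)
all.vars(f)

# The variables are looked up in the data frame passed as `data`.
mf <- model.frame(f, data = stackloss)
head(mf, 3)
```

The same formula object can then be passed as the first argument of lm or SUE.lm, with data = stackloss.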

Value

Apart from the same output components as the object of class "lm", such as coefficients, residuals and fitted.values, the main components of the output are:

combined.sample

The final combined sample generated by the subsampling method. It is expected to be the clean data, free of outliers.

sample.size

The size of the combined sample, which makes it easy for the user to compute the number of outliers.

mse

The mean squared errors (MSEs) of the regressions on the r chosen subsamples.

beta

The estimated coefficients of the regressions on the r chosen subsamples.

check

A logical value that indicates whether the subsampling method fails the consistency check or not.
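To make the roles of k, ns and r and of the components above more concrete, here is a rough base R sketch of one way a subsample-and-combine scheme could work. It is an assumption made for illustration, not the implementation in SUE.lm: the selection rule (keep the r subsamples with the smallest MSE), the way subsamples are drawn, and all object names (idx, fits, best, combined) are hypothetical, and no consistency check is performed.

```r
# Illustrative sketch only; this is NOT the SUE package's algorithm.
data(stackloss)
set.seed(1)
k <- 57; ns <- 11; r <- 6
f <- stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.

# Draw k subsamples of size ns (row indices, without replacement).
idx <- replicate(k, sample(nrow(stackloss), ns), simplify = FALSE)

# Fit ordinary least squares on each subsample and record its MSE.
fits <- lapply(idx, function(i) lm(f, data = stackloss[i, ]))
mse <- vapply(fits, function(m) sum(residuals(m)^2) / df.residual(m), numeric(1))

# Assumed selection rule: keep the r subsamples with the smallest MSE.
best <- order(mse)[seq_len(r)]
beta <- sapply(fits[best], coef)   # one column of coefficients per chosen subsample

# Pool the distinct rows of the chosen subsamples and refit on the pooled data.
combined <- stackloss[sort(unique(unlist(idx[best]))), ]
fit.combined <- lm(f, data = combined)

# Rows left out of the combined sample are the candidate outliers.
n.outliers <- nrow(stackloss) - nrow(combined)
```

In this sketch, combined, nrow(combined), mse[best] and beta play roles analogous to combined.sample, sample.size, mse and beta in the value returned by SUE.lm.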

Author(s)

Jim Yi

Examples

## We analyse the well-known stackloss data using the ordinary linear method and the subsampling method.
## We also try two m values, m = 2 and 4, which represent roughly 10% and 20% working
## proportions of outliers in the data. The subsample size is chosen to be the default size of ns = 11.

data(stackloss)
a1=lm(stack.loss~Air.Flow+Water.Temp+Acid.Conc.,data=stackloss)
a2=SUE.lm(stack.loss~Air.Flow+Water.Temp+Acid.Conc.,data=stackloss,k=57,ns=11,r=6, 
	consistency.check=TRUE,constant=0.25)
a3=SUE.lm(stack.loss~Air.Flow+Water.Temp+Acid.Conc.,data=stackloss,k=327,ns=11,r=5, 
	consistency.check=TRUE,constant=0.25)
par(mfrow=c(2,2))
plot(a1$fitted.values,a1$residuals,xlab="(a) fitted values",ylab="residuals",ylim=c(-12,12))
abline(h=0)
abline(h=9.7,lty=2)
abline(h=-9.7,lty=2)
plot(SUE.fitted.values(a2),SUE.residuals(a2),xlab="(b) fitted values",ylab="residuals",ylim=c(-12,12))
abline(h=0)
abline(h=9,lty=2)
abline(h=-9,lty=2)
plot(SUE.fitted.values(a3),SUE.residuals(a3),xlab="(c) fitted values",ylab="residuals",ylim=c(-12,12))
abline(h=0)
abline(h=3.75,lty=2)
abline(h=-3.75,lty=2)


SUE documentation built on May 1, 2019, 9:15 p.m.