This function computes the subsampling method estimators for linear regression.
formula |
An object that specifies the variables used in the regression. A formula object has the form y ~ x1 + x2 + ... + xp, where y is the name of the dependent variable and x1, ..., xp are the names of the explanatory variables. |
data |
This argument is used only if the variables belong to a data frame, in which case data is the name of the data frame. |
k |
It is the total number of subsamples to be generated. |
ns |
It is the subsample size. |
r |
It is the number of subsamples to be combined. The function |
constant |
A predetermined parameter that controls the allowed distance between two estimated values. It takes effect only when consistency.check = TRUE. The default value is 0.25, but users can try different values to get a better result. Note that if the value is set too small, the function will fail the consistency check easily, which means the program has to be run many more times; if the value is too large, the result may not be reliable. It is the user's job to balance these two situations. |
consistency.check |
Decides whether the consistency check is conducted. The default value is TRUE. We highly recommend always checking the consistency of the result after computing, as it can considerably increase the reliability of the subsampling method. |
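To make the roles of k, ns and r more concrete, here is a minimal sketch in base R. It is not the SUE package's code: the rule of keeping the r subsamples with the smallest MSE and combining them by taking the union of their rows is an assumption made for illustration, and the consistency check controlled by constant is left out entirely. The object names (form, idx, best, combined, final) are hypothetical.

```r
# Illustrative sketch only; NOT the implementation used by SUE.lm.
set.seed(1)                      # make the random subsamples reproducible
data(stackloss)
form <- stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.

k  <- 57   # total number of subsamples to generate
ns <- 11   # size of each subsample
r  <- 6    # number of subsamples to combine

n <- nrow(stackloss)

# Draw k subsamples (as row indices) of size ns, without replacement.
idx <- replicate(k, sort(sample(n, ns)), simplify = FALSE)

# Fit ordinary least squares on each subsample and record its MSE.
fits <- lapply(idx, function(i) lm(form, data = stackloss[i, ]))
mse  <- vapply(fits,
               function(f) sum(residuals(f)^2) / df.residual(f),
               numeric(1))

# Assumption: the r subsamples with the smallest MSE are the ones kept.
best <- order(mse)[seq_len(r)]

# Assumption: the chosen subsamples are combined as the union of their rows.
rows     <- sort(unique(unlist(idx[best])))
combined <- stackloss[rows, ]

# Refit on the combined sample.
final <- lm(form, data = combined)
nrow(combined)
coef(final)
```

In this sketch a larger k explores more subsamples, while r controls how many of them contribute rows to the combined sample, so the combined sample can never have more than r * ns rows.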
Apart from the same output components as an object of class "lm", such as coefficients, residuals and fitted.values, the main components of the output are:
combined.sample |
The final combined sample generated by the subsampling method. It is supposed to be the clean data without outliers. |
sample.size |
The size of the combined sample, which makes it convenient for the user to compute the number of outliers. |
mse |
The MSEs of the regressions on the r chosen subsamples. |
beta |
The estimated coefficients of the regressions on the r chosen subsamples. |
check |
A logical value that indicates whether the subsampling method fails the consistency check or not. |
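As a small illustration of how sample.size can be used to count outliers, the sketch below subtracts it from the number of rows in the original data. The fit object here is a hypothetical stand-in list, and the value 17 is a made-up placeholder rather than a result produced by SUE.lm; with a real fit you would use the returned object directly.

```r
data(stackloss)

# Hypothetical stand-in for an object returned by SUE.lm;
# 17 is a placeholder value, not an actual result.
fit <- list(sample.size = 17L)

# Observations not in the combined sample are treated as outliers.
n_outliers <- nrow(stackloss) - fit$sample.size
n_outliers
```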
Jim Yi
## We analyze the well-known stackloss data using the ordinary linear method and the subsampling method.
## We also try two m values, m = 2 and 4, which represent roughly 10% and 20% working
## proportion of outliers in the data. The subsample size is chosen to be the default size of ns = 11.
data(stackloss)
a1=lm(stack.loss~Air.Flow+Water.Temp+Acid.Conc.,data=stackloss)
a2=SUE.lm(stack.loss~Air.Flow+Water.Temp+Acid.Conc.,data=stackloss,k=57,ns=11,r=6,
consistency.check=TRUE,constant=0.25)
a3=SUE.lm(stack.loss~Air.Flow+Water.Temp+Acid.Conc.,data=stackloss,k=327,ns=11,r=5,
consistency.check=TRUE,constant=0.25)
par(mfrow=c(2,2))
plot(a1$fitted.values,a1$residuals,xlab="(a) fitted values",ylab="residuals",ylim=c(-12,12))
abline(h=0)
abline(h=9.7,lty=2)
abline(h=-9.7,lty=2)
plot(SUE.fitted.values(a2),SUE.residuals(a2),xlab="(b) fitted values",ylab="residuals",ylim=c(-12,12))
abline(h=0)
abline(h=9,lty=2)
abline(h=-9,lty=2)
plot(SUE.fitted.values(a3),SUE.residuals(a3),xlab="(c) fitted values",ylab="residuals",ylim=c(-12,12))
abline(h=0)
abline(h=3.75,lty=2)
abline(h=-3.75,lty=2)