OutlierRegression: Finding outliers of a single variable (y) by a regression...

OutlierRegression    R Documentation

Finding outliers of a single variable (y) by a regression model using a single explanatory variable (x).

Description

Outliers are identified by applying a limit to the studentized residuals.
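
As a rough illustration of this idea, the sketch below flags observations by their studentized residuals using only base R. It is not the package's implementation: the simulated data, the use of rstudent() (externally studentized residuals) and the two-sided comparison are assumptions made for the example. The limit 2.5 matches the limitModel default shown under Usage.

```r
# Illustrative data (made up): a linear trend with one gross error injected.
set.seed(1)
x <- 1:30
y <- 2 + 3 * x + rnorm(30)
y[10] <- y[10] + 25            # gross error at observation 10

fit   <- lm(y ~ x)             # ordinary regression of y on x
rStud <- rstudent(fit)         # externally studentized residuals
limit <- 2.5                   # same value as the limitModel default

# Assumption: an observation is an outlier when |rStud| exceeds the limit.
outlier <- as.integer(abs(rStud) > limit)
which(outlier == 1)            # indices of the flagged observations
```

Because rstudent() estimates the residual variance with each observation left out in turn, a single gross error does not inflate its own scale estimate, which makes it easier to detect.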

Usage

OutlierRegression(
  data,
  idName = names(data)[1],
  strataName = NULL,
  xName = names(data)[3],
  yName = names(data)[4],
  method = "ordinary",
  limitModel = 2.5,
  limitIterate = 4.5
)

OutlierRegressionMicro(...)

OutlierRegressionTall(..., iD = TalliD())

OutlierRegressionWide(
  ...,
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

Arguments

data

Input data set of class data.frame

idName

Name of id-variable(s)

strataName

Name of strata-variable. A single stratum is used when NULL (default).

xName

Name of x-variable

yName

Name of y-variable

method

The method (model and weight) coded as a string: "ordinary" (default), "ratio", "noconstant", "mean" or "ratioconstant".

limitModel

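Studentized residuals limit. An observation whose studentized residual falls outside the interval from -limitModel to limitModel is classified as an outlier.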

limitIterate

Studentized residuals limit for iterative calculation of studentized residuals.
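
The role of limitIterate can be pictured with the following sketch. The exact iteration used by the package is not described on this page, so the procedure here is an assumption: observations whose studentized residual exceeds limitIterate are dropped from the model, and the model is refitted until no such observation remains. The helper iterate_model() and its maxIter argument are hypothetical names introduced only for this example.

```r
# Hypothetical helper, not part of the Kostra package.
# Returns a logical vector: TRUE for observations kept in the final model.
iterate_model <- function(x, y, limitIterate = 4.5, maxIter = 20) {
  inModel <- rep(TRUE, length(y))              # every unit starts in the model
  for (iter in seq_len(maxIter)) {
    d <- data.frame(x = x[inModel], y = y[inModel])
    r <- rstudent(lm(y ~ x, data = d))         # residuals for in-model units
    extreme <- !is.na(r) & abs(r) > limitIterate
    if (!any(extreme)) break                   # nothing left to remove
    inModel[which(inModel)[extreme]] <- FALSE  # drop extreme units and refit
  }
  inModel
}

# Made-up data with one very large error at observation 5.
set.seed(4)
x <- 1:30
y <- 2 + 3 * x + rnorm(30)
y[5] <- y[5] + 100
keep <- iterate_model(x, y)
which(!keep)                                   # units excluded from the model
```

In this reading, the stricter limitIterate (4.5 by default) decides which units may influence the fitted model, while limitModel (2.5 by default) decides which units are reported as outliers.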

Details

This function is related to ImputeRegression and the structure and the names of output are very similar. Note that missing values of x are allowed here.

Value

Output of OutlierRegression is a list of two data frames, micro and aggregates. The micro data frame has as many rows as the input, and the aggregates data frame has one row for each stratum. The individual variables are described below.

micro consists of the following elements:

id

id from input

x

The input x variable

y

The input y variable

strata

The input strata variable (can be NULL)

outlier

Dummy variable: outlier (1) or not (0).

category123

The three imputation groups: representative (1), correct but not representative (2), wrong (3).

yHat

Fitted values

rStud

The studentized residuals from last iteration

dffits

The DFFITS statistic from last iteration

hii

The leverages (diagonal elements of hat matrix) from last iteration

leaveOutResid

The outside-model residual from last iteration

limLo

-limitModel

limUp

limitModel
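
The diagnostic columns above have close counterparts in base R. The sketch below computes them for a plain lm() fit on made-up data. It assumes that leaveOutResid is the usual leave-one-out (deleted) residual, residual / (1 - hii); the page does not confirm this, and the package may compute these quantities differently.

```r
# Made-up data: the last unit has an x value far from the rest (high leverage).
set.seed(2)
x <- c(1:19, 40)
y <- 1 + 2 * x + rnorm(20)

fit <- lm(y ~ x)
diagnostics <- data.frame(
  yHat          = fitted(fit),                         # fitted values
  rStud         = rstudent(fit),                       # studentized residuals
  dffits        = dffits(fit),                         # DFFITS statistic
  hii           = hatvalues(fit),                      # leverages
  leaveOutResid = residuals(fit) / (1 - hatvalues(fit))
)

# Cross-check the leave-one-out identity by refitting without unit 20.
d <- data.frame(x = x, y = y)
fit20 <- lm(y ~ x, data = d[-20, ])
refitResid20 <- y[20] - predict(fit20, newdata = d[20, ])
```

The last lines show why the shortcut is convenient: the residual obtained by actually refitting without unit 20 agrees with the value computed from the full fit.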

aggregates consists of the following elements:

N

Number of observations in each strata

coef

The final first model coefficient

coefB

The final second model coefficient or zeros when only one coefficient in model.

nModel

The final number of observations in model.

sigmaHat

The final square root of the estimated variance parameter
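
To give a feel for a one-row-per-stratum summary of this kind, the sketch below fits a separate lm() in each stratum of a made-up data set and collects similar quantities. The column names mirror the list above, but the mapping is an assumption: here coef and coefB are simply the first and second coefficients returned by coef(), and no observations are excluded, so nModel equals N.

```r
# Made-up data with two strata that follow different lines.
set.seed(3)
d <- data.frame(
  strata = rep(c("a", "b"), each = 15),
  x      = rep(1:15, times = 2)
)
d$y <- ifelse(d$strata == "a", 1 + 2 * d$x, 5 + 0.5 * d$x) + rnorm(30)

# Fit y ~ x separately in each stratum and collect one summary row per fit.
perStratum <- lapply(split(d, d$strata), function(s) {
  fit <- lm(y ~ x, data = s)
  data.frame(
    strata   = s$strata[1],
    N        = nrow(s),                 # observations in the stratum
    coef     = unname(coef(fit)[1]),    # first coefficient (intercept here)
    coefB    = unname(coef(fit)[2]),    # second coefficient (slope here)
    nModel   = nobs(fit),               # observations used in the fit
    sigmaHat = summary(fit)$sigma       # residual standard error
  )
})
aggregates <- do.call(rbind, perStratum)
aggregates
```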

Output of OutlierRegressionMicro is the single data frame micro above.

Outputs of OutlierRegressionTall and OutlierRegressionWide are similar to the outputs of the corresponding functions in ImputeRegression.

Author(s)

Øyvind Langsrud

Examples


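# Test data with an id column added; by default x and y are columns 3 and 4
z <- cbind(id = 1:34, KostraData("ratioTest")[, c(3, 1, 2)])
OutlierRegression(z, strataName = "k")       # List with micro and aggregates
OutlierRegressionMicro(z, strataName = "k")  # Only the micro data frame
OutlierRegressionTall(z, strataName = "k")   # Tall output format
OutlierRegressionWide(z, strataName = "k")   # Wide output format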

rateData <- KostraData("rateData")                      # Real Kostra data set
w <- rateData$data[, c(17, 19, 16, 5)]                  # Data with id, strata, x and y
w <- w[is.finite(w[, "Ny.kostragruppe"]), ]             # Remove Longyearbyen
w[w[, "Ny.kostragruppe"] > 13, "Ny.kostragruppe"] <- 13 # Combine small strata
OutlierRegression(w, strataName = names(w)[2], method = "ratio")


statisticsnorway/Kostra documentation built on Sept. 25, 2024, 10:37 a.m.