drop.gvf.points: Drop Outliers and Refit a GVF Model

View source: R/gvf.R

Description

This function drops observations (alleged outliers) from a fitted GVF model and simultaneously re-fits the model.

Usage

drop.gvf.points(x, method = c("pick", "cut"), which.plot = 1:2,
                res.type = c("standard", "student"), res.cut = 3,
                id.n = 3, labels.id = NULL,
                cex.id = 0.75, label.pos = c(4, 2),
                cex.caption = 1, col = NULL, drop.col = "red",
                ...)

Arguments

x

An object containing a single fitted GVF model (i.e. of class gvf.fit or gvf.fit.gr).

method

character specifying the method for identifying observations to be dropped (see ‘Details’); it may be either 'pick' (the default) or 'cut'.

which.plot

integer controlling the nature of the plot(s) that are used to identify and/or visualize the observations to be dropped: 1 means ‘Observed vs Fitted’, 2 means ‘Residuals vs Fitted’ (see ‘Details’).

res.type

character specifying which kind of residuals to use: 'standard' (standardized residuals, the default) or 'student' (studentized residuals).

res.cut

A positive number (default 3): observations whose residuals exceed 'res.cut' in absolute value will be dropped. Only meaningful if method is 'cut'.

id.n

Number of points to be initially labelled in each plot, starting with the most extreme. Only meaningful if method is 'pick'.

labels.id

Vector of labels, from which the labels for extreme points will be chosen. NULL uses observation numbers.

cex.id

Magnification of point labels.

label.pos

Positioning of labels, for the left half and right half of the graph(s) respectively.

cex.caption

Controls the size of the caption.

col

Color to be used for the points in the plot(s).

drop.col

Color to be used to visualize and annotate the points to be dropped in the plot(s).

...

Other parameters to be passed through to plotting functions.
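
The vector-valued defaults shown in ‘Usage’ (for example method = c("pick", "cut")) follow a common R idiom in which the first element acts as the default. The sketch below illustrates that idiom with base R's match.arg. The helper resolve_choice is hypothetical, and whether drop.gvf.points resolves its arguments this way internally is an assumption.

```r
# Hypothetical helper (not part of ReGenesees): resolves choice arguments
# the way base R's match.arg() does for vector-valued defaults.
resolve_choice <- function(method = c("pick", "cut"),
                           res.type = c("standard", "student")) {
  method <- match.arg(method)      # argument left missing -> first choice
  res.type <- match.arg(res.type)  # invalid values raise an error
  c(method = method, res.type = res.type)
}

resolve_choice()
resolve_choice(method = "cut", res.type = "student")
```

Under this idiom, calling the function without method or res.type selects 'pick' and 'standard', which agrees with the defaults described above.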

Details

This function drops observations (alleged outliers) from a single fitted GVF model and simultaneously re-fits the model. As a side effect, the function prints on screen the induced change for selected quality measures (see, e.g., getR2).
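
To make the "induced change for selected quality measures" concrete, here is a minimal base R sketch on simulated data. It uses an ordinary lm fit as a stand-in for a GVF model, and quality_change is a hypothetical helper; the measures that drop.gvf.points actually prints are not listed on this page, so the ones below are only illustrative.

```r
# Hypothetical helper: tabulate how some fit measures change after a refit.
quality_change <- function(before, after) {
  measures <- function(m) {
    s <- summary(m)
    c(R2 = s$r.squared, adj.R2 = s$adj.r.squared, sigma = s$sigma)
  }
  data.frame(before = measures(before),
             after  = measures(after),
             change = measures(after) - measures(before))
}

set.seed(7)
# Simulated data (assumption: stands in for real estimates and errors)
dat <- data.frame(x = 1:25)
dat$y <- 3 + 0.8 * dat$x + rnorm(25)
dat$y[12] <- dat$y[12] + 10            # plant one outlier

fit   <- lm(y ~ x, data = dat)         # original fit
refit <- lm(y ~ x, data = dat[-12, ])  # refit without the outlier
qc <- quality_change(fit, refit)
qc
```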

If method = "pick", observations to be dropped are identified interactively by clicking on points of a plot (see ‘Note’). Argument which.plot determines the nature of the plot: value 1 is for ‘Observed vs Fitted’, value 2 is for ‘Residuals vs Fitted’. In the latter case, argument res.type specifies which kind of residuals is plotted. Argument id.n specifies how many points are labelled initially, starting with the most extreme in terms of the selected residuals; this applies to both kinds of plots.
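
The initial labelling governed by id.n can be pictured with base R alone. The sketch below ranks observations by absolute residual on a simulated lm fit and keeps the id.n largest. It assumes the ranking works as in plot.lm and is not taken from the ReGenesees source.

```r
set.seed(1)
# Simulated data (assumption: stands in for real GVF input)
dat <- data.frame(x = 1:30)
dat$y <- 2 + 0.5 * dat$x + rnorm(30)
dat$y[c(5, 20)] <- dat$y[c(5, 20)] + c(8, -7)  # plant two outliers

fit <- lm(y ~ x, data = dat)
res <- rstandard(fit)   # rstudent(fit) would mimic res.type = "student"

id.n <- 3
# Indices of the id.n most extreme observations, largest |residual| first
extreme <- order(abs(res), decreasing = TRUE)[seq_len(id.n)]
extreme
```

These indices are the points that would receive labels on the chosen plot before any clicking takes place; raising id.n simply extends the list further down the ranking.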

If method = "cut", the observations to be dropped are those whose residuals exceed res.cut in absolute value. Again, argument res.type specifies which kind of residuals is used (and plotted). The dropped points are highlighted on a plot whose nature is again specified by argument which.plot. If which.plot = 1:2, dropped points are shown on both the ‘Observed vs Fitted’ and the ‘Residuals vs Fitted’ graphs simultaneously.
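
The 'cut' rule amounts to thresholding absolute residuals and refitting on the remaining observations. A minimal base R sketch, using a plain lm fit on simulated data as a stand-in for a GVF model (the data and object names are illustrative assumptions):

```r
set.seed(42)
# Simulated data (assumption: stands in for real GVF input)
dat <- data.frame(x = seq(1, 10, length.out = 40))
dat$y <- 1 + 2 * dat$x + rnorm(40, sd = 0.5)
dat$y[10] <- dat$y[10] + 6             # plant one gross outlier

fit <- lm(y ~ x, data = dat)

res.cut <- 3
res <- rstudent(fit)                   # studentized; rstandard(fit) otherwise
flagged <- which(abs(res) > res.cut)   # observations exceeding the threshold

# Refit on the observations that were not flagged
keep  <- setdiff(seq_len(nrow(dat)), flagged)
refit <- lm(y ~ x, data = dat[keep, , drop = FALSE])

flagged
nobs(fit) - nobs(refit)                # number of observations dropped
```

Lowering res.cut flags more observations, so the threshold trades robustness against the amount of data discarded.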

Argument drop.col controls the color used to highlight and annotate the points to be dropped in the plot(s). All the other arguments have the same meaning as in function plot.lm.

Value

An object of the same class as x (i.e. either gvf.fit or gvf.fit.gr), containing the original GVF model re-fitted after dropping (alleged) outliers.

Note

For method = "pick", function drop.gvf.points is only supported on those screen devices for which function identify is supported. The identification process can be terminated either by right-clicking the mouse and selecting 'Stop' from the menu, or from the 'Stop' menu on the graphics window.
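
Because 'pick' depends on identify, scripts that may run non-interactively can guard the choice of method. The sketch below uses base R's interactive() and grDevices::dev.interactive(); choose_method is a hypothetical helper, not a ReGenesees function.

```r
# Hypothetical helper: fall back to 'cut' when clicking on a plot
# is not possible in the current session.
choose_method <- function() {
  # identify() needs an interactive session and an interactive screen device
  if (interactive() && dev.interactive(orNone = TRUE)) "pick" else "cut"
}

method <- choose_method()
method
```

The result could then be passed along as drop.gvf.points(m, method = choose_method()), so that batch runs use the threshold rule instead of waiting for mouse input.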

Author(s)

Diego Zardetto

See Also

GVF.db to manage the ReGenesees archive of registered GVF models, gvf.input and svystat to prepare the input for GVF model fitting, fit.gvf to fit GVF models, plot.gvf.fit to get diagnostic plots for fitted GVF models, and predictCV to predict CV values via fitted GVF models.

Examples

# Load example data:
data(AF.gvf)

# Inspect available estimates and errors of counts:
str(ee.AF)

# List available registered GVF models:
GVF.db

# Fit example data to registered GVF model number one:
m <- fit.gvf(ee.AF, model=1)
m
summary(m)

##############################################################
# Method 'pick': identify outlier observations to be dropped #
# interactively by clicking on points of a plot.             #
##############################################################
  # Using the 'Observed vs Fitted' plot (the default):
## Not run: 
m1 <- drop.gvf.points(m)
m1
summary(m1)

## End(Not run)

 # Using the 'Residuals vs Fitted' plot with standardized
 # residuals (the default) and increasing id.n to get more
 # labelled points to guide your choices:
## Not run: 
m1 <- drop.gvf.points(m, which.plot = 2, id.n = 10)
m1
summary(m1)

## End(Not run)

 # The same as above, but with studentized residuals and
 # playing with colors:
## Not run: 
m1 <- drop.gvf.points(m, which.plot = 2, id.n = 10, res.type = "student",
                      col = "blue", drop.col = "green", pch = 20)
m1
summary(m1)

## End(Not run)


#############################################################
# Method 'cut': identify outlier observations to be dropped #
# by specifying a threshold for the absolute values of the  #
# residuals.                                                #
#############################################################
 # Using default threshold on standardized residuals and visualizing
 # dropped observations on both 'Observed vs Fitted' and 'Residuals
 # vs Fitted' plots:
m1 <- drop.gvf.points(m, method = "cut")
m1
summary(m1)

 # Using a custom threshold on studentized residuals and visualizing
 # dropped observations on the 'Observed vs Fitted' plot:
m1 <- drop.gvf.points(m, method = "cut", res.type = "student",
                      res.cut = 2.5, which.plot = 1)
m1
summary(m1)

 # The same as above, but visualizing dropped observations on the
 # 'Residuals vs Fitted' plot:
m1 <- drop.gvf.points(m, method = "cut", res.type = "student",
                      res.cut = 2.5, which.plot = 2)
m1
summary(m1)

 # You can "cut"/"pick" alleged outliers again from an already
 # "cut"/"picked" fitted GVF model:
m2 <- drop.gvf.points(m1, method = "cut", res.type = "student",
                      res.cut = 2.5, col = "blue", pch = 20)
m2
summary(m2)


#################################################################
# Identifying outlier observations to be dropped from "grouped" #
# GVF fitted models (i.e. x has class 'gvf.fit.gr').            #
#################################################################
 # Recall we have at our disposal the following survey design object 
 # defined on household data:
exdes

 # Now use function svystat to prepare "grouped" estimates and errors
 # of counts to be fitted separately (here groups are regions):
ee <- svystat(exdes, y=~ind, by=~age5c:marstat:sex, combo=3, group=~regcod)
ee
plot(ee)

 # Fit registered GVF model number one separately inside groups:
m <- fit.gvf(ee, model=1)
m
summary(m)

 # Now drop alleged outliers separately inside groups:

   #####################################################
   # Method 'pick': work interactively group by group. #
   #####################################################
## Not run: 
   m1 <- drop.gvf.points(m, which.plot = 2, res.type = "student", col = "blue",
                         pch = 20)
   m1
   summary(m1)

## End(Not run)

   #########################################################
   # Method 'cut': apply the same threshold to all groups. #
   #########################################################
   m1 <- drop.gvf.points(m, method = "cut", res.type = "student", res.cut = 2)
   m1
   summary(m1)


DiegoZardetto/ReGenesees documentation built on Dec. 16, 2024, 2:03 p.m.