modelFilter | R Documentation
A set of fitted models (ddArray) is filtered according to a set of criteria
that test for high AIC, high-influence points, and plausibility of the tail
probabilities of each fitted distribution.
modelFilter will either auto-select the best model according to a set of
pre-defined, objective criteria or return all models that meet a set of
user-defined or default criteria. A table of how the models score according
to each criterion is printed to the console.
modelFilter(dmod, sieve = "default", quiet = FALSE)
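dmod | a ddArray object holding the set of fitted models to filter |
sieve | a list of criteria for ordering models |
quiet | boolean to suppress (quiet = TRUE) printing the table of model scores to the console |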
The criteria to test are entered in a list (sieve) with components:
$rtail = vector of probabilities that define checkpoints on distributions
to avoid situations where a model that may fit well within the range of data
is nonetheless implausible because it predicts a substantial probability of
carcasses falling great distances from the nearest turbine. The default is to
check whether a distribution predicts that less than 50% of carcasses fall
within 80 meters, 90% within 120 meters, 95% within 150 meters, or 99% within
200 meters. Distributions that fall below any of these points (for example,
predicting only 42% within 80 meters or only 74% within 120 meters) fail the
default rtail test. The format of the default for the test is
$rtail = c(p80 = 0.5, p120 = 0.90, p150 = 0.95, p200 = 0.99).
Users may override the default by using, for example,
sieve = list(rtail = c(p80 = 0.8, p120 = 0.99, p150 = 0.99, p200 = 0.999))
in the argument list for a more stringent test or for a situation where
turbines are small or winds are light. Alternatively, users may forgo the
test altogether by entering sieve = list(rtail = FALSE). If specific
probabilities are provided, they must be in a vector of length 4 with names
"p80", "p120", "p150", and "p200", as in the examples above.
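As a rough illustration of the rtail rule, the base-R sketch below evaluates a distribution's CDF at the distances encoded in the checkpoint names (80, 120, 150, and 200 meters) and passes the model only if every value reaches its checkpoint. The helper rtail_pass and the exponential CDFs standing in for fitted models are hypothetical; they are not part of the modelFilter API.

```r
# Hypothetical helper (not part of modelFilter): apply rtail checkpoints to a
# CDF function. A model passes only if its CDF reaches at least each
# checkpoint probability at the matching distance.
rtail_pass <- function(cdf,
                       rtail = c(p80 = 0.5, p120 = 0.90, p150 = 0.95, p200 = 0.99)) {
  if (isFALSE(rtail)) return(TRUE)  # rtail = FALSE switches the test off
  # distances are read from the names "p80", "p120", ... -> 80, 120, ...
  dist <- as.numeric(sub("^p", "", names(rtail)))
  all(cdf(dist) >= rtail)
}

# Stand-in CDFs: exponential distance distributions with means of 30 m and 200 m
short_tail <- function(x) pexp(x, rate = 1 / 30)
long_tail  <- function(x) pexp(x, rate = 1 / 200)

rtail_pass(short_tail)  # TRUE
rtail_pass(long_tail)   # FALSE: only about 33% of carcasses within 80 m
```

Setting every checkpoint to 0 makes any CDF pass, which is why an all-zero rtail vector disables the test.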
$ltail = vector of probabilities that define checkpoints on distributions
to avoid situations where the search radius is short and a distribution fits
the limited data set well but crashes to zero just outside the search radius.
The default is to check whether a distribution predicts that more than 50% of
carcasses fall within 20 meters or 90% within 50 meters. Distributions that
pass above either of these checkpoints (for example, predicting 61% of
carcasses within 20 meters or 93% within 50 meters) are eliminated by the
default ltail test. The format of the default for the test is
$ltail = c(p20 = 0.5, p50 = 0.90). Users may override the default by using,
for example, sieve = list(ltail = c(p20 = 0.6, p50 = 0.8)) in the argument
list for a situation where it is known that carcasses beyond 50 meters are
common.
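The ltail rule is the mirror image of the rtail rule: the CDF must stay at or below each checkpoint. The sketch below is a hypothetical helper (ltail_pass), with exponential CDFs assumed as stand-ins for fitted models; it is not the package's implementation.

```r
# Hypothetical helper (not part of modelFilter): apply ltail checkpoints.
# A model fails if its CDF rises above either checkpoint probability, i.e.,
# if it places too large a share of carcasses close to the turbine.
ltail_pass <- function(cdf, ltail = c(p20 = 0.5, p50 = 0.90)) {
  dist <- as.numeric(sub("^p", "", names(ltail)))  # "p20", "p50" -> 20, 50
  all(cdf(dist) <= ltail)
}

wide   <- function(x) pexp(x, rate = 1 / 30)  # mean distance 30 m
narrow <- function(x) pexp(x, rate = 1 / 10)  # mean distance 10 m

ltail_pass(wide)    # TRUE: about 49% within 20 m and 81% within 50 m
ltail_pass(narrow)  # FALSE: about 86% within 20 m
```

Because a CDF never exceeds 1, checkpoints of c(p20 = 1, p50 = 1) let every model pass.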
$aic = a numeric scalar cutoff value for the models' delta AICc scores.
Models with AICc scores exceeding the minimum AICc among all the fitted
models by sieve$aic or more fail the test. The default value is 10. Users may
override the default by using, for example, sieve = list(aic = 7) in the
argument list to use a delta AICc score of 7 as the cutoff, or may forgo the
test altogether by setting sieve = list(aic = FALSE).
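The delta-AICc test can be sketched in a few lines of base R. The helper aic_pass and the AICc values below are made up for illustration; they are not modelFilter internals or output from real fits.

```r
# Hypothetical helper (not part of modelFilter): the delta-AICc test.
# A model fails if its AICc exceeds the smallest AICc by `cutoff` or more.
aic_pass <- function(aicc, cutoff = 10) {
  if (isFALSE(cutoff)) return(setNames(rep(TRUE, length(aicc)), names(aicc)))
  delta <- aicc - min(aicc)  # delta AICc relative to the best model
  delta < cutoff
}

# Made-up AICc values for three candidate models (illustration only)
aicc <- c(exponential = 210.3, lognormal = 205.1, tnormal = 216.4)

aic_pass(aicc)              # exponential TRUE, lognormal TRUE, tnormal FALSE
aic_pass(aicc, cutoff = 5)  # only lognormal passes (exponential is 5.2 above)
aic_pass(aicc, Inf)         # all pass: an infinite cutoff switches the test off
```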
$hin = TRUE or FALSE to test for high-influence points, the presence of
which casts doubt on the reliability of the model. A model is flagged as
having high-influence points if any data point has both high leverage,
$\frac{h}{1 - h} > \frac{2p}{n - 2p}$, and a Cook's distance
$D > \frac{8}{n - 2p}$, where h is leverage, D is Cook's distance, p is the
number of parameters in the model, and n is the search radius.
The criteria for high-influence points were adapted from the GLM diagnostics
(glm.diag) in Brian Ripley's boot package. The test is perhaps most valuable
in identifying distributions with high probability of carcasses landing well
beyond what could reasonably be expected.
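To make the two thresholds concrete, the sketch below applies them to a plain Poisson GLM fitted with base R. It assumes that stats::hatvalues() and stats::cooks.distance() give the leverage and Cook's distance values that glm.diag uses, and that each observation is one 1-m ring (so n equals the search radius in meters). The ring counts, the model formula, and the helper high_influence are hypothetical illustrations, not modelFilter internals.

```r
# Hypothetical helper (not part of modelFilter): flag high-influence points in
# a fitted glm using the leverage and Cook's distance thresholds above.
# stats::hatvalues() and stats::cooks.distance() stand in for boot::glm.diag().
high_influence <- function(fit) {
  h  <- hatvalues(fit)        # leverage of each observation
  cd <- cooks.distance(fit)   # Cook's distance of each observation
  p  <- length(coef(fit))     # number of model parameters
  n  <- nobs(fit)             # number of observations (assumed 1-m rings)
  high_leverage <- h / (1 - h) > 2 * p / (n - 2 * p)
  high_cook     <- cd > 8 / (n - 2 * p)
  high_leverage & high_cook   # TRUE where a point exceeds both thresholds
}

# Made-up carcass counts in 1-m rings out to 30 m, ending with a lone
# cluster of three carcasses at the edge of the search radius
rings <- data.frame(
  r = 1:30,
  count = c(5, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 1, 0,
            0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3)
)
# A plain Poisson glm on distance stands in for a ddFit model
fit <- glm(count ~ r, family = poisson, data = rings)

flags <- high_influence(fit)
which(flags)            # rings (if any) flagged as high-influence points
hin_fail <- any(flags)  # any flagged point would fail the hin test
```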
Several choices of pre-defined sieves are available (or, as described
above, users may define their own criteria):
sieve = "default"
The models are ordered by the following criteria:
extensibility
weight of right tail (discounting models that predict implausibly high proportions of carcasses beyond the search radius)
weight of the left tail (discounting models that predict implausibly high proportions of carcasses near the turbines)
AICc test (discounting models with delta AICc > 10)
high influence points (discounting models in which one or more of the data points exert a high influence on the fitted model, according to the GLM diagnostics (glm.diag) in Ripley's boot package)
ranking by AICc
Precise definitions of the default sieve parameters are given in sieve_default.
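One plausible reading of "ordered by the following criteria" is a lexicographic sort: a model that passes an earlier criterion always ranks ahead of one that fails it, and AICc breaks the remaining ties. The sketch below assumes that reading; the score table and its column layout are made up for illustration and are not modelFilter output.

```r
# Hypothetical score table (made-up values, not modelFilter output): one row
# per model, TRUE = passed the test, plus each model's AICc
scores <- data.frame(
  extensible = c(TRUE,  TRUE,  TRUE,  FALSE),
  rtail      = c(TRUE,  FALSE, TRUE,  TRUE),
  ltail      = c(TRUE,  TRUE,  TRUE,  TRUE),
  aic_test   = c(TRUE,  TRUE,  FALSE, TRUE),
  hin        = c(TRUE,  TRUE,  TRUE,  TRUE),
  aicc       = c(210.3, 205.1, 216.4, 208.0),
  row.names  = c("exponential", "lognormal", "tnormal", "gamma")
)

# Negating a logical key (TRUE -> -1, FALSE -> 0) puts passes first;
# keys are compared in order, with AICc (smallest first) as the last key.
ranked <- scores[order(-scores$extensible, -scores$rtail, -scores$ltail,
                       -scores$aic_test, -scores$hin, scores$aicc), ]
rownames(ranked)
# "exponential" "tnormal" "lognormal" "gamma"
```

Under this reading the lognormal model, despite the lowest AICc, ranks third because it fails the rtail test.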
sieve = NULL
Returns a list of the extensible models without scoring them by other model selection criteria.
sieve = "win"
Sorts models by high-influence points and AICc
sieve = list(<custom>)
User provides a custom sieve, which may be a modification of the default
sieve or de novo. To modify the default, use, for example,
sieve = list(hin = FALSE) to disable the hin test but keep the other default
tests, or sieve = list(aic = 7) to use 7 rather than 10 as the AIC cutoff, or
sieve = list(ltail = c(p20 = 0.3, p50 = 0.8)) to use a more stringent left
tail test that requires CDF graphs to pass below the points (20, 0.3) and
(50, 0.8). Custom ltail and rtail parameters must match the formats of the
default tests, but their probabilities may vary. To turn off the aic filter,
use sieve = list(aic = Inf). To turn off the ltail filter, use
sieve = list(ltail = c(p20 = 1, p50 = 1)). To turn off the rtail filter, use
sieve = list(rtail = c(p80 = 0, p120 = 0, p150 = 0, p200 = 0)). These custom
components may be mixed and matched as desired.
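Modifying the default amounts to overlaying the user's components on the default sieve. The sketch below assumes that merge behaves like utils::modifyList(); the list default_sieve is a hypothetical stand-in written from the defaults stated on this page, not the package's sieve_default object, and the sketch is not a description of how modelFilter does the merge internally.

```r
# Hypothetical default sieve mirroring the defaults stated above
# (the package's own definition is documented in sieve_default)
default_sieve <- list(
  rtail = c(p80 = 0.5, p120 = 0.90, p150 = 0.95, p200 = 0.99),
  ltail = c(p20 = 0.5, p50 = 0.90),
  aic   = 10,
  hin   = TRUE
)

# Overlay a partial custom sieve: named components replace the defaults,
# all other components are kept unchanged
custom <- list(hin = FALSE, ltail = c(p20 = 0.3, p50 = 0.8))
sieve  <- utils::modifyList(default_sieve, custom)

# Check that the tail vectors keep the required format (length and names)
stopifnot(identical(names(sieve$rtail), c("p80", "p120", "p150", "p200")),
          identical(names(sieve$ltail), c("p20", "p50")))

str(sieve)
```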
An fmod object, which is an unordered list of extensible models if
sieve = NULL; otherwise, a list of class fmod with the following components:
$filtered
the selected dd object or a ddArray list of models that passed the tests
$scores
a matrix with all models tested (rownames = model names) and the results of
each test (columns aic_test, rtail, ltail, hin, aic)
$sieve
the test criteria, stored in a list with
$aic_test = cutoff for AIC
$hin = boolean to indicate whether high-influence points were considered
$rtail = numeric vector giving the probabilities that the distribution's CDF
must reach at distances of 80, 120, 150, and 200 meters in order to pass the
right-tail test
$ltail = numeric vector giving the probabilities that the distribution's CDF
must NOT exceed at distances of 20 and 50 meters in order to pass the
left-tail test
$models
a list (ddArray object) of all models tested
$note
notes on the tests
When an fmod object is printed, only a small subset of the elements is shown.
To see a full list of the elements, use names(x), where x is the name of the
fmod return value. The elements can be extracted in the usual R way via, for
example, x$sieve or x[["sieve"]].
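library(dwp)  # assumed: the package providing these functions and data sets
data(layout_simple)
data(carcass_simple)
# build ring data from the site layout and attach the carcass observations
sitedata <- initLayout(layout_simple)
ringdata <- prepRing(sitedata)
ringsWithCarcasses <- addCarcass(carcass_simple, data_ring = ringdata)
# fit the candidate carcass distance models (a ddArray)
distanceModels <- ddFit(ringsWithCarcasses)
# summary statistics for all models and for two individual models
stats(distanceModels)
stats(distanceModels[["tnormal"]])
stats(distanceModels[["lognormal"]])
# filter the fitted models with the default sieve
filtered <- modelFilter(distanceModels)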