lmThresh: Finds and analyzes significance reversal regions for each...


View source: R/lmThresh.R

Description

This function finds, by iterating through a grid of values for each response, the approximate response value range(s) inside which the regression is significant and outside which it is not, as defined by alpha. Two scenarios can be tested: i) if newobs = FALSE (default), the model's significance is tested by shifting y_i along the search grid; ii) if newobs = TRUE, y_i is kept fixed and a new observation y_{2i} is added and shifted along the search grid. Hence, this function tests how sensitive the regression's significance is to reversal through minor shifts of the original or added response values, as opposed to the effect of point removal (lmInfl).

Usage

lmThresh(model, factor = 5, alpha = 0.05, 
         method = c("pearson", "spearman"),
         steps = 10000, newobs = FALSE, ...) 

Arguments

model

the linear model of class lm.

factor

a factor for the initial search grid. See 'Details'.

alpha

the α-level to use as the threshold border.

method

select either parametric ("pearson") or rank-based ("spearman") statistics.

steps

the number of steps within the search range. See 'Details'.

newobs

logical. Should the significance region for each y_i be calculated by shifting y_i (FALSE), or by keeping y_i fixed and adding a new observation y_{2i} (TRUE)?

...

other arguments to future methods.

Details

In a first step, a grid is created with a range of y_i \pm \mathrm{factor} \cdot \mathrm{range}(y_{1...n}) and steps cuts. For each cut, the p-value is calculated for the model when y_i is shifted to that value (newobs = FALSE) or when a second observation y_{2i} is added to the fixed y_i (newobs = TRUE). When the original model y = β_0 + β_1x + \varepsilon is significant (p < alpha), there are two boundaries that result in insignificance: one decreases the slope β_1 and the other inflates the standard error \mathrm{s.e.}(β_1) in a way that P_t(\frac{β_1}{\mathrm{s.e.}(β_1)}, n-2) > α. If the original model is insignificant, two boundaries also exist, which either increase β_1 or reduce \mathrm{s.e.}(β_1). Often, no boundaries are found; increasing factor to widen the grid range may alleviate this problem.
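The grid search described above can be sketched in base R for a single response and newobs = FALSE. This is only an illustration, not the code used inside lmThresh: the helper slope_p, the variable names, the reading of range(y) as diff(range(y)), and the reduced number of steps are all assumptions made for the example.

```r
## Illustrative sketch only; not the reverseR implementation.
set.seed(125)
x <- 1:20
y <- 5 + 0.08 * x + rnorm(length(x), 0, 1)
alpha <- 0.05
fac   <- 5      # plays the role of the 'factor' argument
steps <- 1000   # fewer than the default, to keep the loop quick
i     <- 1      # index of the response that is shifted

## hypothetical helper: p-value of the slope in a simple linear regression
slope_p <- function(x, y) summary(lm(y ~ x))$coefficients[2, 4]

## search grid: y_i +/- factor * range(y), assuming range = max - min
width <- fac * diff(range(y))
grid  <- seq(y[i] - width, y[i] + width, length.out = steps)

## p-value of the model for each grid value that y_i is moved to
p <- vapply(grid, function(g) {
  y_shift    <- y
  y_shift[i] <- g
  slope_p(x, y_shift)
}, numeric(1))

## approximate borders: midpoints of grid cells where significance flips
sig     <- p < alpha
flips   <- which(diff(sig) != 0)
borders <- (grid[flips] + grid[flips + 1]) / 2
borders
```

If borders comes back empty, no reversal was found inside the grid, which corresponds to the case where a larger factor may be needed.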

This function is quite fast (~300 ms per 10 response values), as the slope's p-value is calculated by the corr.test function of the 'psych' package, which utilizes matrix multiplication and vectorized pt calculation. The vector of correlation coefficients r_i from the cor function is transformed to t-values by

t_i = \frac{r_i \sqrt{n-2}}{\sqrt{1-r_i^2}}

which is equivalent to that employed in the linear regression's slope test.
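As a rough check of this equivalence, the following base R sketch computes all correlations for a grid of shifted y_i values in one cor call, converts them to t- and p-values, and compares one grid position with the slope test from lm. It uses cor and pt directly rather than psych::corr.test, and the grid width, the matrix Y and the index k are arbitrary choices for illustration.

```r
## Illustrative sketch only; uses base R instead of psych::corr.test.
set.seed(125)
x <- 1:20
y <- 5 + 0.08 * x + rnorm(length(x), 0, 1)
n <- length(x)
i <- 1
grid <- seq(y[i] - 5, y[i] + 5, length.out = 1000)

## one column per grid value: a copy of y with y_i replaced
Y <- matrix(y, nrow = n, ncol = length(grid))
Y[i, ] <- grid

## all correlation coefficients in a single call
r <- as.vector(cor(x, Y))

## transform r to t-values and two-sided p-values, fully vectorized
tval <- r * sqrt(n - 2) / sqrt(1 - r^2)
p    <- 2 * pt(-abs(tval), df = n - 2)

## cross-check one grid position against the slope test of lm
k   <- 500
fit <- summary(lm(Y[, k] ~ x))$coefficients
all.equal(tval[k], fit[2, 3])   # t-value of the slope
all.equal(p[k], fit[2, 4])      # p-value of the slope
```

Avoiding one lm fit per grid value is what makes a search over many steps per response affordable.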

Value

A list with the following items:

x

the predictor values.

y

the response values.

pmat

the p-value matrix, with length(x) columns and steps rows.

alpha

the selected α-level.

ySeq

the grid sequence along which each y_i is shifted when the p-values are calculated.

model

the original lm model.

data

the original model.frame.

eosr

the y-values of the ends of the significance region.

diff

the Δ value between y_i and the nearest border of significance reversal.

closest

the (approx.) value of the nearest border of significance reversal.

newobs

logical; whether a new observation was added, as set by the newobs argument.

Author(s)

Andrej-Nikolai Spiess

Examples

## Significant model, no new observation.
set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(length(a), 0, 1)
LM1 <- lm(b ~ a)
res1 <- lmThresh(LM1)
threshPlot(res1)
stability(res1)

## Insignificant model, no new observation.
set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(length(a), 0, 2)
LM2 <- lm(b ~ a)
res2 <- lmThresh(LM2)
threshPlot(res2)
stability(res2)

## Significant model, new observation.
## Some significance reversal regions
## are within the prediction interval,
## e.g. 1 to 6 and 14 to 20.
set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(length(a), 0, 1)
LM3 <- lm(b ~ a)
res3 <- lmThresh(LM3, newobs = TRUE)
threshPlot(res3)
stability(res3)

## More detailed example to the above:
## a (putative) new observation within the
## prediction interval may reverse significance.
set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(length(a), 0, 1)
LM1 <- lm(b ~ a)
summary(LM1) # => p-value = 0.02688
res1 <- lmThresh(LM1, newobs = TRUE)
threshPlot(res1)
st <- stability(res1, pval = TRUE)
st$stats # => upper prediction boundary = 7.48
         # and eosr = 6.49
stabPlot(st, 1)
## reverse significance if we add a new response y_1 = 7
a <- c(1, a)
b <- c(7, b)
LM2 <- lm(b ~ a)
summary(LM2) # => p-value = 0.0767

reverseR documentation built on May 2, 2019, 10:59 a.m.