**Description**

This function calculates leave-one-out (LOO) *p*-values for all data points and identifies those resulting in "significance reversal", i.e. those whose removal makes the *p*-value of the model's slope cross the user-defined *α*-level.

**Usage**

```
lmInfl(model, alpha = 0.05, method = c("pearson", "spearman"),
       verbose = TRUE, ...)
```

**Arguments**

`model`: the linear model of class `lm`.

`alpha`: the *α*-level to test against for significance reversal (default 0.05).

`method`: select either parametric (`"pearson"`) or rank-based (`"spearman"`) *p*-values.

`verbose`: logical. If `TRUE`, results are printed to the console.

`...`: other arguments to `lm`.

**Details**

The algorithm

1) calculates the *p*-value of the full model (all points),

2) calculates a LOO *p*-value with each point removed in turn,

3) checks for significance reversal in all data points, and

4) returns all models as well as the classical `influence.measures`, with LOO *p*-values, Δ*p*-values, slopes and standard errors attached.
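For illustration, steps 1)-3) can be reproduced with a few lines of base R. The following is a minimal sketch on a toy dataset; the object names and the simple `y ~ x` setup are chosen for illustration and are not the package internals:

```
## step 1: p-value of the full model (slope row of the coefficient table)
set.seed(123)
x <- 1:20
y <- 5 + 0.08 * x + rnorm(20)
fullFit <- lm(y ~ x)
pFull <- summary(fullFit)$coefficients[2, 4]

## step 2: LOO p-value for each point removed in turn
pLOO <- sapply(seq_along(x), function(i) {
  fitLOO <- lm(y[-i] ~ x[-i])
  summary(fitLOO)$coefficients[2, 4]
})

## step 3: significance reversal, i.e. alpha lies between pFull and pLOO
alpha <- 0.05
reversers <- which((pFull < alpha) != (pLOO < alpha))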

If `method = "spearman"`, *p*-values are based on Spearman rank correlation, and the values given in the last column of the result matrix are Spearman's *ρ*.
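A rank-based *p*-value and *ρ* for a single fit can be obtained in base R along these lines (a sketch reusing `x` and `y` from above, not the package code):

```
## Spearman rank correlation: p-value and rho for the full data
st <- cor.test(x, y, method = "spearman")
st$p.value   # rank-based p-value
st$estimate  # Spearman's rho
```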

The idea of *p*-value influencers was first introduced by Belsley, Kuh & Welsch, described as an influence measure pertaining directly to the change in *t*-statistics that will "show whether the conclusions of hypothesis testing would be affected"; it is termed **dfstat** in [1, 2, 3] or **dfstud** in [4]:

$$\mathrm{dfstat}_{ij} \equiv \frac{\hat{\beta}_j}{s\sqrt{(X'X)^{-1}_{jj}}} - \frac{\hat{\beta}_{j(i)}}{s_{(i)}\sqrt{(X'_{(i)}X_{(i)})^{-1}_{jj}}}$$

where $\hat{\beta}_j$ is the *j*-th estimate, *s* is the residual standard error, *X* is the design matrix and $(i)$ denotes the *i*-th observation deleted.

**dfstat**, which for the regression's slope $\beta_1$ is the difference of *t*-statistics

$$\Delta t = t_{\beta_1} - t_{\beta_1(i)} = \frac{\beta_1}{\mathrm{s.e.}(\beta_1)} - \frac{\beta_{1(i)}}{\mathrm{s.e.}(\beta_{1(i)})}$$

is inextricably linked to the change in *p*-value, $\Delta p$, calculated from

$$\Delta p = p_{\beta_1} - p_{\beta_1(i)} = 2\left(1 - P_t(t_{\beta_1}, \nu)\right) - 2\left(1 - P_t(t_{\beta_1(i)}, \nu - 1)\right)$$

where $P_t$ is Student's *t* cumulative distribution function with $\nu$ degrees of freedom, and where significance reversal is attained when $\alpha \in [p_{\beta_1}, p_{\beta_1(i)}]$.
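The $\Delta p$ calculation can be verified directly with `pt()`. A sketch, reusing `fullFit`, `x` and `y` from the LOO sketch above and deleting, for example, point *i* = 18:

```
## Delta-p for deletion of point i, following the formula above
i  <- 18
n  <- length(x)
t1 <- summary(fullFit)$coefficients[2, 3]   # t-statistic, full model
fitLOO <- lm(y[-i] ~ x[-i])
t2 <- summary(fitLOO)$coefficients[2, 3]    # t-statistic, point i removed
p1 <- 2 * (1 - pt(abs(t1), df = n - 2))     # nu = n - 2 in simple regression
p2 <- 2 * (1 - pt(abs(t2), df = n - 3))     # nu - 1
deltaP <- p1 - p2
```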
Interestingly, this seemingly mandatory check of the influence of single data points on statistical inference in linear regression has largely fallen into oblivion: apart from [1-4], there is, to the best of our knowledge, no reference to **dfstat** or $\Delta p$ in the current literature on influence measures.

The influence output also includes the more recent Hadi's measure (column "hadi"):

$$H_i = \frac{p_{ii}}{1 - p_{ii}} + \frac{k}{1 - p_{ii}}\frac{d_i^2}{1 - d_i^2}$$

where $p_{ii}$ are the diagonals of the hat matrix (leverages), $k = 2$ in univariate linear regression and $d_i = e_i/\sqrt{\mathrm{SSE}}$.
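Hadi's measure can likewise be computed from standard `lm` quantities. A minimal sketch for the univariate case (*k* = 2), again using `fullFit` from the sketch above:

```
## Hadi's influence measure
pii <- hatvalues(fullFit)      # leverages, diagonal of the hat matrix
e   <- residuals(fullFit)
di  <- e / sqrt(sum(e^2))      # d_i = e_i / sqrt(SSE)
Hi  <- pii / (1 - pii) + 2 / (1 - pii) * di^2 / (1 - di^2)
```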

**Value**

A list with the following items:

`origModel`: the original model with all data points.

`finalModels`: a list of final models with the influencer(s) removed.

`infl`: a matrix with the original data, the classical `influence.measures`, and the LOO *p*-values, Δ*p*-values, slopes and standard errors.

`sel`: a vector with the influencers' indices.

`alpha`: the selected *α*-level.

`origP`: the original model's *p*-value.

`stab`: the stability measure; see `stability`.
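The items are accessed as usual for a list; for example (assuming the package providing `lmInfl` is loaded and `LM1` is the model from Example #1 below):

```
res <- lmInfl(LM1)
res$origP      # p-value of the full model
res$sel        # indices of the significance reversers
head(res$infl) # data, influence measures, LOO p-values, delta-p
```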

**Author(s)**

Andrej-Nikolai Spiess

**References**

**For dfstat/dfstud:**

1. Regression diagnostics: Identifying influential data and sources of collinearity.

Belsley DA, Kuh E, Welsch RE.

John Wiley & Sons, New York, NY (2004).

2. Econometrics, 5th ed.

Baltagi B.

Springer-Verlag Berlin, Germany (2011).

3. Growth regressions and what the textbooks don't tell you.

Temple J.

*Bull Econom Res*, **52**, 2000, 181-205.

4. Robust Regression and Outlier Detection.

Rousseeuw PJ & Leroy AM.

John Wiley & Sons, New York, NY (1987).

**Hadi's measure:**

A new measure of overall potential influence in linear regression.

Hadi AS.

*Comp Stat & Data Anal*, **14**, 1992, 1-27.

**Examples**

```
## Example #1 with single influencers and insignificant model (p = 0.115).
## Removal of #18 results in p = 0.0227!
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM1 <- lm(b ~ a)
res1 <- lmInfl(LM1)
lmPlot(res1)
pvalPlot(res1)
inflPlot(res1)
slsePlot(res1)
stability(res1)
## Example #2 with multiple influencers and significant model (p = 0.0269).
## Removal of #2, #17, #18 or #20 results in crossing p = 0.05!
set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM2 <- lm(b ~ a)
res2 <- lmInfl(LM2)
lmPlot(res2)
pvalPlot(res2)
inflPlot(res2)
slsePlot(res2)
stability(res2)
## Large Example #3 with top 10 influencers and significant model (p = 6.72E-8).
## Not possible to achieve a crossing of alpha with any point despite strong noise.
set.seed(123)
a <- 1:100
b <- 5 + 0.08 * a + rnorm(100, 0, 5)
LM3 <- lm(b ~ a)
res3 <- lmInfl(LM3)
lmPlot(res3)
stability(res3)
## Example #4 with replicates and single influencer (p = 0.114).
## Removal of #58 results in p = 0.039.
set.seed(123)
a <- rep(1:20, each = 3)
b <- 5 + 0.08 * a + rnorm(20, 0, 2)
LM4 <- lm(b ~ a)
res4 <- lmInfl(LM4)
lmPlot(res4)
pvalPlot(res4)
inflPlot(res4)
slsePlot(res4)
stability(res4)
## As Example #1, but with weights.
## Removal of #18 results in p = 0.04747.
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM5 <- lm(b ~ a, weights = 1:20)
res5 <- lmInfl(LM5)
lmPlot(res5)
stability(res5)
```
