Influence plots | R Documentation |
Six different plot types that visualize p-value influencers.
1. lmPlot: plots the linear regression, marks the influencer(s) in red, and displays trend lines for the full and leave-one-out (LOO) data sets (black and red, respectively).
2. pvalPlot: plots the p-values for each LOO data point and displays the values as a full model/LOO model plot, together with the alpha border as defined in lmInfl.
3. inflPlot: plots dfbeta for slope, dffits, covratio, cooks.distance, leverage (hatvalues), studentized residuals (rstudent) and Hadi's measure against the Δp-value. Changes in these seven measures can thus be compared with the corresponding drop or rise in p-value. The plots include vertical boundaries for threshold values as defined in the literature under 'References'.
4. slsePlot: plots all LOO slopes and their standard errors together with the corresponding original model values and a t-value border calculated as $Q_t(1 - \frac{\alpha}{2}, n-2)$. Leaving out a point that lies to the right of this border results in a significant model, and vice versa.
5. threshPlot: plots the output of lmThresh, i.e. the regression plot including confidence/prediction intervals and, for each response value y_i, the region in which the model is significant (green). This is tested either i) for y_i shifted into this region (newobs = FALSE in lmThresh) or ii) for a new observation y2_i that is added (newobs = TRUE in lmThresh). In the latter case, it is informative if this region lies within the prediction interval (dashed line), indicating that a future additional measurement at x_i might reverse the significance statement.
6. stabPlot: for single (to be selected) response values from the output of lmThresh, this function displays the region of significance reversal within the surrounding prediction interval. The probability of reversing the significance, either by shifting the response value (if lmThresh(..., newobs = FALSE)) or by including a future (measurement) point (if lmThresh(..., newobs = TRUE)), is shown as the integral between the "end of significance region" (eosr) and the nearest prediction interval boundary.
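The leave-one-out logic behind lmPlot, pvalPlot and slsePlot can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the data, the names dat, loo and t_border, and alpha = 0.05 are assumptions. In the border formula, n is taken to be the number of points in each LOO fit, so that the border agrees with the LOO p-values.

```r
# Simulated example data (hypothetical; not taken from the package)
set.seed(123)
dat <- data.frame(x = 1:20)
dat$y <- 0.15 * dat$x + rnorm(20, sd = 2)
alpha <- 0.05                       # assumed significance level

# Full model and the p-value of its slope
full <- lm(y ~ x, data = dat)
p_full <- summary(full)$coefficients["x", "Pr(>|t|)"]

# Refit once per observation, each time leaving that observation out
loo <- t(sapply(seq_len(nrow(dat)), function(i) {
  cf <- summary(lm(y ~ x, data = dat[-i, ]))$coefficients
  c(slope = cf["x", "Estimate"],
    se    = cf["x", "Std. Error"],
    pval  = cf["x", "Pr(>|t|)"])
}))
loo <- as.data.frame(loo)

# Change in p-value and significance reversal relative to the full model
loo$delta_p  <- loo$pval - p_full
loo$reverser <- (loo$pval <= alpha) != (p_full <= alpha)

# t-value border: each LOO fit has nrow(dat) - 1 points (assumption, see above)
n_loo    <- nrow(dat) - 1
t_border <- qt(1 - alpha / 2, df = n_loo - 2)
loo$t    <- loo$slope / loo$se

# LOO fits whose |t| exceeds the border are the significant ones
loo$significant <- abs(loo$t) > t_border
```

Rows of loo with reverser equal to TRUE correspond to points whose removal changes the significance statement of the full model.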
NOTE: The visual display should always be supplemented with the corresponding stability analysis.
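The integral that stabPlot displays can be illustrated with a short base R sketch. It assumes that the response at a given x follows the shifted and scaled t-distribution underlying the prediction interval of lm, which may differ from what the package does internally. The data, the chosen x value and in particular the eosr value are hypothetical; in practice eosr would come from the lmThresh output.

```r
# Simulated example data (hypothetical)
set.seed(42)
x <- 1:20
y <- 0.5 * x + rnorm(20, sd = 2)
fit <- lm(y ~ x)
alpha <- 0.05
df <- df.residual(fit)               # n - 2 for simple linear regression

# Prediction at a selected x value (hypothetical choice)
x0 <- data.frame(x = 10)
pr <- predict(fit, newdata = x0, se.fit = TRUE)
y_hat <- unname(pr$fit)

# Standard error of a new observation and the prediction interval boundaries
se_pred  <- sqrt(pr$se.fit^2 + pr$residual.scale^2)
pi_lower <- y_hat - qt(1 - alpha / 2, df) * se_pred
pi_upper <- y_hat + qt(1 - alpha / 2, df) * se_pred

# Hypothetical "end of significance region", placed inside the interval
eosr <- y_hat - 1.2 * se_pred

# Nearest prediction interval boundary
nearest <- if (abs(eosr - pi_lower) < abs(eosr - pi_upper)) pi_lower else pi_upper

# Probability mass of the predictive t-distribution between eosr and that boundary
prob <- abs(pt((nearest - y_hat) / se_pred, df) - pt((eosr - y_hat) / se_pred, df))
```

Under these assumptions, prob is the area between eosr and the nearest boundary, i.e. the chance that a response at this x value falls in the part of the prediction interval beyond eosr.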
lmPlot(infl, ...)
pvalPlot(infl, ...)
inflPlot(infl, ...)
slsePlot(infl, ...)
threshPlot(thresh, bands = FALSE, ...)
stabPlot(stab, which = NULL, ...)
infl | an object obtained from |
thresh | an object obtained from |
stab | an object obtained from using |
bands | logical. If |
which | which response value should be shown in |
... | other plotting parameters. |
The corresponding plot.
Cut-off values for the different influence measures are those defined in Belsley, Kuh & Welsch (1980):
dfbeta slope: $|\Delta\beta1_i| > 2/\sqrt{n}$ (page 28)
dffits: $|\mathrm{dffits}_i| > 2\sqrt{2/n}$ (page 28)
covratio: $|\mathrm{covr}_i - 1| > 3k/n$ (page 23)
Cook's D: $D_i > Q_F(0.5, k, n - k)$ (Cook & Weisberg, 1982)
leverage: $h_{ii} > 2k/n$ (page 17)
studentized residual: $t_i > Q_t(0.975, n - k - 1)$ (page 20)
Hadi's measure: $H_i^2 > \mathrm{Med}(H_i^2) + 2 \cdot \mathrm{MAD}(H_i^2)$ (Hadi, 1992)
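Most of the cut-offs above can be reproduced with functions from base R's stats package. The sketch below is an illustration on simulated data and is not taken from the package source. Three assumptions are made: the scaled dfbetas() is used for the slope, because the 2/sqrt(n) cut-off in Belsley et al. refers to scaled values; the studentized residual is compared in absolute value; and Hadi's measure is left out because base R has no built-in function for it.

```r
# Simulated example data with one artificial outlier (hypothetical)
set.seed(1)
x <- 1:25
y <- 0.3 * x + rnorm(25)
y[25] <- y[25] + 8
fit <- lm(y ~ x)

n <- nobs(fit)            # number of observations
k <- length(coef(fit))    # number of parameters (2: intercept and slope)

# One logical column per influence measure: TRUE if the cut-off is exceeded
flags <- data.frame(
  dfbeta_slope = abs(dfbetas(fit)[, "x"]) > 2 / sqrt(n),      # scaled version (assumption)
  dffits       = abs(dffits(fit)) > 2 * sqrt(2 / n),
  covratio     = abs(covratio(fit) - 1) > 3 * k / n,
  cooksD       = cooks.distance(fit) > qf(0.5, k, n - k),
  leverage     = hatvalues(fit) > 2 * k / n,
  rstudent     = abs(rstudent(fit)) > qt(0.975, n - k - 1)    # two-sided (assumption)
)

# Observations flagged by at least one measure
flagged <- flags[rowSums(flags) > 0, ]
```

Comparing the rows of flagged with the Δp-value of each observation shows whether classical influence and a change in significance coincide.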
Andrej-Nikolai Spiess
Belsley DA, Kuh E, Welsch RE. Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley, New York (2004).
Rawlings JO, Pantula SG, Dickey DA. Applied Regression Analysis: A Research Tool. Springer; 2nd Corrected ed. 1998. Corr. 2nd printing 2001.
Fox J. Applied Regression Analysis and Generalized Linear Models. SAGE Publishing, 3rd ed, 2016.
Cook RD & Weisberg S. Residuals and Influence in Regression. Chapman & Hall, 1st ed, New York, USA (1982).
Hadi AS. A new measure of overall potential influence in linear regression. Comp Stat & Data Anal, 14, 1992, 1-27.
## See Examples in 'lmInfl' and 'lmThresh'.