logregr | R Documentation
The function makes it easy to perform binary logistic regression and to graphically
display the estimated coefficients and odds ratios. It also allows the user to visually check the
model's diagnostics, such as outliers, leverage, and Cook's distance.
logregr(data, oneplot = FALSE)
data: Dataframe containing the dataset (Dependent Variable listed in the first column to the left).
oneplot: Logical value (TRUE or FALSE; default FALSE) indicating whether the first set of 8 charts should be grouped in one panel.
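A minimal usage sketch follows. The dataset and the package name (GmAMisc, inferred from the cross-referenced modelvalid and aucadj helpers) are illustrative assumptions; any dataframe whose first column is a binary dependent variable will work:

# illustrative only: package name and example data are assumptions, not part of this page
library(GmAMisc)                        # package assumed from the cross-referenced helpers
df  <- mtcars[, c("am", "hp", "wt")]    # binary 'am' used as the dependent variable (first column)
res <- logregr(data = df, oneplot = TRUE)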
The function may take a short while (a matter of a few seconds) to complete all the
operations, and will eventually return the following charts:
(1) Estimated coefficients, along with each
coefficient's confidence interval; a reference line is set to 0. Each bar is given a color
according to the associated p-value, and the key to the color scale is reported in the chart's
legend.
(2) Odds ratios and their confidence intervals.
(3) A chart that is helpful in visually gauging the discriminatory power of the model: the
predicted probabilities (x axis) are plotted against the dependent variable (y axis). If the model
proves to have a high discriminatory power, the two stripes of points will tend to be well
separated, i.e. the positive outcomes of the dependent variable (points with color corresponding
to 1) will tend to cluster around high values of the predicted probability, while the opposite
will hold true for the negative outcomes of the dependent variable (points with color
corresponding to 0). The AUC, which is reported at the bottom of the chart, quantifies the
model's discriminatory power.
(4) Model's standardized (Pearson's) residuals against the predicted probability; the size of the
points is proportional to the Cook's distance, and problematic points are flagged by a label
reporting their observation number if the following two conditions both hold: absolute residual value
larger than 3 AND Cook's distance larger than 1 (a base-R sketch of how these diagnostic quantities
can be computed is given after this list). Recall that an observation is an outlier if it has a
response value that is very different from the value predicted by the model. However, being an
outlier does not automatically imply that the observation has a negative effect on the model; for
this reason, it is good to also check the Cook's distance, which quantifies how influential an
observation is on the model's estimates. Cook's distance should not be larger than 1.
(5) Predicted probability plotted against the leverage value; dots represent observations, their
size is proportional to their leverage value, and their color is coded according to whether the
leverage is above (lever. not ok) or below (lever. ok) the critical threshold. The latter is
represented by a grey reference line, and is also reported at the bottom of the chart itself. An
observation has high leverage if it has a particularly unusual combination of predictor values.
Observations with high leverage are flagged with their observation number, making it easy to spot
them within the dataset. Remember that observations with high leverage and/or high residuals may
be potential influential points and may negatively impact the regression. The leverage threshold
is set at 3*(k+1)/N (following Pituch-Stevens, Applied Multivariate Statistics for the Social
Sciences. Analyses with SAS and IBM's SPSS, Routledge: New York 2016), where k is the number of
predictors and N is the sample size.
(6) Predicted probability against the Cook's distance.
(7) Standardized (Pearson's) residuals against the leverage; points representing observations
with a positive or negative outcome of the dependent variable are given different colors. Further,
point size is proportional to the Cook's distance. The leverage threshold is indicated by a grey
reference line, and the threshold value is also reported at the bottom of the chart. Observations
are flagged with their observation number if their absolute residual is larger than 3 OR their
leverage is larger than the critical threshold OR their Cook's distance is larger than 1. This
makes it easy to check which observations turn out to be outliers, high-leverage data points,
influential points, or a combination of the three.
(8) A chart that is almost the same as (7) except for the way in which observations are flagged.
Here they are flagged if (the absolute residual is larger than 3 OR the leverage is higher than
the critical threshold) AND the Cook's distance is larger than 1, which plainly declares them as
having a high influence on the model's estimates. Since an observation may be an outlier or a
high-leverage data point, or both, and yet not be influential, this chart makes it possible to
spot the observations that have an undue influence on the model, regardless of whether they are
outliers, high-leverage data points, or both.
(9) Observation numbers are plotted against the standardized (Pearson's) residuals, the leverage,
and the Cook's distance. Points are labelled according to the rationales explained in the
preceding points; the rationale is also reported at the bottom of each plot.
The function also returns a list storing two components: one is named 'formula' and stores the formula used for the logistic regression; the other contains the model's results.
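For instance, assuming the call from the usage sketch above, the returned components can be inspected as follows (only the 'formula' name is documented here, so the second component is accessed by position in this sketch):

res$formula     # formula used for the logistic regression
res[[2]]        # the model's results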
See also: modelvalid, aucadj