inst/help/RegressionLogistic.md

Logistic Regression

Logistic regression allows the user to model a linear relationship between one or more explanatory variables (predictors) and the log-odds of a binary categorical dependent (response) variable.
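For a binary outcome coded 0/1, the model can be written as the equation below. The notation is chosen here for illustration: p is the probability that the outcome equals 1, x_1, …, x_k are the predictors, and β_0, …, β_k are the regression coefficients.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + \exp\!\left[-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)\right]}
```

Exponentiating a coefficient, exp(β_j), gives the odds ratio for a one-unit increase in x_j with the other predictors held constant.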

Assumptions

Input

Assignment box

Model

Statistics

Plots

Output

Logistic regression

Model summary:
- Model: The different hypotheses (models) that are compared.
- Deviance: −2 × log-likelihood.
- AIC (Akaike Information Criterion): Used to compare models; lower values indicate a better balance between fit and complexity.
- BIC (Bayesian Information Criterion): Used to compare models; lower values are preferred.
- df: Degrees of freedom.
- χ²: Chi-squared statistic of the likelihood-ratio test comparing the model with the null model.
- p: The p-value of the χ² test.
- Pseudo R²: Logistic regression has no exact counterpart of the proportion of total variance explained in linear regression, so JASP calculates four pseudo R² values:
  - McFadden: Calculated as one minus the ratio of the log-likelihood of the specified model to the log-likelihood of the null model, R² = 1 − ln(L_M) / ln(L_0). The better the specified model fits the data relative to the null model, the closer McFadden's R² is to 1; if the two models fit the data about equally well, it is close to 0.
  - Cox & Snell: Calculated as one minus the ratio of the likelihood of the null model to the likelihood of the specified model, raised to the power 2/n (n = sample size): R² = 1 − (L_0 / L_M)^(2/n). Higher values indicate that the specified model fits the data better relative to the null model. However, the index cannot reach 1: its maximum is 1 − L_0^(2/n), which for a binary outcome is at most 0.75 (reached when both outcomes are equally frequent).
  - Nagelkerke: A correction of the Cox & Snell R² that rescales it to a maximum of 1. Specifically, it is the Cox & Snell R² divided by 1 − L_0^(2/n). Values closer to 1 indicate that the specified model outperforms the null model.
  - Tjur: The absolute difference between the mean predicted probability for cases with outcome 1 and the mean predicted probability for cases with outcome 0. Values close to 1 indicate a clear separation between the predicted values of the two outcome groups. Unlike the other pseudo R² indices, Tjur's R² is not defined relative to the null model.
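The four pseudo R² definitions above can be checked with a short computation. This is a minimal Python sketch, not JASP's implementation: it assumes the outcome is coded 0/1 with both values present and that `p` holds the fitted model's predicted probabilities (strictly between 0 and 1). The function names `bernoulli_log_likelihood` and `pseudo_r2` are hypothetical. The null model is taken to be the intercept-only model, whose predicted probability for every case is the observed proportion of ones.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def pseudo_r2(y, p):
    """McFadden, Cox & Snell, Nagelkerke and Tjur R^2 for a fitted binary model.

    y: observed outcomes coded 0/1 (both values must occur).
    p: predicted probabilities P(y = 1) from the specified model.
    """
    n = len(y)
    n1 = sum(y)
    ll_model = bernoulli_log_likelihood(y, p)
    # Intercept-only (null) model: every case gets the observed proportion of ones.
    ll_null = bernoulli_log_likelihood(y, [n1 / n] * n)

    mcfadden = 1.0 - ll_model / ll_null
    # (L0 / LM)^(2/n), computed on the log scale to avoid underflow.
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    # Upper bound of Cox & Snell: 1 - L0^(2/n).
    cox_snell_max = 1.0 - math.exp((2.0 / n) * ll_null)
    nagelkerke = cox_snell / cox_snell_max

    mean_p_ones = sum(pi for yi, pi in zip(y, p) if yi == 1) / n1
    mean_p_zeros = sum(pi for yi, pi in zip(y, p) if yi == 0) / (n - n1)
    tjur = abs(mean_p_ones - mean_p_zeros)

    return {"McFadden": mcfadden, "Cox & Snell": cox_snell,
            "Nagelkerke": nagelkerke, "Tjur": tjur,
            "Cox & Snell max": cox_snell_max}


y = [0, 0, 1, 1, 0, 1]
p = [0.2, 0.3, 0.7, 0.9, 0.4, 0.6]
print(pseudo_r2(y, p))
```

Feeding in the null model's own predictions (the base rate for every case) returns 0 for McFadden, Cox & Snell and Tjur, which is a quick sanity check of the definitions.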

Coefficients:
- Estimate: Regression coefficients (on the log-odds scale).
- (Robust) Standard Error: Standard error of the regression coefficients.
- Standardized: Standardized regression coefficients.
- Odds Ratio: The exponentiated regression coefficient, exp(Estimate). The odds ratios are usually the most informative values in the coefficients table. For a continuous predictor, an odds ratio greater than 1 indicates a positive relationship (the odds of the outcome increase as the predictor increases), while an odds ratio less than 1 indicates a negative relationship.
- z: The z-value, i.e., the estimate divided by its standard error.
- Wald Test: Used to evaluate the statistical significance of each coefficient in the model.
  - Wald Statistic: z².
  - df: Degrees of freedom.
  - p: The p-value.
- VS-MPR: Vovk-Sellke maximum p-ratio.
- 95% Confidence Interval (odds ratio scale): The confidence level is user-defined.
  - Lower: Lower bound of the confidence interval for the regression coefficients.
  - Upper: Upper bound of the confidence interval for the regression coefficients.
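The quantities in the coefficients table follow from each estimate and its standard error. The sketch below (hypothetical helper `coefficient_row`, standard library only) assumes a two-sided test against the standard normal distribution and builds the confidence interval on the log-odds scale before exponentiating it to the odds ratio scale. The VS-MPR uses the bound 1 / (−e · p · ln p) for p < 1/e, and 1 otherwise. JASP's own computations may differ in details, for example when robust standard errors are selected.

```python
import math
from statistics import NormalDist


def coefficient_row(estimate, se, level=0.95):
    """Odds ratio, z, Wald statistic, p-value, VS-MPR and CI for one coefficient."""
    z = estimate / se
    wald = z ** 2                          # Wald statistic, 1 degree of freedom
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value, standard normal
    # Vovk-Sellke maximum p-ratio: 1 / (-e * p * ln p) for p < 1/e, otherwise 1.
    if p >= 1 / math.e:
        vs_mpr = 1.0
    elif p == 0.0:
        vs_mpr = math.inf                  # p-value underflowed to zero
    else:
        vs_mpr = 1.0 / (-math.e * p * math.log(p))
    # Interval on the log-odds scale, then exponentiated to the odds ratio scale.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = estimate - z_crit * se, estimate + z_crit * se
    return {
        "odds_ratio": math.exp(estimate),
        "z": z, "wald": wald, "df": 1, "p": p, "vs_mpr": vs_mpr,
        "ci": (lower, upper),
        "ci_odds_ratio": (math.exp(lower), math.exp(upper)),
    }


print(coefficient_row(0.8, 0.3))
```

Because the interval is exponentiated rather than computed directly on the odds ratio scale, it is asymmetric around the odds ratio.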

Bootstrap Coefficients:
- Estimate: Bootstrapped regression coefficients.
- Bias: Bootstrap estimate of the bias of each coefficient.
- Standard Error: Standard error of the bootstrapped regression coefficients.
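A nonparametric bootstrap resamples cases with replacement, refits the model to each resample, and summarizes the spread of the refitted coefficients. The sketch below is a generic version under stated assumptions: `estimator` is a hypothetical callable that, in a logistic regression setting, would refit the model and return one coefficient. The bias is the mean of the bootstrap estimates minus the original estimate, and the standard error is their standard deviation. The number of resamples and the seed are arbitrary choices, and `bootstrap_summary` is a hypothetical name.

```python
import random
import statistics


def bootstrap_summary(data, estimator, n_boot=1000, seed=1):
    """Nonparametric bootstrap: resample cases with replacement and re-estimate.

    data: list of cases (e.g. (x, y) tuples).
    estimator: function mapping a list of cases to one coefficient estimate.
    """
    rng = random.Random(seed)
    original = estimator(data)
    n = len(data)
    boot = [estimator([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]
    bias = statistics.fmean(boot) - original   # mean bootstrap estimate minus original
    se = statistics.stdev(boot)                # spread of the bootstrap estimates
    return {"estimate": original, "bias": bias, "se": se}


# Stand-in estimator (the sample mean); a logistic fit would take its place.
print(bootstrap_summary([1.0, 2.0, 3.0, 4.0, 5.0], statistics.fmean, n_boot=500))
```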

Multicollinearity diagnostics:
- Tolerance: The reciprocal of the Variance Inflation Factor, 1 / VIF.
- VIF: Variance Inflation Factor; large values indicate multicollinearity. Calculated as VIF = det(R11) × det(R22) / det(R), where R is the covariance matrix of the regression coefficients (excluding the intercept), R11 is the submatrix of R for the coefficient(s) of the predictor whose VIF is calculated, and R22 is the submatrix of R for all other predictors (Fox & Monette, 1992; Fox, 2016).
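The determinant ratio can be computed directly from the covariance matrix of the coefficient estimates. In this minimal Python sketch (hypothetical helpers `det`, `submatrix` and `gvif`; not JASP's code), `term` lists the indices of the coefficient(s) belonging to one predictor, so a factor with several dummy-coded levels contributes several indices. Rescaling the covariance matrix to a correlation matrix gives the same ratio, because the diagonal scaling factors cancel between numerator and denominator.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]   # work on a copy
    n = len(a)
    result = 1.0                # determinant of an empty matrix is 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result    # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(m, idx):
    """Rows and columns of m selected by the indices in idx."""
    return [[m[i][j] for j in idx] for i in idx]


def gvif(cov, term):
    """det(R11) * det(R22) / det(R) for the coefficients listed in `term`.

    cov: covariance matrix of the coefficient estimates, intercept excluded.
    """
    others = [i for i in range(len(cov)) if i not in term]
    return det(submatrix(cov, term)) * det(submatrix(cov, others)) / det(cov)


vif = gvif([[1.0, 0.5], [0.5, 1.0]], [0])
print(vif, 1 / vif)   # VIF and tolerance
```

For two coefficients whose estimates are correlated at r = 0.5, the ratio reduces to 1 / (1 − r²), about 1.33.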

Factor Descriptives:
- The first column displays all levels of the factor.
- N: The number of observations per level of the factor.

Performance Diagnostics

Confusion Matrix:
- The confusion matrix indicates how well the model predicts the outcomes. The diagonal shows the cases that the model classified correctly; the off-diagonal cells show the cases for which the model predicted the wrong outcome.
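Building a confusion matrix requires turning predicted probabilities into predicted classes. The sketch below assumes a cutoff of 0.5 (cases with a predicted probability of at least 0.5 are classified as 1); the cutoff JASP uses is not specified here, and `confusion_matrix` is a hypothetical helper.

```python
def confusion_matrix(observed, predicted_prob, cutoff=0.5):
    """2x2 confusion matrix for a binary model.

    Returns counts keyed by (observed class, predicted class), each 0 or 1.
    """
    counts = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for obs, prob in zip(observed, predicted_prob):
        pred = 1 if prob >= cutoff else 0   # assumed classification rule
        counts[(obs, pred)] += 1
    return counts


cm = confusion_matrix([0, 0, 1, 1, 1], [0.2, 0.6, 0.7, 0.4, 0.9])
correct = cm[(0, 0)] + cm[(1, 1)]   # diagonal: correctly classified cases
print(cm, correct)
```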

Performance metrics:
- All selected performance metrics and their values are displayed in this table.
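As an illustration, the sketch below computes several commonly reported performance metrics from the four cells of a 2 × 2 confusion matrix, plus the Brier score from predicted probabilities. Which metrics JASP offers is not covered by this sketch, and the function names `classification_metrics` and `brier_score` are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from the cells of a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # F-measure: harmonic mean of precision and sensitivity.
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


def brier_score(observed, predicted_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(observed, predicted_prob)) / len(observed)


print(classification_metrics(tp=2, fp=1, fn=1, tn=1))
print(brier_score([0, 1], [0.5, 0.5]))
```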

Estimates Plots

The conditional estimates plots display the predicted probability of the dependent variable across all levels (or values) of a covariate, with all other factors held at their reference level. If a continuous covariate is added, the grey band around the line represents the 95% confidence interval.
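A curve of this kind can be sketched by evaluating the linear predictor over a grid of covariate values and transforming it to probabilities. The sketch assumes a single covariate with slope `b1`, with the contribution of the other predictors (held at their reference values) absorbed into `b0`. It builds a pointwise confidence band on the logit scale from the coefficient covariance matrix and then back-transforms it; whether JASP computes its band exactly this way is an assumption, and `conditional_curve` is a hypothetical name.

```python
import math
from statistics import NormalDist


def logistic(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def conditional_curve(b0, b1, cov, xs, level=0.95):
    """(x, probability, lower, upper) along a grid of covariate values.

    b0, b1: intercept and slope on the logit scale.
    cov: 2x2 covariance matrix of (b0, b1).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rows = []
    for x in xs:
        eta = b0 + b1 * x
        # Var(b0 + b1 x) = Var(b0) + x^2 Var(b1) + 2 x Cov(b0, b1)
        se = math.sqrt(cov[0][0] + x * x * cov[1][1] + 2 * x * cov[0][1])
        rows.append((x, logistic(eta), logistic(eta - z * se), logistic(eta + z * se)))
    return rows


for row in conditional_curve(-1.0, 0.5, [[0.04, -0.01], [-0.01, 0.01]], [0.0, 2.0, 4.0]):
    print(row)
```

Building the interval on the logit scale keeps the back-transformed limits inside (0, 1).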

Diagnostic Plots

Predicted - residuals plot: residuals plotted against the predicted values.

Predictor - residuals plot: residuals plotted against each predictor.

Squared Pearson residuals plot:
- The expected value of the squared residuals is 1, displayed by the dotted gray line. The red line displays a smoother through the residuals (a moving average). If the red line lies mostly near 1, it can be concluded that the model does not suffer much from overdispersion. Some deviation at the tails is to be expected.
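The reference value of 1 follows from the Bernoulli variance: E[(y − p)²] = p(1 − p), so dividing by p(1 − p) gives an expected squared residual of 1 when the model is correct. The sketch below computes squared Pearson residuals and a simple moving average; ordering the residuals by predicted probability before smoothing and the window size are assumptions, and the function names are hypothetical.

```python
def squared_pearson_residuals(observed, predicted_prob):
    """(y - p)^2 / (p * (1 - p)) for each case; expected value 1 under the model."""
    return [(y - p) ** 2 / (p * (1.0 - p)) for y, p in zip(observed, predicted_prob)]


def moving_average(values, window=5):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def smooth_by_prediction(observed, predicted_prob, window=5):
    """Sort squared residuals by predicted probability, then smooth them."""
    residuals = squared_pearson_residuals(observed, predicted_prob)
    ordered = [r for _, r in sorted(zip(predicted_prob, residuals))]
    return moving_average(ordered, window)


print(smooth_by_prediction([0, 1, 0, 1, 1, 0, 1], [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], window=3))
```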

Independent - predicted plot:
- Plots the model predictions against each independent variable.
- Include interactions: Also adds every two-way interaction (binning continuous variables as necessary).
- Use logit scale: Plots the predicted probabilities on the logit (log-odds) scale, on which the model is linear in the predictors.

References

R Packages



jasp-stats/Regression documentation built on July 15, 2024, 7:04 a.m.