stepFWDr: Customized stepwise regression with p-value and trend check...

View source: R/35_STEP_FWDr.R

stepFWDr R Documentation

Customized stepwise regression with p-value and trend check on raw risk factors

Description

stepFWDr performs customized stepwise regression with p-value and trend checks on raw risk factors. The trend check compares the observed trend between the target and the analyzed risk factor with the trend of the estimated coefficients of the binomial logistic regression. The difference between stepFWDr and stepFWD is that stepFWDr runs stepwise regression on mixed risk factor types (numerical and/or categorical), while stepFWD accepts only categorical risk factors. Note that the procedure checks the column names of the supplied db data frame, so some columns may be renamed (special characters replaced). For details, see the Examples section.
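The p-value part of a forward step can be pictured with the base-R sketch below. It is one plausible reading of the Description, not PDtoolkit's code: the helper forward_step() is hypothetical, it assumes numerical candidate risk factors with syntactic column names and a 0/1 target, it uses the Wald z-test p-value reported by summary() for each candidate, and it omits the trend check entirely.

```r
# Hypothetical sketch of ONE forward-selection step with a p-value check.
# Not PDtoolkit's implementation: assumes numerical candidates with syntactic
# column names and a binary (0/1) target; no trend check is performed here.
forward_step <- function(start.model, candidates, db, p.value = 0.05) {
  # p-value of each candidate when it is added to the current model
  pv <- vapply(candidates, function(rf) {
    f <- update(start.model, as.formula(paste(". ~ . +", rf)))
    fit <- glm(f, family = binomial, data = db)
    summary(fit)$coefficients[rf, "Pr(>|z|)"]
  }, numeric(1))
  best <- names(which.min(pv))
  # stop when even the best candidate is not significant
  if (pv[[best]] > p.value) return(start.model)
  update(start.model, as.formula(paste(". ~ . +", best)))
}

# Toy data: the share of y = 1 rises from 10% (x1 = 1) to 100% (x1 = 10);
# x2 only flags odd x1 (y = 1 in 50% of odd-x1 rows vs 60% of even-x1 rows).
x1 <- rep(1:10, times = 20)
j  <- rep(1:20, each = 10)
db <- data.frame(y = as.integer(j <= 2 * x1), x1 = x1, x2 = x1 %% 2)

forward_step(y ~ 1, c("x1", "x2"), db)  # x1 is added
forward_step(y ~ 1, "x2", db)           # nothing significant: y ~ 1
```

Repeating the step until it returns its input unchanged gives a plain forward selection; the starting formula can be intercept-only (y ~ 1), as for start.model.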
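The trend check described above can be sketched in base R as follows. The helper trend_check() is hypothetical and the exact rule PDtoolkit applies may differ; this sketch assumes a 0/1 target and syntactic column names. For a numerical risk factor it compares the sign of the observed correlation with the sign of the slope; for a categorical one it compares the direction of change of the observed target rate across levels with the direction of change of the level coefficients (the reference level fixed at 0). In a model with a single categorical risk factor the two trends always agree, so the check only matters once several risk factors share the model.

```r
# Hypothetical helper, not part of PDtoolkit: does the trend of the estimated
# coefficients of risk factor `rf` agree with the observed trend of the
# binary target `target` in `db`?
trend_check <- function(model, rf, db, target) {
  x <- db[[rf]]
  y <- db[[target]]
  cf <- coef(model)
  if (is.numeric(x)) {
    # numerical risk factor: sign of observed association vs sign of slope
    return(unname(sign(cor(x, y)) == sign(cf[rf])))
  }
  # categorical risk factor: observed target rate per level, in level order
  x <- factor(x)
  observed <- tapply(y, x, mean)
  # glm names dummy coefficients <rf><level>; the reference level is fixed at 0
  est <- c(0, cf[paste0(rf, levels(x)[-1])])
  all(sign(diff(observed)) == sign(diff(est)))
}

db <- data.frame(y = c(0, 0, 0, 1,  0, 0, 1, 1,  0, 1, 1, 1),
                 g = rep(c("A", "B", "C"), each = 4),
                 x = 1:12)
fit.g <- glm(y ~ g, family = binomial, data = db)
fit.x <- glm(y ~ x, family = binomial, data = db)
trend_check(fit.g, "g", db, "y")
trend_check(fit.x, "x", db, "y")
```

Looking coefficients up by column name is simpler when names are syntactic, which is one practical reason for renaming columns that contain special characters.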

Usage

stepFWDr(
  start.model,
  p.value = 0.05,
  db,
  check.start.model = TRUE,
  offset.vals = NULL
)

Arguments

start.model

A formula representing the starting model. It can include some risk factors, or it can contain only the intercept (y ~ 1, where y is the target variable).

p.value

Significance level for the p-values of the estimated coefficients. For numerical risk factors, this value is compared directly to the p-value of the estimated coefficient, while for categorical risk factors a multiple Wald test is employed and its p-value is compared with the selected threshold (p.value).

db

Modeling data with risk factors and target variable. Risk factors can be categorized or continuous.

check.start.model

Logical (TRUE or FALSE) indicating whether risk factors from the starting model should be checked for p-value and trend during the stepwise process. Default is TRUE.

offset.vals

This can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. Default is NULL.
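For categorical risk factors, the p.value argument refers to a multiple Wald test. A standard form of that test, shown here as a base-R sketch rather than PDtoolkit's own code, checks that all dummy coefficients b of one risk factor are jointly zero with W = b' V^(-1) b, where V is their estimated covariance matrix, compared with a chi-square distribution with as many degrees of freedom as there are coefficients. The helper name wald_test() is hypothetical.

```r
# Hypothetical helper, not PDtoolkit's code: joint (multiple) Wald test that
# all coefficients generated by term `rf` of a fitted glm are zero.
wald_test <- function(model, rf) {
  term.id <- match(rf, attr(terms(model), "term.labels"))
  # "assign" maps each model-matrix column to the term that produced it
  idx <- which(attr(model.matrix(model), "assign") == term.id)
  b <- coef(model)[idx]
  V <- vcov(model)[idx, idx, drop = FALSE]
  W <- as.numeric(t(b) %*% solve(V, b))  # W = b' V^{-1} b
  df <- length(b)
  c(statistic = W, df = df, p.value = pchisq(W, df, lower.tail = FALSE))
}

db <- data.frame(g = rep(c("A", "B", "C"), each = 20),
                 x = rep(1:4, times = 15),
                 y = c(rep(1, 3), rep(0, 17),    # level A: 15% events
                       rep(1, 10), rep(0, 10),   # level B: 50% events
                       rep(1, 17), rep(0, 3)))   # level C: 85% events
fit <- glm(y ~ g + x, family = binomial, data = db)
wald_test(fit, "g")  # df = 2: one test covering both dummies of g
wald_test(fit, "x")  # df = 1: equals the squared z value from summary(fit)
```

With a single coefficient the statistic reduces to the squared z value, so the same rule covers the numerical case, where the p-value of the coefficient is used directly.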
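The offset.vals argument is described in the same terms as the offset argument of base R's glm(). Assuming it is passed through in that sense, the base-R illustration below (not PDtoolkit code) shows what an offset does: it enters the linear predictor with a coefficient fixed at 1, so a constant offset of 0.5 simply lowers the fitted intercept by 0.5 and leaves the slope unchanged.

```r
# Base-R illustration (not PDtoolkit code): an offset is a known component of
# the linear predictor whose coefficient is fixed at 1, not estimated.
db  <- data.frame(y = c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1), x = 1:10)
off <- rep(0.5, nrow(db))  # one a priori value per case
fit0 <- glm(y ~ x, family = binomial, data = db)
fit1 <- glm(y ~ x, family = binomial, data = db, offset = off)
# With a constant offset only the intercept moves (by -0.5); the slope is unchanged.
coef(fit0)
coef(fit1)
```

In practice the offset usually varies by case (for example, a known log-odds adjustment per observation), which is why its length must equal the number of cases.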

Value

The command stepFWDr returns a list of four objects.
The first object (model) is the final model, an object of class inheriting from "glm".
The second object (steps) is a data frame with the risk factors selected at each iteration.
The third object (warnings) is a data frame with any warnings observed. The warnings refer to the following checks: whether any categorical risk factor has more than 10 modalities and whether any of the bins (groups) contains less than 5% of the observations.
The fourth object (dev.db) is the model development database.
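The two checks behind the warnings object can be approximated with the base-R sketch below. The helper bin_warnings() and its output columns are hypothetical; the exact rules and messages PDtoolkit produces may differ, and numerical columns are skipped on the assumption that both checks target categorical risk factors.

```r
# Hypothetical helper mirroring the two checks listed above; PDtoolkit's exact
# rules may differ. Numerical columns are skipped on the assumption that both
# checks target categorical risk factors.
bin_warnings <- function(db, rfs, max.levels = 10, min.pct = 0.05) {
  out <- data.frame(rf = character(0), comment = character(0))
  for (rf in rfs) {
    x <- db[[rf]]
    if (is.numeric(x)) next
    pct <- prop.table(table(x, useNA = "ifany"))  # share of observations per bin
    if (length(pct) > max.levels)
      out <- rbind(out, data.frame(rf = rf,
        comment = sprintf("more than %s modalities", max.levels)))
    if (any(pct < min.pct))
      out <- rbind(out, data.frame(rf = rf,
        comment = sprintf("bin with less than %g%% of observations", 100 * min.pct)))
  }
  out
}

db <- data.frame(g = rep(letters[1:12], each = 10),   # 12 modalities
                 h = c(rep("a", 115), rep("b", 5)),   # "b" holds about 4.2%
                 x = 1:120)                           # numerical: skipped
bin_warnings(db, c("g", "h", "x"))
```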

Examples

suppressMessages(library(PDtoolkit))
data(loans)
# target followed by candidate risk factors (mixed numerical and categorical)
trf <- c("Creditability", "Account Balance", "Duration of Credit (month)",
         "Age (years)", "Guarantors", "Concurrent Credits")
res <- stepFWDr(start.model = Creditability ~ 1,
                p.value = 0.05,
                db = loans[, trf],
                check.start.model = TRUE,
                offset.vals = NULL)
summary(res$model)$coefficients
# observed target mean for each value of Guarantors, used to verify the trend
rf.check <- tapply(res$dev.db$Creditability,
                   res$dev.db$Guarantors,
                   mean)
rf.check
diff(rf.check)
res$steps
head(res$dev.db)

PDtoolkit documentation built on Sept. 20, 2023, 9:06 a.m.