woereg: regressions and classifications for data frame in credit...

Description Usage Arguments Details Value Examples

Description

apply logistical regression and trees to credit related data frame, to calcualte the credit score

Usage

1
woereg(Data, keeplist = NULL, droplist = NULL, y, regtype = "logit", p_value = NULL, n = NULL)

Arguments

Data

data frame with at least two columns

keeplist

Name of the Independent Variables keept for capping, if missing then for all Independent Variables

droplist

Name of the Independent Variables keept for capping, if missing then for all Independent Variables

y

Name of the dependent Variables

regtype

there are mutilpe regresiion types: logistical regression, with regtype='logit', currently available, which is default linear regression, with regtype='logit', currently under building multinomial logistic regression, with regtype='multi', currently under building ordinal logistic regression, with regtype='ordinal', currently under building

p_value

Name of the dependent Variables the p-value cutoff, would be value between 0-1, for example 0.01, 0.05, the default value is 0.01

n

number of iterations the number of logistical iterations would run, default 3 (later will based on opt p_value)

Details

Note: I used glm.R to train mulptple iterations, since every time we use GLM in logistic regression to build credit models, there would be multiple training iterations and it's very hard to do variable keeping/droping each time. Later, I will have more regression types added into this interface, by using param regtype

Value

test:(currectly only works for numarical variables, since the charactor variable names changed in GLM coeficient) data is from the Titanic project https://www.kaggle.com/c/titanic/data traindata <- read.csv('train.csv',header=T,na.strings=c("")) Data <- subset(traindata,select=c(2,3,5,6,7,8,10,12)) keeplist= c('Fare','Pclass') droplist= c('Sex','Embarked') woereg(Data=Data,keeplist=c('Fare','Pclass'),droplist=c('Sex','Embarked'),y='Survived', regtype='logit', p_value= 0.01, n=3)

results: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.188881 0.488922 6.522 6.93e-11 *** Pclass -1.130074 0.142036 -7.956 1.77e-15 *** Age -0.040656 0.006782 -5.994 2.04e-09 *** Fare 0.003213 0.002326 1.381 0.167

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--	or do  help(data=index)  for the standard data sets.

## The function is currently defined as
function (Data, keeplist = NULL, droplist = NULL, y, regtype = "logit",
    p_value = NULL, n = NULL)
{
    if (!is.null(droplist)) {
        Data <- Data[, !(names(Data) %in% droplist)]
    }
    nums <- sapply(Data, is.numeric)
    Data <- Data[, nums]
    if (is.null(p_value)) {
        p_value <- 0.01
    }
    if (is.null(n)) {
        n <- 3
    }
    for (i in 1:n) {
        f <- paste(as.name(y), "~.")
        model <- glm(f, family = binomial(link = "logit"), data = Data)
        pvalue <- round(coef(summary(model))[, 4], 6)
        varsel <- pvalue[which(pvalue <= p_value)]
        drops <- c(droplist, "(Intercept)")
        vallist <- varsel[!(names(varsel) %in% drops)]
        varselname <- unique(c(names(vallist), keeplist))
        if (!is.null(varselname)) {
            Data <- subset(Data, select = c(y, varselname))
        }
    }
    summary(model)
    return(model)
  }

billyuanyao/WOECredit documentation built on May 28, 2019, 7:11 p.m.