
This R package provides a pipeline for automatically building logistic regression models. It covers data pre-processing, variable selection, model building, diagnostics, and deployment.

Five functions related to logistic regression modeling are included in the package:

Examples:

Load the libraries.

library(dplyr)
library(tidyr)
library(rms)
library(stringr)
library(SimmonsResearchR)

Load the data.

data(modeldata)

# There are 5 dependent variables
DVList <- c("B_AUTOMOTIVE_GENS_MAKE_NETS_10459_Any_ChevroletGeo", 
            "B_AUTOMOTIVE_GENS_MAKE_NETS_10459_Any_Ford", 
            "B_AUTOMOTIVE_GENS_MAKE_NETS_10459_Any_Honda", 
            "B_AUTOMOTIVE_GENS_MAKE_NETS_10459_Any_Nissan", 
            "B_AUTOMOTIVE_GENS_MAKE_NETS_10459_Any_Toyota")

Pre-processing: The package includes several ways to pre-process the predictor data. It assumes that all of the data are numeric.
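Because the pre-processing assumes numeric predictors, it can help to check the column types before modeling. The following is a minimal base R sketch of such a check. The toy data frame and its column names are made up for illustration and are not part of the package.

```r
# Toy data frame standing in for modeldata (hypothetical columns).
toy <- data.frame(
  Gender = c(1, 0, 1, 0),
  agemid = c(25, 40, 55, 70),
  region = c("N", "S", "E", "W"),
  stringsAsFactors = FALSE
)

# Flag each column as numeric or not.
is_num <- vapply(toy, is.numeric, logical(1))

# Names of the columns that would need recoding before modeling.
non_numeric <- names(toy)[!is_num]
print(non_numeric)
```

Any column reported here would have to be recoded (for example, into 0/1 indicators) before it is passed as a predictor.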

As the first step, we use only the demo variables to build the logistic regression models. Because we want to keep all of the demo variables in the model, we set the all option to TRUE.

DemoVars <- c('Gender','respmar2','employ','incmid','parent','own','agemid','race1',
              'race2','race3','educat1','educat2','educat3')

model.demo <- lrm_model(data=modeldata, DVList=DVList, IDVList=DemoVars, all=TRUE)
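To illustrate the idea of fitting one model per dependent variable while keeping every predictor, here is a rough sketch using stats::glm on simulated data. It is not the implementation of lrm_model, whose internals are not shown here; the data, the variable names, and the use of glm are all assumptions made for illustration.

```r
# Simulated data; all names here are hypothetical.
set.seed(1)
n <- 200
toy <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rbinom(n, 1, 0.5)
)
toy$y_a <- rbinom(n, 1, plogis(0.8 * toy$x1))
toy$y_b <- rbinom(n, 1, plogis(-0.5 * toy$x2 + 0.7 * toy$x3))

dv_list  <- c("y_a", "y_b")       # dependent variables
idv_list <- c("x1", "x2", "x3")   # predictors, all forced into each model

# Fit one logistic regression per dependent variable, with no variable selection.
fits <- lapply(dv_list, function(dv) {
  f <- reformulate(idv_list, response = dv)
  glm(f, data = toy, family = binomial())
})
names(fits) <- dv_list

# Each model has an intercept plus one coefficient per predictor.
n_coef <- sapply(fits, function(m) length(coef(m)))
print(n_coef)
```

Without an option like all = TRUE, a selection step would presumably drop some predictors, so the coefficient counts could differ between models.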

Then save the model information to a spreadsheet.

save_models_excel(model.demo, out="demo.xlsx")

We also need to use the DT variables. There are 265 of them.

DTVars <- names(modeldata %>% slice(1) %>%
                  select(starts_with("DT")))

length(DTVars)

model.dt <- lrm_model(data=modeldata, DVList=DVList, IDVList=DTVars)
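The DT columns above are picked out by their name prefix. For readers who prefer base R, the same kind of selection can be sketched with grep. The toy data frame below is hypothetical. Note that grep is case-sensitive here, whereas starts_with ignores case by default.

```r
# Toy data frame with made-up column names.
toy <- data.frame(
  BOOK_ID = 1:3,
  DT_a    = c(0, 1, 0),
  DT_b    = c(1, 1, 0),
  Gender  = c(1, 0, 1)
)

# Keep the names that begin with "DT".
dt_vars <- grep("^DT", names(toy), value = TRUE)
print(dt_vars)
```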

There are 570 Psychographic variables. Running them all at once may take too long, so we divide them into 3 groups and run each group separately.

PsycoVars <- names(modeldata %>% slice(1) %>%
                     select(Apparel_5605_1:Views_7650_78))
length(PsycoVars)

# divide 570 Psycovars into 3 groups
PsycoVars1 <- PsycoVars[1:200]
PsycoVars2 <- PsycoVars[201:400]
PsycoVars3 <- PsycoVars[401:570]

model.Psyco1 <- lrm_model(data=modeldata, DVList=DVList,IDVList=PsycoVars1)
model.Psyco2 <- lrm_model(data=modeldata, DVList=DVList,IDVList=PsycoVars2)
model.Psyco3 <- lrm_model(data=modeldata, DVList=DVList,IDVList=PsycoVars3)
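The manual index ranges above can also be generated programmatically, which avoids hard-coding the boundaries if the number of variables changes. Below is a small sketch using split. The variable names are placeholders, and the chunk size of 200 simply mirrors the grouping used above.

```r
# Placeholder names standing in for the 570 psychographic columns.
vars <- paste0("V", seq_len(570))
chunk_size <- 200

# Assign each variable to chunk 1, 2, 3, ... and split accordingly.
groups <- split(vars, ceiling(seq_along(vars) / chunk_size))

# Number of variables in each group.
group_sizes <- lengths(groups)
print(group_sizes)
```

Each element of groups could then be passed as the IDVList argument in a loop instead of writing one call per group.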

The next step is to combine all three Psychographic models.

model.Psyco12 <- combine_lrm_models(data=modeldata, model.Psyco1, model.Psyco2)
model.Psyco123 <- combine_lrm_models(data=modeldata, model.Psyco12, model.Psyco3)

In the same way, combine the Psychographic and DT models to get the final model. Because we need to include all the demo variables in the models, we set Included = DemoVars.

model.final <- combine_lrm_models(data=modeldata, model.dt, model.Psyco123, Included=DemoVars)
save_models_excel(model.final, out="modelfinal.xlsx")

The last step is model deployment. We use the models to score the modeling base, calculating probabilities and new segments. The models can also be applied to new data for scoring.

model.final.score <- score_model_data(model = model.final, newdata = modeldata, ID = "BOOK_ID", cutoff = 0.1, file = "modelscores.xlsx") 
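To make the scoring step more concrete, the sketch below fits a logistic regression with stats::glm on simulated data, computes predicted probabilities, and assigns a 0/1 segment by comparing each probability to a cutoff. This is only an assumption about how a cutoff like 0.1 might be applied; the actual behavior of score_model_data is not documented here, and the data and column names are invented.

```r
# Simulated modeling base; id, x and y are hypothetical columns.
set.seed(42)
n <- 300
toy <- data.frame(id = seq_len(n), x = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-2 + toy$x))

# Fit a simple logistic regression.
fit <- glm(y ~ x, data = toy, family = binomial())

# Score the data: predicted probability of y = 1 for each row.
scores <- data.frame(
  id   = toy$id,
  prob = predict(fit, newdata = toy, type = "response")
)

# Assumed rule: rows at or above the cutoff fall into the segment.
cutoff <- 0.1
scores$segment <- as.integer(scores$prob >= cutoff)
print(head(scores))
```

Passing a different data frame as newdata would score new records with the same fitted model, provided it contains the same predictor columns.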


yangx227/SimmonsResearchR documentation built on April 24, 2022, 6:44 a.m.