knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(FinalProject)

mlr() Function

Regression Modeling

This function is intended to be used with the finalproj() function, which offers you a choice of which kind of program to start. It can also be used on its own to aid in regression modeling and analysis.

Initialize the Function

The prompt/menu is started by typing mlr() in the console (or by going through finalproj()). You are first asked for the name of a dataset. If you are unsure which datasets are available, type data() into the console to list the datasets provided by the currently loaded packages. Once your dataset is selected, you will be asked to begin building your model.

# mlr()
# Name of your dataset:
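If you want to check a dataset name before typing it at the prompt, something like the following may help. It is only a sketch using base R; the names `available`, `dataset_name`, and `dtst` are illustrative and are not taken from the package.

```r
# List the datasets that ship with the base 'datasets' package.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Confirm that a name refers to a data frame before giving it to mlr().
dataset_name <- "mtcars"        # example choice
dtst <- get(dataset_name)       # look the object up by name
stopifnot(is.data.frame(dtst))

# These are the variables the selection menus would be expected to offer.
names(dtst)
```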

Set Up Your Model

In this part of the function, you are asked for the response variable, the number of predictor variables, the predictor variable(s) themselves, and whether to include an interaction term. You can have only one response variable and one interaction term, but you can have as many predictor variables as there are in the dataset.

The benefit of this selection system is that you don't have to remember or type the variable names for the model. The function is designed to bring up all the available selections, which are then chosen by the number preceding the variable.

### Input lines are Indented

##### Response Variable (X):  
# 
#  1: mpg    2: cyl    3: disp   4: hp     5: drat   6: wt     7: qsec   8: vs     9: am    10: gear  11: carb
# 
# 
# Selection: 1
##### How many predictor variables?: 2
##### Which Predictor Variables to Include:  
# 
#  1: mpg    2: cyl    3: disp   4: hp     5: drat   6: wt     7: qsec   8: vs     9: am    10: gear  11: carb
# 
# 
# Selection: 6
##### Which Predictor Variables to Include:  
# 
#  1: mpg    2: cyl    3: disp   4: hp     5: drat   6: wt     7: qsec   8: vs     9: am    10: gear  11: carb
# 
# 
# Selection: 4
##### Would you like an interaction term? 
# 
# 1: Y
# 2: N
# 
# Selection: 1
##### Which Predictor Variables to Include:  
# 
#  1: mpg    2: cyl    3: disp   4: hp     5: drat   6: wt     7: qsec   8: vs     9: am    10: gear  11: carb
# 
# 
# Selection: 6
##### Which Predictor Variables to Include:  
# 
#  1: mpg    2: cyl    3: disp   4: hp     5: drat   6: wt     7: qsec   8: vs     9: am    10: gear  11: carb

*These variables come from the built-in dataset mtcars.
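The Call lines in the example output show that the fitted formula comes from a string passed through as.formula(). How mlr() builds that string is not shown here, so the following is only a sketch of one way to turn menu selections (column positions) into a formula. The helper `build_formula()` and its arguments are hypothetical.

```r
# Hypothetical helper: convert numeric menu selections into a model formula.
build_formula <- function(data, response, predictors, interaction = NULL) {
  vars <- names(data)
  rhs <- vars[predictors]                       # main-effect terms
  if (!is.null(interaction)) {
    stopifnot(length(interaction) == 2)         # one two-way interaction only
    rhs <- c(rhs, paste(vars[interaction], collapse = ":"))
  }
  reformulate(rhs, response = vars[response])
}

# Selections from the walkthrough above: response 1 (mpg),
# predictors 6 (wt) and 4 (hp), interaction between 6 and 4.
frm <- build_formula(mtcars, response = 1, predictors = c(6, 4),
                     interaction = c(6, 4))
frm
```

Working from positions rather than typed names mirrors the menu system: a typo in a variable name cannot occur, only an out-of-range number.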

Choose Your Model (with regr())

This step is fairly simple. Now that the variables for your model have been selected, you choose which type of regression to use in your analysis. Options 2-3 are meant for count data, and options 4-5 are meant for binary data. You may choose any of these, but be sure you know which model is appropriate for your data.

##### Which Regression to Perform:  
# 
# 1: Linear Regression
# 2: Poisson (Count Outcome)
# 3: Negative Binomial (Count Outcome)
# 4: Logistic Model (Binary Outcome)
# 5: Probit Regression (Binary Outcome)
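The five menu options can be thought of as a mapping from a number to a fitting call. The sketch below is an assumption about how such a mapping could look, not the package's own code. The example output confirms lm(), glm() with poisson(), and glm() with binomial(); using MASS::glm.nb() for the negative binomial and a probit link for option 5 is a guess, and `fit_model()` is a hypothetical name.

```r
# Hypothetical dispatcher: map the menu number to a fitting function.
fit_model <- function(formula, data, choice) {
  switch(as.character(choice),
    "1" = lm(formula, data = data),
    "2" = glm(formula, family = poisson(), data = data),
    "3" = MASS::glm.nb(formula, data = data),   # assumes MASS is installed
    "4" = glm(formula, family = binomial(), data = data),
    "5" = glm(formula, family = binomial(link = "probit"), data = data),
    stop("choice must be an integer from 1 to 5")
  )
}

# Option 4 on a binary outcome (vs is coded 0/1 in mtcars).
fit <- fit_model(vs ~ wt + hp, mtcars, choice = 4)
coef(fit)
```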

Output Options

In the final series of prompts, you are asked what kind of output you want: a summary, which contains the frequently requested coefficients and p-values, and a plot. If your model has only one covariate (predictor variable), the plot shows $y~x$ with a regression line. If it has more than one covariate, four residual plots are produced for assessing model fit.

##### Would you like a summary? 
# 
# 1: Y
# 2: N
# 
# Selection: 1
##### Would you like a plot(s)? 
# 
# 1: Y
# 2: N
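The plotting rule described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: `plot_fit()` is a hypothetical name, and the regression line drawn by abline() is only meaningful for a linear model, since a glm's coefficients are on the link scale.

```r
# Hypothetical helper: choose the plot type from the number of covariates.
plot_fit <- function(fit) {
  n_cov <- length(all.vars(formula(fit))) - 1   # variables on the right side
  if (n_cov == 1) {
    mf <- model.frame(fit)                      # column 1 = response
    plot(mf[[2]], mf[[1]], xlab = names(mf)[2], ylab = names(mf)[1])
    abline(fit)                                 # fitted regression line
  } else {
    old <- par(mfrow = c(2, 2))                 # 2 x 2 grid of diagnostics
    on.exit(par(old))
    plot(fit)                                   # the four residual plots
  }
  invisible(n_cov)
}

plot_fit(lm(mpg ~ wt, data = mtcars))           # scatterplot with line
plot_fit(lm(mpg ~ wt * hp, data = mtcars))      # four diagnostic plots
```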

Examples

For this set of examples, we will use the built-in dataset mtcars.
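The summaries in this section can also be reproduced without the menus, assuming the fits are plain lm() and glm() calls as the Call lines in the output suggest. The object names below are illustrative. Note that in R's formula syntax wt*hp already expands to wt + hp + wt:hp, so mpg~wt+hp+wt*hp and mpg ~ wt * hp describe the same model.

```r
# Non-interactive equivalents of the four examples, using mtcars.
fit_int   <- lm(mpg ~ wt * hp, data = mtcars)                      # interaction
fit_slr   <- lm(mpg ~ wt, data = mtcars)                           # one predictor
fit_pois  <- glm(gear ~ wt + hp + vs, family = poisson(), data = mtcars)
fit_logit <- glm(vs ~ wt + hp, family = binomial(), data = mtcars)

# Print any of the summaries, for example:
summary(fit_int)
```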

Output for: Linear regression, interaction term, two predictor variables: mpg~wt+hp+wt*hp

# [[1]]
# 
# Call:
# lm(formula = as.formula(frmla), data = dtst)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.0632 -1.6491 -0.7362  1.4211  4.5513 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
# wt          -8.21662    1.26971  -6.471 5.20e-07 ***
# hp          -0.12010    0.02470  -4.863 4.04e-05 ***
# wt:hp        0.02785    0.00742   3.753 0.000811 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 2.153 on 28 degrees of freedom
# Multiple R-squared:  0.8848,  Adjusted R-squared:  0.8724 
# F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13

Regression Diagnostics for multiple linear regression model with interaction
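With an interaction term in the model, the effect of wt on mpg is no longer a single number: it depends on hp. As a rough illustration (the helper `wt_slope()` is hypothetical), the slope of wt at a given horsepower is the wt coefficient plus the wt:hp coefficient times hp.

```r
# Refit the interaction model from the example above.
fit_int <- lm(mpg ~ wt * hp, data = mtcars)
b <- coef(fit_int)

# Slope of mpg with respect to wt at a given hp: b_wt + b_wt:hp * hp.
wt_slope <- function(hp) b[["wt"]] + b[["wt:hp"]] * hp

# The negative effect of weight shrinks as horsepower increases.
sapply(c(100, 200), wt_slope)
```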

Output for: Linear regression, one predictor variable: mpg~wt

# [[1]]
# 
# Call:
# lm(formula = as.formula(frmla), data = dtst)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -4.5432 -2.3647 -0.1252  1.4096  6.8727 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
# wt           -5.3445     0.5591  -9.559 1.29e-10 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 3.046 on 30 degrees of freedom
# Multiple R-squared:  0.7528,  Adjusted R-squared:  0.7446 
# F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Plot with regression line for simple linear regression model

Output for: Poisson regression, three predictor variables: gear~wt+hp+vs

# [[1]]
# 
# Call:
# glm(formula = as.formula(frmla), family = poisson(), data = dtst)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -0.50901  -0.22929  -0.00885   0.21858   0.45792  
# 
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept)  1.656358   0.488483   3.391 0.000697 ***
# wt          -0.183343   0.132186  -1.387 0.165439    
# hp           0.001466   0.002094   0.700 0.483887    
# vs           0.032149   0.269704   0.119 0.905115    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# (Dispersion parameter for poisson family taken to be 1)
# 
#     Null deviance: 4.4634  on 31  degrees of freedom
# Residual deviance: 2.3048  on 28  degrees of freedom
# AIC: 111.77
# 
# Number of Fisher Scoring iterations: 4

Regression Diagnostics for Poisson regression model & three predictor variables

Output for: Logistic regression, two predictor variables: vs~wt+hp

# [[1]]
# 
# Call:
# glm(formula = as.formula(frmla), family = binomial(), data = dtst)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -1.97244  -0.17796  -0.00677   0.43399   1.49246  
# 
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)  
# (Intercept)  7.41037    3.42975   2.161   0.0307 *
# wt           1.00334    1.23161   0.815   0.4153  
# hp          -0.08535    0.03602  -2.369   0.0178 *
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 43.860  on 31  degrees of freedom
# Residual deviance: 16.109  on 29  degrees of freedom
# AIC: 22.109
# 
# Number of Fisher Scoring iterations: 7

Regression Diagnostics for logistic regression model & two predictor variables
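The logistic coefficients above are on the log-odds scale. If you would rather read them as odds ratios or predicted probabilities, a sketch like the following may be useful. It refits the model directly with glm(), and the `newdata` values are arbitrary illustrations.

```r
# Refit the logistic model from the example above.
fit_logit <- glm(vs ~ wt + hp, family = binomial(), data = mtcars)

# Exponentiate the coefficients to obtain odds ratios.
odds_ratios <- exp(coef(fit_logit))
odds_ratios

# Predicted probability that vs = 1 for a hypothetical car.
new_car <- data.frame(wt = 3, hp = 110)
p_hat <- predict(fit_logit, newdata = new_car, type = "response")
p_hat
```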



labarreb/FinalProject documentation built on May 3, 2019, 1:34 p.m.