knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette works through two examples on two well-known datasets. Together they cover every workflow-based procedure included in this package.
Basically, autoGAM has 4 functions that communicate with each other to create a workflow with the aim of evaluating different user-specified forms of included continuous predictors and find the best Generalized Additive Model (GAM). These functions are:
1- autoGAM_frame: The base of all other procedures! Evaluates the user-specified form(s) on the continuous predictor(s) and, for each continuous predictor, picks the best of those forms according to the chosen metric (AIC, AICc or BIC) of GLM fits. This function creates the base platform for the final GAM, and the remaining functions are modifiers of this base platform.
Note: The user can specify whether outliers should be ignored when fitting each form on the predictor(s). Outliers in the response are detected by car::outlierTest(), which is based on the studentized residuals of the records; outliers in a predictor are records with hat-values > 2p (where p is the number of parameters in the model). To ignore outliers during evaluation, set the ignore.outliers argument to TRUE; when you are not sure whether outliers are present, it is highly recommended to keep the default value, FALSE, to obtain more accurate results.
2- backward_select: An independent function that performs p-value-based backward elimination and has also been adapted to work directly on objects of class autoGAM_frame.
3- autoGAM_fit: A wrapper function that obtains and adds the final and best GAM to the platform.
4- plot: Plot method for objects returned by autoGAM_frame. It automatically plots the evaluated forms of the continuous predictors and the best form of each predictor.
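To make the evaluation step of autoGAM_frame concrete, here is a minimal base-R sketch of the idea, not autoGAM's actual implementation: one GLM is fitted per candidate form of a single predictor and the information criteria are compared. Fitting each form on its own, the helper aicc() and the four candidate forms are assumptions made only for illustration.

```r
library(splines)  # ns()

# Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1),
# where k is the number of estimated parameters and n the sample size.
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

# Candidate forms of one continuous predictor (hp), chosen for illustration.
candidate_forms <- list(
  identity = mpg ~ hp,
  log      = mpg ~ log(hp),
  poly3    = mpg ~ poly(hp, 3),
  ns3      = mpg ~ ns(hp, 3)
)

# One GLM per form, then one row of metrics per form.
fits <- lapply(candidate_forms, function(f) glm(f, data = mtcars, family = gaussian()))
scores <- data.frame(
  AIC  = sapply(fits, AIC),
  AICc = sapply(fits, aicc),
  BIC  = sapply(fits, BIC),
  row.names = names(fits)
)

scores
best_form <- rownames(scores)[which.min(scores$AIC)]  # form with the lowest AIC
best_form
```

Swapping scores$AIC for scores$AICc or scores$BIC in the last step mirrors the choice of metric described above.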
These procedures can be combined in almost any order, but remember that autoGAM_frame() creates the base platform and the other functions operate on it.
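The two outlier notions mentioned in the note on autoGAM_frame can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: it uses the common rule-of-thumb leverage cutoff 2p/n (the package's exact threshold may differ) and reproduces the Bonferroni-adjusted studentized-residual test by hand instead of calling car::outlierTest(). The model mpg ~ wt + hp and the 0.05 level are also assumptions.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(fit)          # number of records
p <- length(coef(fit))  # number of parameters in the model

# Predictor-side outliers: records with high leverage (hat-values).
h <- hatvalues(fit)
high_leverage <- names(h)[h > 2 * p / n]  # assumed cutoff: 2p/n

# Response-side outliers: studentized residuals with
# Bonferroni-adjusted two-sided p-values.
r <- rstudent(fit)
p_raw <- 2 * pt(abs(r), df = df.residual(fit) - 1, lower.tail = FALSE)
p_bonf <- pmin(1, n * p_raw)
response_outliers <- names(r)[p_bonf < 0.05]

high_leverage
response_outliers
```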
library(autoGAM)
library(dplyr)  # provides %>%, mutate_at(), filter() and as_tibble() used below
For the first example, we use the mtcars dataset. We want to evaluate the following forms on disp, hp, drat & wt, our chosen continuous variables: x, log(x) with bases 1 & 3, logb(x), power(x) with degrees 3 & 4, poly(x) (polynomial form) with degrees 3 & 4, ns(x) (natural cubic spline form) with 3 degrees of freedom and cut(x) with 3 & 4 partitions. We then want the best GAM built from the best form of each of these continuous variables plus vs, our only categorical variable:
my_mtcars <- mtcars %>% mutate_at('vs', as.factor)

(cars_frame <- autoGAM_frame(formula = mpg ~ disp + hp + drat + wt + vs,
                             forms = list('identity' = c(), 'log' = c(1, 3), 'logb',
                                          'power' = 3:4, 'poly' = 3:4, 'ns' = 3, 'cut' = 3:4),
                             data = my_mtcars))
The best forms of the continuous predictors and the final predictors to be used in the final GAM are the default print output, but they are part of a bigger list. Try exploring this list by accessing its elements with cars_frame$.
Note: As we can see, autoGAM automatically removes any function (form) that produces NaN, Inf or -Inf on a specific predictor and issues a warning stating which forms of which predictor(s) were removed.
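The base-1 logarithm requested above is a good example of such a form: log(x, base = 1) divides by log(1) = 0, so it cannot yield finite values for x > 1. The following base-R check only illustrates the kind of screening described; the list candidates and the is.finite() test are assumptions, not autoGAM's internal code.

```r
x <- mtcars$disp  # all values are greater than 1

candidates <- list(
  log_base1 = log(x, base = 1),  # log(x) / log(1): division by zero
  log_base3 = log(x, base = 3),
  cube      = x^3
)

# A form is usable only if every transformed value is finite
# (is.finite() is FALSE for NaN, Inf and -Inf).
usable <- sapply(candidates, function(v) all(is.finite(v)))
usable
```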
All the functions in the forms list have the form f(x) or f(x,degree), and they all output a vector or a matrix (as they should!). They are well-known functions from base R or from familiar packages (included as dependencies of autoGAM): identity, log2, logb, log, exp, poly, bs & ns (from the splines package), s (from the gam package) & cut. The only exception is power, which is defined inside autoGAM: power(x,degree) is simply x^degree.
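A quick look at what a few of these forms return may help. In this sketch my_power() is a stand-in written from the description "x^degree" (it is not the function shipped with autoGAM), and wt is an arbitrary choice of predictor.

```r
library(splines)  # ns()

my_power <- function(x, degree) x^degree  # stand-in for the described power form

x <- mtcars$wt

v <- my_power(x, 3)  # numeric vector, one value per record
m <- poly(x, 3)      # matrix with 3 columns (orthogonal polynomial terms)
s <- ns(x, 3)        # matrix with 3 columns (natural cubic spline basis, df = 3)
f <- cut(x, 3)       # factor with 3 levels (intervals of x)

length(v)
dim(m)
dim(s)
nlevels(f)
```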
Note: You can always include your own function in the mentioned way if it has the f(x) or f(x,degree) form and it outputs a vector or a matrix. We will get back to this shortly in iris example.
'identity' (f(x)=x) and logb have the form f(x), so we can pass only their names (as we did for 'logb') or state explicitly that they have no degree argument by passing c() after their names (as in 'identity'=c()); we call this singular determination. All the other functions have the form f(x,degree), and the desired degrees or degrees of freedom were passed after their names.
Note: If a hypothetical function g has the form g(x,degree) and you pass it singularly in the forms list ('g' or 'g'=c()), default value for degree in g(x,degree) will be used, otherwise if there is no default value for degree, you will get an error until you determine degree value(s) by passing them in front of 'g' name in the forms list.
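The behaviour in this note follows from how R handles missing arguments, which a plain-R sketch can show. Both g() and h() are hypothetical functions invented for the illustration, and they are called directly here rather than through the forms list.

```r
g <- function(x, degree = 2) x^degree  # degree has a default value
h <- function(x, degree) x^degree      # degree has no default value

g_default <- g(1:3)  # degree is omitted, so the default (2) is used
g_default

# Calling h() without degree fails; capture the error message instead of stopping.
h_result <- tryCatch(h(1:3), error = function(e) conditionMessage(e))
h_result

h(1:3, degree = 3)  # works once degree is supplied
```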
Now that we have our base platform and the best forms of our continuous predictors, it's time to fit the best GAM. It can't be any simpler! Just call autoGAM_fit() on the created frame:
(cars_fit <- autoGAM_fit(cars_frame))
Simple as that. Again, this is the print output of autoGAM_fit(). The fitted model object was actually added to the base list created by autoGAM_frame(), so you can still explore everything with cars_fit$. For example, let's extract the fitted model object whose summary we just saw:
cars_fit$`final fit`
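Element names such as "final fit" contain a space, so they must be wrapped in backticks after $ or passed as a string to [[ ]]. The stand-in list res below is hypothetical and only mimics the naming style; it does not reproduce the real structure returned by autoGAM.

```r
# Hypothetical stand-in list with non-syntactic element names.
res <- list(
  `final fit`     = lm(mpg ~ wt, data = mtcars),
  `backward info` = "placeholder"
)

names(res)                        # lists the available elements
class(res$`final fit`)            # backticks after $
same <- identical(res$`final fit`, res[["final fit"]])  # [[ ]] with a string
same
```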
As you saw, we first created the base platform and evaluated our desired forms of the continuous predictors in it, and then fitted the GAM using the best forms of the continuous predictors and our one categorical predictor. What if we wanted all the insignificant predictors in the last output gone? Again, it can't be any easier. Just call backward_select() on the base platform and then fit the final GAM with autoGAM_fit().
(cars_backward_fit <- cars_frame %>%
   backward_select(backward.alpha = .1, backward.test = 'Rao') %>%
   autoGAM_fit)
From the output we can see that backward elimination was performed. The sequential details of this backward process were also added to the list from autoGAM_frame. Let's check it out:
cars_backward_fit$`backward info`
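For readers who want to see what p-value-based backward elimination does step by step, here is a minimal sketch built on stats::drop1() with the Rao score test. It is not the code behind backward_select(): the function backward_sketch(), its arguments and the restriction to plain variable names as predictors are all assumptions made for illustration.

```r
# Hypothetical helper: repeatedly drop the least significant term until
# every remaining term has a p-value at or below alpha.
backward_sketch <- function(response, predictors, data, alpha = 0.1,
                            family = gaussian()) {
  keep <- predictors  # assumed to be plain variable names
  steps <- list()
  repeat {
    rhs <- if (length(keep) > 0) keep else "1"
    fit <- glm(reformulate(rhs, response = response), data = data, family = family)
    if (length(keep) == 0) break            # nothing left to test

    tab <- drop1(fit, test = "Rao")         # one row per term, plus a <none> row
    pvals <- setNames(tab[-1, ncol(tab)], rownames(tab)[-1])  # last column: p-values
    if (max(pvals) <= alpha) break          # every term is significant: stop

    worst <- names(which.max(pvals))        # least significant term
    stopifnot(worst %in% keep)
    steps[[length(steps) + 1]] <- data.frame(removed = worst, p.value = max(pvals))
    keep <- setdiff(keep, worst)
  }
  list(fit = fit, removed = do.call(rbind, steps))
}

out <- backward_sketch("mpg", c("disp", "hp", "drat", "wt"), mtcars, alpha = 0.1)
formula(out$fit)  # reduced model
out$removed       # terms dropped, in order, with their p-values
```

The removed table plays a role similar to the sequential details mentioned above, although its layout here is invented.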
Nothing gathers information in one place better than a good plot. The plot method for autoGAM_frame was designed to plot the evaluation of the different forms of the continuous predictors and the final forms of the included continuous predictors.
Note that plot can be called on any object built on the autoGAM_frame base.
plot(cars_frame,type='forms')
The plot above was a bit crowded, wasn't it? It's always good to see a formal plot for final best form of each continuous predictor that was included in the final GAM.
Based on full model:
plot(cars_fit,type='final')
Based on reduced model after backward elimination process:
plot(cars_backward_fit,type='final')
Now that we have the general idea about how autoGAM works, it's time to wrap everything up by mentioning final details in the form of an example.
This time we are interested in evaluating two arbitrary functions, one with a vector output and the other with a matrix output.
We first perform some modifications on iris data and then we introduce our desired functions:
(my_iris <- iris %>%
   filter(Species %in% unique(Species)[1:2]) %>%
   as_tibble %>%
   mutate_at('Species', as.character))
We filtered iris to keep only the setosa & versicolor Species, so our version of iris (my_iris) has all-numerical predictors and a categorical response with 2 levels (Species).
Now we introduce the forms (functions) to be evaluated on our predictors:
library(purrr)  # provides map() and reduce()
temp_vec <- function(x) 3*sqrt(x)+5
temp_mat <- function(x,d) as.list((1:d)*-1) %>% map(~x^.) %>% reduce(cbind)
Both of these specified functions have f(x) or f(x,degree) form and their outputs are either a vector or a matrix so they can be used in the list of forms argument.
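To check the output shapes without fitting anything, the two functions can be applied to a few values. The sketch below redefines temp_vec() and uses temp_mat_base(), a base-R variant written for this illustration; it should produce the same values as temp_mat(), except that for d = 1 it returns a one-column matrix rather than a plain vector.

```r
temp_vec <- function(x) 3 * sqrt(x) + 5

# Base-R variant (illustrative name): column k holds x^(-k), k = 1, ..., d.
temp_mat_base <- function(x, d) outer(x, -seq_len(d), `^`)

x <- iris$Sepal.Length[1:5]

vec_out <- temp_vec(x)          # numeric vector of length 5
mat_out <- temp_mat_base(x, 3)  # 5 x 3 matrix: x^(-1), x^(-2), x^(-3)

vec_out
mat_out
```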
iris_fit <- autoGAM_frame(formula = Species ~ ., data = my_iris,
                          forms = list('temp_vec', 'temp_mat' = 1:3),
                          family = binomial(link = 'logit'),
                          metric = 'AICc', parallel = TRUE) %>%
  autoGAM_fit
plot(iris_fit,type='forms')
One last thing! You saw that the base category of the Species variable was chosen to be setosa. This is because, by default, the first category of the response becomes the base. This makes no difference to the model's predictions, but one might want to change the base for the sake of interpretation. For example, what if we wanted versicolor as the base category of the response? We can easily set the base category of the response with the resp.base argument.
iris_frame <- autoGAM_frame(formula = Species ~ ., data = my_iris,
                            forms = list('temp_vec', 'temp_mat' = 1:3),
                            family = binomial(link = 'logit'),
                            metric = 'AICc', parallel = TRUE,
                            resp.base = 'versicolor')
plot(iris_frame, type = 'forms')
plot(iris_frame)
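The claim that the base category changes the interpretation but not the predictions can be verified with a plain binomial GLM. This sketch uses relevel() and a single predictor (Sepal.Length) as stand-ins; how resp.base is implemented inside autoGAM is not shown here and may differ.

```r
two <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))

# Default: the first level (setosa) is the base, so versicolor is modelled as "success".
fit_setosa_base <- glm(Species ~ Sepal.Length, data = two, family = binomial())

# Make versicolor the base instead, so setosa is modelled as "success".
two$Species_v <- relevel(two$Species, ref = "versicolor")
fit_versi_base <- glm(Species_v ~ Sepal.Length, data = two, family = binomial())

coef(fit_setosa_base)
coef(fit_versi_base)  # same magnitudes, opposite signs

# The fitted probabilities are complementary, so predicted classes agree.
prob_sum <- fitted(fit_setosa_base) + fitted(fit_versi_base)
range(prob_sum)
```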
Since update 1.0.0, which came with a total rebuild of the package's structure, backward_select() has been an independent function, so it can operate on autoGAM_frame objects to match the new workflow-based structure.
A valuable side effect of this independence is that this function can also act as an independent function for only backward selection procedures based on formula inputs.
Technically, the object argument can be either a formula or an object of class autoGAM_frame; the latter lets it take part in autoGAM's workflow.
The usage is pretty easy and straightforward but let's check out an example:
backward_select(object=mpg~hp+poly(disp,2)+cut(drat,3)+vs,data=my_mtcars)
The detailed information on the sequential backward elimination process is returned directly. When object is an object of class autoGAM_frame instead of a formula, this list is added to the final frame list, as we saw earlier.