f_train_lasso: wrapper for cv.glmnet and cv.HDtweedie

View source: R/f_train.R

Description

Performs lasso regression for different distributions and returns a list of formulas, each of which results in the lowest MSE for at least one of the distributions. Graphical output allows side-by-side comparison of lasso behaviour across all distributions.

Usage

f_train_lasso(data, formula, p = c(1, 1.25, 1.5, 1.75, 2), k = 5,
  family = "gaussian", ...)

Arguments

data

dataframe

formula

formula

p

p parameter for Tweedie distributions; set p = NULL to skip the lasso for Tweedie distributions. Default: c(1, 1.25, 1.5, 1.75, 2)

k

number of folds for k-fold cross-validation. Default: 5

family

family parameter for glmnet; can be a vector. Default: 'gaussian'. For classification use 'binomial', in which case the performance metric MSE is replaced with AUC.

...

arguments passed to cv.glmnet, cv.HDtweedie such as lambda or n_lambda
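
As a rough illustration of what k, family and an extra argument such as lambda correspond to, a direct call to cv.glmnet might look like the sketch below. It assumes the glmnet and MASS packages are installed. How f_train_lasso builds the model matrix and forwards its arguments internally is an assumption here, not something taken from its source.

```r
library(glmnet)

data <- MASS::quine
formula <- Days ~ Eth + Sex + Age + Lrn

# glmnet needs a numeric predictor matrix; drop the intercept column
x <- model.matrix(formula, data = data)[, -1]
y <- data$Days

# same lambda grid as in the Examples section
lambda_grid <- 10^seq(3, -3, length.out = 25)

set.seed(1)  # fold assignment is random
cv_fit <- cv.glmnet(x, y, family = "gaussian", nfolds = 5, lambda = lambda_grid)

cv_fit$lambda.min                # lambda with the lowest cross-validated MSE
coef(cv_fit, s = "lambda.min")   # coefficients at that lambda
```

Predictors whose coefficient is zero at lambda.min have been dropped by the lasso.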

Details

Columns containing NA will be removed. The formula cannot be constructed with '.'. Use family = 'binomial' for classification.
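
Because '.' is not accepted, one way to get an explicit formula is to let base R expand the dot before calling the function. The sketch below also shows how to check up front which columns contain NA. The column with_na is a hypothetical addition made only for illustration, and the manual column removal is an assumption about what the function does internally.

```r
data <- MASS::quine

# hypothetical column with a missing value, added only for this illustration
data$with_na <- c(NA, rep(1, nrow(data) - 1))

# columns that contain at least one NA
na_cols <- names(data)[colSums(is.na(data)) > 0]
data_complete <- data[setdiff(names(data), na_cols)]

# terms() expands '.' against the data; formula() turns it back into a formula
full_formula <- formula(terms(Days ~ ., data = data_complete))

na_cols
full_formula
```

The expanded formula names every remaining column explicitly, so it can be passed on without the '.' shorthand.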

!!! Watch out: the data will not be scaled automatically.
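
Since scaling is left to the caller, numeric predictors can be centered and scaled by hand first. The following is a minimal base R sketch. The Examples section uses f_manip_data_2_model_matrix_format for this step instead, and the choice of vs as the response is just for illustration.

```r
data <- mtcars
response <- "vs"

# numeric predictor columns (everything except the response)
predictors <- setdiff(names(data), response)
num_cols <- predictors[vapply(data[predictors], is.numeric, logical(1))]

# center to mean 0 and scale to standard deviation 1
data_scaled <- data
data_scaled[num_cols] <- as.data.frame(scale(data[num_cols]))

round(colMeans(data_scaled[num_cols]), 10)
sapply(data_scaled[num_cols], sd)
```

The response column is left untouched, so it can still be converted to a factor for classification.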

Value

list()

See Also

HDtweedie, glmnet, cv.HDtweedie, cv.glmnet, pipelearner, learn_models, learn_cvpairs, learn

Examples

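library(oetteR)   # f_train_lasso and the f_manip_* / f_clean_* helpers
library(magrittr) # %>% pipe used in the classification example

# regular regression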

data = MASS::quine
formula = Days ~ Eth + Sex + Age + Lrn

# here we scale, center and create sensibly named dummy variables
trans_ls = f_manip_data_2_model_matrix_format( data, formula )


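# p = NULL: skip the tweedie distributions, fit only the default gaussian family
lasso = f_train_lasso(trans_ls$data, trans_ls$formula, p = NULL, k = 3
                     , lambda = 10^seq(3,-3,length.out = 25) )

# p = 1.5: additionally fit a tweedie lasso (overwrites the result above)
lasso = f_train_lasso(trans_ls$data, trans_ls$formula, p = 1.5, k = 3
                     , lambda = 10^seq(3,-3,length.out = 25) )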

lasso


# classification

# here we transform double to factor
data_ls = mtcars %>%
  f_clean_data()

formula = vs ~ cyl + mpg + disp + hp + drat + wt + qsec + am + gear + carb

# here we scale, center and create sensibly named dummy variables
trans_ls = f_manip_data_2_model_matrix_format( data_ls$data, formula )

lasso = f_train_lasso( trans_ls$data
                      , trans_ls$formula
                      , p = NULL
                      , family = 'binomial'
                      , k = 3
 )

lasso

erblast/oetteR documentation built on Jan. 3, 2019, 11:19 a.m.