premixed: Fitting a mixed-effects prediction rule ensemble

Description

Experimental function for fitting mixed-effects prediction rule ensembles. It estimates a random intercept in addition to a prediction rule ensemble, which allows for analysing datasets with a clustered or multilevel structure, as well as longitudinal datasets. Because the function is experimental, use it at your own risk.

Usage

premixed(formula, cluster = NULL, data, penalty.par.val = "lambda.min",
  learnrate = 0, use.grad = FALSE, conv.thresh = 0.001,
  family = "gaussian", ridge.ranef = FALSE, max.iter = 1000, ...)

Arguments

formula

a formula with a three-part right-hand side, like y ~ 1 | cluster | x1 + x2 + x3; or with a one-part right-hand side, like y ~ x1 + x2 + x3. In the latter case, the cluster indicator must be specified through the cluster argument.

cluster

optional character string supplying the name of the cluster indicator. If specified, formula should not involve random effects (e.g., y ~ x1 + x2 + x3), and random effects will not be estimated during tree induction. This will substantially speed up computations, but may yield a less accurate model, depending on the magnitude of the random effects.

data

data frame containing the variables specified in formula.

penalty.par.val

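as in function pre; see the documentation of pre.

learnrate

as in function pre; see the documentation of pre.

use.grad

as in function pre; see the documentation of pre.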

conv.thresh

numeric vector of length 1, specifies the convergence criterion for estimation of the model. If ridge.ranef = FALSE, it specifies the maximum difference in log-likelihoods of the random-effects model from two consecutive iterations for estimation to converge. If ridge.ranef = TRUE, it specifies the maximum absolute difference in random-effects predictions from two consecutive iterations for estimation to converge.
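The two stopping rules described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: has_converged() is a hypothetical helper, the numeric values are made up, and both the strict "less than" comparison and the use of the maximum over clusters are assumptions.

```r
# Hypothetical helper illustrating the two stopping rules; not part of the package.
has_converged <- function(previous, current, conv.thresh = 0.001) {
  # previous / current are either
  #  - log-likelihoods of length 1 (the ridge.ranef = FALSE case), or
  #  - vectors of random-effects predictions (the ridge.ranef = TRUE case)
  max(abs(current - previous)) < conv.thresh
}

# Log-likelihood criterion: made-up values from two consecutive iterations
ll_check <- has_converged(previous = -512.3401, current = -512.3397)

# Random-effects criterion: made-up per-cluster predictions from two iterations
re_check <- has_converged(previous = c(1.20, -0.75, 0.31),
                          current  = c(1.21, -0.75, 0.30))

ll_check  # the change in log-likelihood is below the threshold
re_check  # the largest change in predictions is above the threshold
```

With the default threshold of 0.001, the first check would signal convergence and the second would not, so estimation would continue for another iteration (up to max.iter).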

family

as in function pre, but note that it must be specified as a character vector (e.g., "gaussian").

ridge.ranef

logical vector of length 1. Should random effects be estimated through a ridge regression? If set to TRUE, random effects will be estimated through fitting a ridge regression model using function cv.glmnet. If set to FALSE, random effects will be estimated through fitting a mixed-effects regression model using function lmer or glmer.

max.iter

numeric vector of length 1. Maximum number of iterations performed to re-estimate fixed and random effects parameters.

...

further arguments to be passed to pre.

Details

Function premixed() can take a random intercept into account in (I) rule induction and/or (II) coefficient estimation. To take the random intercept into account in both rule induction and coefficient estimation, see Example 1 below. To take the random intercept into account in coefficient estimation only, see Example 2 below. Alternatively, it has been suggested that random effects do not need to be modelled explicitly, but can be accounted for by employing a blocked bootstrap or subsampling approach; see Examples 3a and 3b below.

Note that approaches 1 and 2 can be combined with approach 3. However, whether a cluster bootstrap or cluster subsampling approach is actually sufficient to take the clustered structure into account is a question that still needs to be addressed.

Note that only random-intercept models are currently supported. That is, random slopes cannot currently be specified.

Value

An object of class 'premixed'.

Examples

## Example 1: Take into account clustered structure in rule induction
## as well as coefficient estimation:
set.seed(42)
airq <- airquality[complete.cases(airquality),]
airq.ens1 <- premixed(Ozone ~ 1 | Month | Solar.R + Wind + Temp + Day, data = airq, ntrees = 10)
airq.ens1



## Example 2: Take into account clustered structure in coefficient estimation
## only:
set.seed(42)
airq <- airquality[complete.cases(airquality),]
airq.ens2 <- premixed(Ozone ~ Solar.R + Wind + Temp + Day, cluster = "Month", data = airq, 
  ntrees = 10)
airq.ens2



## Example 3a: Take into account clustered structure in rule induction through
## cluster bootstrap sampling or cluster subsampling:

## Create a sampling function that bootstrap samples whole clusters:
bb_sampfunc <- function(cluster = airq$Month) {
  result <- c()
  for(i in sample(unique(cluster), replace = TRUE)) {
    result <- c(result, which(cluster == i))
  }
  result
}
## Employ blocked bootstrap sampling function in fitting PRE:
library(pre)
set.seed(42)
airq.ens3a.bs <- pre(Ozone ~ Solar.R + Wind + Temp + Day, data = airq, sampfrac = bb_sampfunc)
airq.ens3a.bs

## Create a sampling function that subsamples ~75% of the clusters: 
ss_sampfunc <- function(cluster = airq$Month, sampfrac = .75) {
  result <- c()
  n_clusters <- round(length(unique(cluster)) * sampfrac)
  for(i in sample(unique(cluster), size = n_clusters, replace = FALSE)) {
    result <- c(result, which(cluster == i))
  }
  result
}
## Employ cluster subsampling in fitting PRE:
library(pre)
set.seed(42)
airq.ens3a.ss <- pre(Ozone ~ Solar.R + Wind + Temp + Day, data = airq, sampfrac = ss_sampfunc)
airq.ens3a.ss



## Example 3b: Take into account clustered structure in both rule induction and
## coefficient estimation:

## Generate fold ids:
airq <- airquality[complete.cases(airquality),]
foldids <- vector("numeric", length = nrow(airq))
counter <- 0
for (i in unique(airq$Month)) {
  counter <- counter + 1
  foldids[airq$Month == i] <- counter
}
foldids

## Employ cluster subsampling function (ss_sampfunc) for rule induction, as well as
## cluster-specific fold ids for estimating coefficients:
set.seed(42)
airq.ens3b.ss <- pre(Ozone ~ Solar.R + Wind + Temp + Day, data = airq, sampfrac = ss_sampfunc, 
  foldid = foldids)
airq.ens3b.ss
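
The cluster-sampling idea used in Examples 3a and 3b can be checked in isolation, without fitting a model. The sketch below uses only base R; the toy cluster vector and the name ss_sketch are hypothetical, and the function is a vectorised restatement of the logic of ss_sampfunc rather than code taken from the package. It verifies that every selected cluster is returned in full, which is the property a cluster-level sampler should have.

```r
# Toy cluster indicator (made-up data): 4 clusters of unequal size
cluster <- rep(c("a", "b", "c", "d"), times = c(3, 5, 2, 4))

# Same logic as ss_sampfunc in Example 3a, written with lapply/unlist
ss_sketch <- function(cluster, sampfrac = .75) {
  ids <- unique(cluster)
  n_clusters <- round(length(ids) * sampfrac)
  selected <- sample(ids, size = n_clusters, replace = FALSE)
  # Collect the row indices of every observation in each selected cluster
  unlist(lapply(selected, function(i) which(cluster == i)))
}

set.seed(42)
idx <- ss_sketch(cluster)

# Clusters that ended up in the sample
drawn <- unique(cluster[idx])

# Each drawn cluster should appear with all of its observations
whole_clusters <- all(table(cluster[idx])[drawn] == table(cluster)[drawn])
whole_clusters
length(drawn)  # round(4 * .75) = 3 clusters
```

Sampling at the cluster level keeps all observations from a cluster together, so no cluster is split between the sampled and non-sampled observations.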

marjoleinF/premixed documentation built on May 27, 2019, 4:50 a.m.