stack_impute: Imputation by stacking complete and incomplete data


View source: R/stack_impute.R

Description

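Imputes the missing values in newdata by stacking it with the complete dataset and running the chosen imputation method on the combined data.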

Usage

stack_impute(dataset, newdata, method = "missforest", seed = 1L, ...)

Arguments

dataset

(dataframe) Complete dataset

newdata

(dataframe) Incomplete dataset whose missing values are to be imputed

method

(string) One of: 'missforest', 'proximity'

seed

(positive integer) Random seed, for reproducible results

...

Arguments passed to missRanger when method is 'missforest', or to forest_impute when method is 'proximity'

Value

(dataframe) completed dataset
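
The stacking idea can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `stack_then_impute` and `simple_imputer` are hypothetical names, and the median/mode imputer is only a stand-in for missRanger or forest_impute. The sketch assumes that both data frames share the same columns and that the rows corresponding to newdata are returned.

```r
# Hypothetical helper (not part of forager): stack, impute, then split.
stack_then_impute <- function(dataset, newdata, imputer) {
  stopifnot(identical(names(dataset), names(newdata)))
  # complete rows first, incomplete rows last
  stacked <- rbind(dataset, newdata)
  # impute on the combined data so the complete rows inform the imputation
  completed <- imputer(stacked)
  # keep only the rows that came from newdata
  out <- utils::tail(completed, nrow(newdata))
  rownames(out) <- rownames(newdata)
  out
}

# Stand-in imputer (assumption): median for numeric columns,
# most frequent observed value for factor or character columns.
simple_imputer <- function(d) {
  d[] <- lapply(d, function(col) {
    miss <- is.na(col)
    if (!any(miss)) return(col)
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)
    } else {
      col[miss] <- names(which.max(table(col)))
    }
    col
  })
  d
}

# small demonstration on iris
set.seed(1)
index <- sample.int(150, 100)
train <- iris[index, ]
test_missing <- iris[-index, ]
test_missing[c(1, 5, 9), "Sepal.Length"] <- NA
test_missing[c(2, 6), "Species"] <- NA

imputed <- stack_then_impute(train, test_missing, simple_imputer)
```

Any function that takes a data frame and returns a completed one can be supplied as `imputer`, for example `function(d) missRanger::missRanger(d, seed = 3, verbose = 0)`.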

Examples

## Not run: 
# divide iris data into test and train
set.seed(1)
index <- sample.int(150, 100)

iris_train <- iris[index, ]
iris_test  <- iris[-index, ]

# create some holes in test data
iris_test_missing <- missRanger::generateNA(iris_test, p = 0.2, seed = 2)

# stack imputation

# use missforest method
imputed_mf <- stack_impute(iris_train
                           , iris_test_missing
                           , method = "missforest"
                           , seed = 3
                           )

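# metric: mean absolute relative error for numeric, proportion of mismatches for categorical
# x: original values, y: imputed values, z: logical mask of the positions that were missing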
metric_relative <- function(x, y, z){

  if(sum(z) == 0){
    return(0)
  }

  if(is.numeric(x)){
    mean(abs((y[z] - x[z])/y[z]))
  } else {
    sum(x[z] != y[z])/sum(z)
  }

}

# compare
mapply(metric_relative
       , iris_test
       , imputed_mf
       , as.data.frame(is.na(iris_test_missing))
       )

# use proximity method
imputed_pr <- stack_impute(iris_train
                           , iris_test_missing
                           , method = "proximity"
                           , seed = 3
                           )

# compare
mapply(metric_relative
       , iris_test
       , imputed_pr
       , as.data.frame(is.na(iris_test_missing))
       )

## End(Not run)
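
To make the metric used in the example concrete, here is a small worked check on hand-made vectors. The function body is copied from the example above; the vectors `x_num`, `y_num`, `x_cat`, `y_cat` and the masks are made up for illustration.

```r
# same metric as in the example above
metric_relative <- function(x, y, z) {
  if (sum(z) == 0) {
    return(0)
  }
  if (is.numeric(x)) {
    mean(abs((y[z] - x[z]) / y[z]))
  } else {
    sum(x[z] != y[z]) / sum(z)
  }
}

# numeric column: positions 2 and 3 were imputed
x_num <- c(1, 2, 4)      # original values
y_num <- c(1, 2.5, 5)    # imputed values
z_num <- c(FALSE, TRUE, TRUE)
# (2.5 - 2) / 2.5 = 0.2 and (5 - 4) / 5 = 0.2, so the mean is 0.2
num_error <- metric_relative(x_num, y_num, z_num)

# categorical column: positions 1 and 2 were imputed, one of them wrongly
x_cat <- factor(c("a", "b", "b"), levels = c("a", "b"))
y_cat <- factor(c("a", "a", "b"), levels = c("a", "b"))
z_cat <- c(TRUE, TRUE, FALSE)
# one mismatch out of two imputed positions gives 0.5
cat_error <- metric_relative(x_cat, y_cat, z_cat)
```

Note that the numeric error is measured relative to the imputed value `y`, as in the example code.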

talegari/forager documentation built on May 3, 2019, 4:01 p.m.