# compute: Compute the missing values to later impute them in another...

In imputeMissings: Impute Missing Values in a Predictive Context

## Description

`compute` learns imputation values from a data set so that `impute` can later apply them to the same data or to new data. With the median/mode method, character vectors and factors are imputed with the mode, and numeric and integer vectors are imputed with the median. With the random forest method, predictors are first imputed with the median/mode; each variable is then predicted and imputed with the predicted value. In predictive contexts, `compute` is run on a training set to learn the values (or random forest models) used for imputation, and `impute` is run on both the training data and new data to apply the values (or deploy the models) learned by `compute`.
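The split between learning values on training data and applying them elsewhere can be sketched in base R. This is a minimal illustration of the median/mode idea only, not the package's actual implementation: `mode_value`, `learn_values`, and `apply_values` are hypothetical helper names, and the factor levels of the new data are set explicitly here so that the learned mode can be assigned.

```r
# Hypothetical helpers (not part of imputeMissings) sketching the
# median/mode idea in base R.

# Most frequent non-missing value of a vector, returned as character.
mode_value <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  names(counts)[which.max(counts)]
}

# "Learn" step: one imputation value per column, from training data only.
learn_values <- function(data) {
  lapply(data, function(col) {
    if (is.numeric(col)) median(col, na.rm = TRUE) else mode_value(col)
  })
}

# "Apply" step: fill the NAs of any data frame with the learned values.
apply_values <- function(data, values) {
  for (name in names(values)) {
    missing <- is.na(data[[name]])
    data[[name]][missing] <- values[[name]]
  }
  data
}

train <- data.frame(
  v_int  = as.integer(c(3, 3, 2, 5, 1, 2, 4, 6)),
  v_num  = c(4.1, NA, 12.2, 11, 3.4, 1.6, 3.3, 5.5),
  v_fact = factor(c("one", "two", NA, "two", "two", "one", "two", "two"))
)

# Assumption: the new data shares the factor levels of the training data.
newdata <- data.frame(
  v_int  = as.integer(c(1, 1, 2, NA)),
  v_num  = c(1.1, NA, 2.2, NA),
  v_fact = factor(c("one", "one", "one", NA), levels = c("one", "two"))
)

learned <- learn_values(train)            # medians and mode from train
filled  <- apply_values(newdata, learned) # newdata with NAs filled in
print(filled)
```

The values learned from `train` (median 3 for `v_int`, median 4.1 for `v_num`, mode `two` for `v_fact`) are the ones used to fill `newdata`, so nothing about the new data influences the imputation values.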

## Usage

```
compute(data, method = "median/mode", ...)
```

## Arguments

| Argument | Description |
| --- | --- |
| `data` | A data frame with dummies or numeric variables. When `method = "median/mode"`, columns can be of type "character". When `method = "randomForest"`, columns cannot be of type "character". |
| `method` | Either "median/mode" or "randomForest". |
| `...` | Additional arguments for `randomForest`. |
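Because character columns are not accepted with `method = "randomForest"`, one option is to convert them to factors beforehand. The snippet below is a sketch in base R under that assumption; the data frame `df` and its column names are made up for illustration.

```r
# Illustrative data frame with one character column (names are made up).
df <- data.frame(
  age  = c(31, NA, 45),
  city = c("Ghent", "Antwerp", NA),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; other columns are untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

str(df)
```

Missing values stay `NA` after the conversion, so they remain available for imputation.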

## Value

Values or models used for imputation

## Author(s)

Matthijs Meire, Michel Ballings, Dirk Van den Poel, Maintainer: Matthijs.Meire@UGent.be

## See Also

`impute`

## Examples

```r
# Load the package that provides compute() and impute()
library(imputeMissings)

# Compute the values on a training dataset and impute them on new data.
# This is very convenient in predictive contexts. For example:

# define training data
(train <- data.frame(v_int=as.integer(c(3,3,2,5,1,2,4,6)),
                     v_num=as.numeric(c(4.1,NA,12.2,11,3.4,1.6,3.3,5.5)),
                     v_fact=as.factor(c('one','two',NA,'two','two','one','two','two')),
                     stringsAsFactors = FALSE))

# Compute values on train data
# randomForest method
values <- compute(train, method="randomForest")
# median/mode method
values2 <- compute(train)

# define new data
(newdata <- data.frame(v_int=as.integer(c(1,1,2,NA)),
                       v_num=as.numeric(c(1.1,NA,2.2,NA)),
                       v_fact=as.factor(c('one','one','one',NA)),
                       stringsAsFactors = FALSE))

# locate the NAs
is.na(newdata)

# how many missings per variable?
colSums(is.na(newdata))

# Impute on newdata
impute(newdata,object=values)  # using randomForest values
impute(newdata,object=values2) # using median/mode values

# One can also impute directly in newdata without the compute step
impute(newdata)

# Flag parameter
impute(newdata,flag=TRUE)
```

### Example output

```
  v_int v_num v_fact
1     3   4.1    one
2     3    NA    two
3     2  12.2   <NA>
4     5  11.0    two
5     1   3.4    two
6     2   1.6    one
7     4   3.3    two
8     6   5.5    two
  v_int v_num v_fact
1     1   1.1    one
2     1    NA    one
3     2   2.2    one
4    NA    NA   <NA>
     v_int v_num v_fact
[1,] FALSE FALSE  FALSE
[2,] FALSE  TRUE  FALSE
[3,] FALSE FALSE  FALSE
[4,]  TRUE  TRUE   TRUE
 v_int  v_num v_fact
     1      2      1
     v_int    v_num v_fact
1 1.000000 1.100000    one
2 1.000000 4.117308    one
3 2.000000 2.200000    one
4 2.542738 4.117308    one
  v_int v_num v_fact
1     1   1.1    one
2     1   4.1    one
3     2   2.2    one
4     3   4.1    two
  v_int v_num v_fact
1     1  1.10    one
2     1  1.65    one
3     2  2.20    one
4     1  1.65    one
  v_int v_num v_fact v_int_flag v_num_flag v_fact_flag
1     1  1.10    one          0          0           0
2     1  1.65    one          0          1           0
3     2  2.20    one          0          0           0
4     1  1.65    one          1          1           1
```
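The `_flag` columns in the last table mark which cells were missing before imputation. How the package builds them is not shown here; the following base R sketch only illustrates how such indicator columns could be derived, and it must run on the data before any values are filled in.

```r
newdata <- data.frame(
  v_int  = as.integer(c(1, 1, 2, NA)),
  v_num  = c(1.1, NA, 2.2, NA),
  v_fact = factor(c("one", "one", "one", NA))
)

# 1 where the original value was missing, 0 otherwise.
flags <- as.data.frame(is.na(newdata) * 1L)
names(flags) <- paste0(names(newdata), "_flag")

print(flags)
```

Keeping these indicators next to the imputed columns lets a downstream model distinguish observed values from imputed ones.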

imputeMissings documentation built on May 2, 2019, 5:14 a.m.