h2o.impute | R Documentation |
Perform in-place imputation by filling missing values with aggregates computed on the "na.rm'd" vector (the column with its NAs removed). Imputation can also be done within groups defined by other columns of data; pass those columns by index or name to the by parameter. If a factor column is supplied, the method must be "mode".
h2o.impute(
data,
column = 0,
method = c("mean", "median", "mode"),
combine_method = c("interpolate", "average", "lo", "hi"),
by = NULL,
groupByFrame = NULL,
values = NULL
)
data |
The dataset containing the column to impute. |
column |
The column to impute. The default of 0 imputes the whole frame. |
method |
"mean" replaces NAs with the column mean; "median" replaces NAs with the column median; "mode" replaces NAs with the most common factor level (for factor columns only). |
combine_method |
If method is "median", then choose how to combine quantiles on even sample sizes. This parameter is ignored in all other cases. |
by |
Columns to group by, given by index or name. When supplied, the aggregates are computed within each group. |
groupByFrame |
Impute the column given by column using this pre-computed grouped frame. |
values |
A vector of impute values (one per column). NaN indicates that the column should be skipped. |
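The combine_method options only matter for a median over an even number of non-missing values, where there are two middle values to choose from. The base R sketch below illustrates one plausible reading of the four option names. The helper combine_median is hypothetical and is not part of h2o, and the meaning assigned to each option (in particular that "interpolate" at the median coincides with "average") is an assumption rather than a statement of H2O's implementation.

```r
# Hypothetical helper, NOT part of the h2o package: a base R illustration of
# how the two middle values might be combined for an even sample size.
combine_median <- function(x,
                           combine_method = c("interpolate", "average", "lo", "hi")) {
  combine_method <- match.arg(combine_method)
  x <- sort(x[!is.na(x)])          # work on the "na.rm'd" vector
  n <- length(x)
  if (n == 0) return(NA_real_)
  if (n %% 2 == 1) return(x[(n + 1) / 2])  # odd size: a single middle value
  lo <- x[n / 2]                   # lower of the two middle values
  hi <- x[n / 2 + 1]               # higher of the two middle values
  switch(combine_method,
    lo          = lo,
    hi          = hi,
    average     = (lo + hi) / 2,
    # Assumption: linear interpolation at the 0.5 quantile, halfway between
    # the two middle values, which gives the same result as "average".
    interpolate = lo + 0.5 * (hi - lo)
  )
}

x <- c(1, 2, 4, NA, 10)            # four non-missing values: 1, 2, 4, 10
combine_median(x, "lo")            # 2
combine_median(x, "hi")            # 4
combine_median(x, "average")       # 3
```

For an odd number of non-missing values the option has no effect, which matches the note that the parameter is ignored in all other cases.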
The default method is selected based on the type of the column to impute. If the column is numeric then "mean" is selected; if it is categorical, then "mode" is selected. Other column types (e.g. String, Time, UUID) are not supported.
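To make the default selection and the effect of by concrete, here is a minimal base R sketch of the same idea on ordinary vectors. It is only an illustration under stated assumptions: impute_sketch and stat_mode are hypothetical helpers, not h2o functions, the sketch does not modify data in place, and grouping is shown for numeric columns only.

```r
# Hypothetical helpers, NOT part of the h2o package.

# Most frequent non-missing level; table() ignores NA by default.
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Fill NAs with the mean (numeric) or the mode (factor), optionally
# computing the aggregate separately within each group of `by`.
impute_sketch <- function(x, by = NULL) {
  fill <- function(v) {
    if (is.numeric(v)) {
      v[is.na(v)] <- mean(v, na.rm = TRUE)   # aggregate on the na.rm'd vector
    } else if (is.factor(v)) {
      v[is.na(v)] <- stat_mode(v)
    } else {
      stop("unsupported column type")        # e.g. character or date columns
    }
    v
  }
  if (is.null(by)) fill(x) else ave(x, by, FUN = fill)
}

x <- c(1, NA, 3, 10, NA, 20)
g <- factor(c("a", "a", "a", "b", "b", "b"))

impute_sketch(x)           # NAs become the overall mean, 8.5
impute_sketch(x, by = g)   # NAs become the group means, 2 and 15
impute_sketch(factor(c("u", "v", NA, "v")))  # NA becomes the mode, "v"
```

The grouped call shows why by can change the result: each missing value is replaced by the aggregate of its own group rather than of the whole column.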
An H2OFrame with imputed values.
## Not run:
h2o.init()
iris_hf <- as.h2o(iris)
iris_hf[sample(nrow(iris_hf), 40), 5] <- NA # randomly replace 40 values with NA
# impute with a group by
iris_hf <- h2o.impute(iris_hf, "Species", "mode", by = c("Sepal.Length", "Sepal.Width"))
## End(Not run)