This document tracks opinionated decisions about the na.tools and tidyimpute packages, largely concerning the design choices made.
The packages should be the single repository of functions and methods for working with missing values (NA) across all data science workflows.
It should be extensible and be able to handle
The default level in forcats::fct_explicit_na is '(Missing)'; this is not adopted here because it is needlessly long. '(NA)' is used instead.
na.* functions work only on atomic vectors; na_* functions are the higher-level functions.
Function names should follow the lower_snake_case naming convention. This prevents collisions with functions from the stats package.
It may make sense for na.* functions to operate at a low level on vectors, similar to the stats package, while na_* functions operate at a higher level.
Follow tidyverse styles
NA_explicit_ NAs when printing to the console.
na.replace( .x, .na=mean )
na_replace( .tbl, col1 = mean )
Or,
na_replace( .tbl, col1 = mean(col1) )
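To make the idea of a function-valued replacement concrete, here is a minimal base R sketch. The helper name sketch_na_replace is hypothetical and this is not the na.tools implementation; it assumes that a replacement function is applied to the observed (non-missing) values only.

```r
# Hypothetical helper for illustration; not part of na.tools or tidyimpute.
sketch_na_replace <- function(.x, .na) {
  missing <- is.na(.x)
  # Assumption: a function-valued replacement is computed from the
  # non-missing values; any other replacement is used as given.
  value <- if (is.function(.na)) .na(.x[!missing]) else .na
  .x[missing] <- value
  .x
}

x <- c(1, NA, 3)
sketch_na_replace(x, mean)  # NA replaced by mean(c(1, 3))
sketch_na_replace(x, 0)     # NA replaced by the constant 0
```

Accepting either a value or a function keeps the call site short while leaving the choice of summary statistic to the caller.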
Imputation should be performed when the replacement value is a right-hand-side (one-sided) formula:
na_replace(tbl, col1 = ~col2, .method=lm )
This has the effect of creating a model for col1 ~ col2.
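The formula-based imputation described above could be sketched in base R roughly as follows. The function sketch_na_impute and its arguments (target, rhs) are hypothetical names chosen for illustration, not the tidyimpute interface; the sketch assumes the model is fitted on rows where the target is observed and that predictions fill the missing rows.

```r
# Hypothetical helper for illustration; not the tidyimpute implementation.
sketch_na_impute <- function(.tbl, target, rhs, .method = lm) {
  # Turn the one-sided formula (e.g. ~col2) into target ~ col2.
  full <- reformulate(attr(terms(rhs), "term.labels"), response = target)
  missing <- is.na(.tbl[[target]])
  # Fit on the rows where the target is observed.
  fit <- .method(full, data = .tbl[!missing, , drop = FALSE])
  # Fill the missing rows with model predictions.
  .tbl[[target]][missing] <- predict(fit, newdata = .tbl[missing, , drop = FALSE])
  .tbl
}

tbl <- data.frame(col1 = c(2, 4, NA, 8), col2 = c(1, 2, 3, 4))
sketch_na_impute(tbl, "col1", ~col2)
```

Passing the fitting function as .method means any modeling function with a formula/data interface and a predict method could be substituted for lm, under the same assumptions.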