na.tools is a comprehensive library for handling missing (NA) values. It has several goals, including building on the existing stats::na.*() functions.

This package provides methods for the detection, removal, replacement, imputation, recollection, etc. of missing values (NAs). The library's focus is on vectors (atomics). For tidy/dplyr-compliant methods operating on tables and lists, please use the tidyimpute package, which depends on this package.
From GitHub:
devtools::install_github("decisionpatterns/na.tools")

From CRAN:
install.packages("na.tools")

na.tools builds on the na.* functions found in the stats package.
- n_na, pct_na: statistics on missing values
- na.rm: remove missing values
- na.* functions return a transformed version of the input vector with missing values imputed.

x <- 1:3
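x[2] <- NA_real_         # introduce a missing value (x becomes double)

any_na(x)                # test if any elements are missing
all_na(x)                # test if all elements are missing
which_na(x)              # which elements are missing
n_na(x)                  # count missing values
pct_na(x)                # pct of missing values

na.rm(x)                 # remove NAs

na.replace(x, 2)         # replace NAs with a given value
na.replace(x, mean)      # error
na.replace(x, na.mean)   # works

na.zero(x)               # replace NAs with 0
na.mean(x)               # impute with the mean
na.cumsum(x)             # impute with the cumulative sum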
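
For readers without the package installed, the detection and removal calls in the example above can be approximated in base R. This is only a sketch: the variable names are made up for illustration, the expressions are plain base R rather than the na.tools implementations, and whether pct_na() reports a fraction or a percentage is not stated here.

```r
# Base R approximations of the detection/removal calls; not na.tools code.
x <- c(1, NA, 3)

has_any       <- anyNA(x)          # is at least one element missing?
all_missing   <- all(is.na(x))     # is every element missing?
which_missing <- which(is.na(x))   # positions of missing elements
n_missing     <- sum(is.na(x))     # number of missing elements
frac_missing  <- mean(is.na(x))    # fraction of missing elements
without_na    <- x[!is.na(x)]      # vector with missing elements dropped
```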

na.n - Count missing values
na.pct - Calculate pct of missing values
which.na - Return logical or character indicating which elements are missing
all.na (na.all) - Test if all elements are missing
any.na (na.any) - Test if any elements are missing
na.rm - Remove NAs (with tables, equivalent to drop_cols_all_na)
na.trim - Remove NAs from the beginning or end (non-commutative: order matters)

There are two types of imputation methods for plain vectors. They are distinguished by their replacement values.
In "constant" imputation methods, missing values are replaced by a constant value selected a priori. No calculations are performed to derive replacement values, and all missing values assume the same transformed value.
na.zero: Replace NAs with 0
na.true / na.false: Replace NAs with TRUE / FALSE
na.inf / na.neginf: Replace NAs with Inf / -Inf
na.constant: Replace NAs with the constant value .na
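
The idea behind constant imputation can be sketched in a few lines of base R. The helper `impute_constant()` below is hypothetical and written only for illustration; it is not part of na.tools and makes no claim about how the package implements these functions.

```r
# Hypothetical helper (not part of na.tools): replace every NA with one
# fixed value chosen in advance, without looking at the other elements.
impute_constant <- function(x, value) {
  stopifnot(length(value) == 1L)
  x[is.na(x)] <- value
  x
}

x     <- c(4, NA, 7, NA)
flags <- c(TRUE, NA, FALSE)

zeros  <- impute_constant(x, 0)          # 4 0 7 0
infs   <- impute_constant(x, Inf)        # 4 Inf 7 Inf
falses <- impute_constant(flags, FALSE)  # TRUE FALSE FALSE
```

Every missing element receives the same value, regardless of what else the vector contains.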
In functional imputation, the value is calculated from the vector containing the missing value(s), and only from that vector. Missing values may impute to different values. Replacement values may (or may not) be affected by the ordering of the vector.
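
As a rough illustration of functional imputation, the sketch below derives the replacement from the observed elements of the same vector. The helper `impute_with()` is hypothetical, written in base R for this example, and should not be read as the na.tools implementation.

```r
# Hypothetical helper (not part of na.tools): compute the replacement from
# the non-missing elements of x itself, using a summary function.
impute_with <- function(x, fun, ...) {
  x[is.na(x)] <- fun(x[!is.na(x)], ...)
  x
}

x <- c(2, NA, 4, 9)

by_mean   <- impute_with(x, mean)     # 2 5 4 9
by_median <- impute_with(x, median)   # 2 4 4 9
by_max    <- impute_with(x, max)      # 2 9 4 9
```

Unlike constant imputation, the replacement changes whenever the observed values change.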
Commutative functions
Commutative functions provide the same result regardless of the ordering of the input vector. Replacement values from these functions therefore do not depend on the order of the elements.
(When imputing in a table, imputation by function is also called column-based imputation since replacement values derive from the single column. Table-based imputation is found in the tidyimpute package.)
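
The effect of ordering can be seen by comparing an order-independent rule with an order-dependent one. Both helpers below are hypothetical and exist only for this illustration; in particular, the running-maximum rule is an assumed stand-in for an order-dependent method and is not claimed to match any na.tools function.

```r
# Hypothetical helpers for illustration only; neither is na.tools code.

# Order-independent: every NA gets the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Order-dependent: each NA gets the running maximum of the observed values
# that precede it (it stays NA if nothing precedes it).
impute_running_max <- function(x) {
  running <- NA_real_
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      x[i] <- running
    } else {
      running <- max(running, x[i], na.rm = TRUE)
    }
  }
  x
}

x <- c(5, NA, 1, 8)

mean_fwd <- impute_mean(x)              # NA replaced by 14/3
mean_rev <- impute_mean(rev(x))         # NA again replaced by 14/3
rmax_fwd <- impute_running_max(x)       # 5 5 1 8: NA replaced by 5
rmax_rev <- impute_running_max(rev(x))  # 8 1 8 5: NA replaced by 8
```

Reversing the vector leaves the mean-based replacement unchanged but changes the running-maximum replacement from 5 to 8.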
na.max - maximum
na.min - minimum
na.mean - mean
na.median - median value
na.quantile - quantile value
na.sample / na.random - randomly sampled value

Non-commutative functions
na.cummax - cumulative max
na.cummin - cumulative min
na.cumsum - cumulative sum
na.cumprod - cumulative prod

General Imputation
na.replace / na.explicit - general replacement function (atomic vectors only)
na.unreplace / na.implicit - turn explicit values back into NAs

A number of other packages have methods for working with missing values and/or imputation. Here is a short, incomplete and growing list:
randomForest::na.roughfix() - imputes with median
zoo::na.* - collection of non-commutative imputation techniques for time series data

mitools provides tools for multiple imputation. mice provides multivariate imputation by chained equations. mvnmle provides ML estimation for multivariate normal data with missing values. mix provides multiple imputation for mixed categorical and continuous data. pan provides multiple imputation for missing panel data. VIM provides methods for the visualisation as well as imputation of missing data. aregImpute() and transcan() from Hmisc provide further imputation methods. monomvn deals with estimation models where the missing data pattern is monotone.