Change unknown values to NA and vice versa
Unknown or missing values (NA in R) can be represented in various ways (as 0, 999, etc.) in different programs. isUnknown, unknownToNA and NAToUnknown can help to change unknown values to NA and vice versa.
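x: generic, object with unknown value(s)
unknown: generic, value used instead of NA
warning: logical, issue a warning if x already contains NA
force: logical, force the use of a value in unknown even if it already exists in x
...: arguments passed to other methods (as.character for POSIXlt in the case of isUnknown)
call.: logical, passed on to warning (see the call. argument of warning)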
These functions were written to handle the different variants of “NA”-like representations that are commonly used in various external data sources.
unknownToNA can help to change unknown values to NA for work in R, while NAToUnknown is meant for the opposite and would usually be used before exporting data from R. isUnknown is a utility function for testing for unknown values.
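As a rough picture of how the three roles fit together, here is a minimal base-R sketch for a plain numeric vector. The helpers is_unknown(), unknown_to_na() and na_to_unknown() are hypothetical names written for illustration only; they are not the package's implementation and skip the class handling and checks described below.

```r
# Hypothetical base-R helpers illustrating the idea; not the package's code.
is_unknown <- function(x, unknown = NA) {
  # %in% is built on match(), which also matches NA against NA
  x %in% unknown
}

unknown_to_na <- function(x, unknown) {
  x[is_unknown(x, unknown)] <- NA   # unknown code -> NA for work in R
  x
}

na_to_unknown <- function(x, unknown) {
  x[is.na(x)] <- unknown            # NA -> unknown code before export
  x
}

raw <- c(3, 999, 7, 999)            # 999 marks missing values in an external file
clean <- unknown_to_na(raw, unknown = 999)
clean                               # 3 NA 7 NA
exported <- na_to_unknown(clean, unknown = 999)
identical(exported, raw)            # TRUE
```

Because match() treats NA as an ordinary value to match, the same test also works when NA itself is listed among the unknown values.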
All functions are generic, and the following classes were tested to work with the latest version: “integer”, “numeric”, “character”, “factor”, “Date”, “POSIXct”, “POSIXlt”, “list”, “data.frame” and “matrix”. For other classes, the default method might work just fine.
isUnknown can cope with multiple values in unknown, but these should be given as a “vector”; otherwise, unknown is coerced to a vector. In the “list” and “data.frame” methods, argument unknown can also be given as a “list”.
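Testing against several unknown values at once is essentially a set-membership test. The sketch below uses a hypothetical helper, is_unknown_multi(), rather than the package's code; it flattens a list given as unknown with unlist(), mirroring the coercion to a vector described above.

```r
# Hypothetical helper: test x against several unknown codes at once.
is_unknown_multi <- function(x, unknown = NA) {
  if (is.list(unknown)) unknown <- unlist(unknown)  # coerce a list to a vector
  x %in% unknown                                    # NA matches NA in %in%
}

x <- c(0, 1, 0, 5, NA)
is_unknown_multi(x, unknown = 0)             # TRUE FALSE TRUE FALSE FALSE
is_unknown_multi(x, unknown = c(0, NA))      # TRUE FALSE TRUE FALSE TRUE
is_unknown_multi(x, unknown = list(0, NA))   # same as the previous call
```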
If a named “list” or “vector” is passed to argument unknown and x is also named, the unknown values are matched to the components of x by name.
Recycling occurs in all “list” and “data.frame” methods when the unknown argument is not of the same length as x and unknown is not named.
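For list-like objects, the two rules above amount to: match unknown to the components by name when both are named, otherwise recycle it along the components. The sketch below assumes exactly that; is_unknown_list() is a hypothetical helper, not the package's method, and it further assumes that every name in a named unknown also occurs in x.

```r
# Hypothetical helper for lists/data frames; not the package's implementation.
is_unknown_list <- function(x, unknown) {
  unknown <- as.list(unknown)
  if (!is.null(names(x)) && !is.null(names(unknown))) {
    unknown <- unknown[names(x)]              # match by name
  } else {
    unknown <- rep_len(unknown, length(x))    # recycle positionally
  }
  # each component is tested against its own unknown value(s)
  Map(function(col, u) col %in% u, x, unknown)
}

df <- data.frame(a = c(0, 1, 0), b = c(9, 9, 2))
is_unknown_list(df, unknown = c(b = 9, a = 0))  # names matched, order ignored
is_unknown_list(df, unknown = 0)                # 0 recycled to both columns
```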
Argument unknown in NAToUnknown should hold a value that is not already present in x. If it is, an error is produced; this can be bypassed with force=TRUE, but be warned that afterwards there is no way to distinguish the original values from the converted ones. Use at your own risk!
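The check on an already present value can be sketched as follows. na_to_unknown_checked() is a hypothetical helper, not the package's function; its error and warning texts are invented for illustration, and it only covers plain atomic vectors.

```r
# Hypothetical sketch of the "value already present" check; not the package's code.
na_to_unknown_checked <- function(x, unknown, force = FALSE) {
  if (unknown %in% x) {
    if (!force) stop("'x' already has value ", unknown)
    warning("'x' already had value ", unknown,
            "; original and converted values can no longer be told apart")
  }
  x[is.na(x)] <- unknown
  x
}

x <- c(0, 1, NA)
try(na_to_unknown_checked(x, unknown = 0))       # error: 0 is already in x
na_to_unknown_checked(x, unknown = 999)          # 0 1 999
suppressWarnings(na_to_unknown_checked(x, unknown = 0, force = TRUE))  # 0 1 0
```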
In any case, a warning is issued about the new value in x. Additionally, caution should be taken when using NAToUnknown on factors, as an additional level (the value of unknown) is introduced. Correspondingly, unknownToNA removes the level given in unknown; in particular, a "NA" level is dropped from the factor levels by unknownToNA, for consistency with conversions back and forth.
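The factor behaviour follows from how R stores factors: a value can only be stored if it is one of the levels. The base-R sketch below is an illustration under that assumption, not the package's code; it adds an "NA" level on the way out and drops it again on the way back, so the round trip returns the original factor.

```r
# Hypothetical sketch of the factor round trip; not the package's implementation.
f <- factor(c("a", "b", NA))
levels(f)                                  # "a" "b"

# NA -> unknown: the unknown value must first be added as a new level
f_out <- f
levels(f_out) <- c(levels(f_out), "NA")
f_out[is.na(f_out)] <- "NA"
levels(f_out)                              # "a" "b" "NA"

# unknown -> NA: blank the values, then drop the now unused level again
f_back <- f_out
f_back[f_back %in% "NA"] <- NA
f_back <- factor(f_back, levels = setdiff(levels(f_back), "NA"))
identical(f_back, f)                       # TRUE
```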
The unknown representation given in unknown should have the same class as x in NAToUnknown, except for factors, where the unknown value is coerced to character anyway. Silent coercion is also applied when “integer” and “numeric” are involved. Otherwise, a warning is issued and coercion is attempted. If that fails, R introduces NA and the goal of NAToUnknown is not reached.
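The coercion rules can be sketched with a hypothetical helper, coerce_unknown(), that brings unknown to the type of x before it is used. This is an illustration under the assumption that x is a plain atomic vector or a factor; it is not the package's implementation.

```r
# Hypothetical helper: bring 'unknown' to the type of 'x' before using it.
# Covers plain atomic vectors and factors only (an assumption of this sketch).
coerce_unknown <- function(unknown, x) {
  if (is.factor(x)) return(as.character(unknown))            # factors: character
  if (is.numeric(x) && is.numeric(unknown)) return(unknown)  # integer/numeric: silent
  if (typeof(unknown) != typeof(x)) {
    warning("coercing 'unknown' to ", typeof(x))
    unknown <- as.vector(unknown, mode = typeof(x))          # may yield NA
  }
  unknown
}

coerce_unknown(999, c(1L, NA))               # 999, no warning
coerce_unknown(0, factor(c("a", NA)))        # "0"
res <- suppressWarnings(coerce_unknown("n/a", c(1.5, NA)))
is.na(res)                                   # TRUE: coercion failed, NA introduced
```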
NAToUnknown accepts only a single value in unknown if x is atomic, while the “list” and “data.frame” methods also accept a “vector” or a “list”.
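The difference between the atomic and the data frame case can be sketched with two hypothetical helpers, na_to_unknown_atomic() and na_to_unknown_df(). They are illustrations only, not the package's methods: the atomic one insists on a single value, and the data frame one hands one value (recycled) to each column.

```r
# Hypothetical sketch; not the package's implementation.
na_to_unknown_atomic <- function(x, unknown) {
  stopifnot(is.atomic(x), length(unknown) == 1)  # atomic x: one value only
  x[is.na(x)] <- unknown
  x
}

na_to_unknown_df <- function(x, unknown) {
  unknown <- rep_len(as.list(unknown), ncol(x))  # one value per column, recycled
  x[] <- Map(na_to_unknown_atomic, x, unknown)   # x[] <- keeps the data.frame class
  x
}

df <- data.frame(a = c(1, NA), b = c(NA, "z"), stringsAsFactors = FALSE)
na_to_unknown_df(df, unknown = list(-1, "missing"))
```

A list is used for unknown here because c(-1, "missing") would coerce both values to character.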
The “list”/“data.frame” methods can work on many components/columns. To reduce the number of specifications needed in unknown, a default unknown value can be given with the component ".default". This matches a component/column named ".default" as well as all other components/columns not named in unknown! See the examples.
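How a ".default" entry expands can be sketched with a hypothetical helper, resolve_unknown(), which returns the unknown value that would apply to each column. It illustrates the matching rule only and is not the package's code.

```r
# Hypothetical helper: expand a ".default" entry to every column not named explicitly.
resolve_unknown <- function(x, unknown) {
  unknown <- as.list(unknown)
  default <- unknown[[".default"]]            # NULL if no default was given
  out <- lapply(names(x), function(nm) {
    if (!is.null(unknown[[nm]])) unknown[[nm]] else default
  })
  names(out) <- names(x)
  out
}

df <- data.frame(xFac = c("a", "NA"), xInt = c(0, 5), xNum = c(1, 0))
str(resolve_unknown(df, unknown = list(.default = 0, xFac = "NA")))
```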
unknownToNA and NAToUnknown return the modified x. isUnknown returns logical values for object x.
library(gdata)  # isUnknown(), unknownToNA() and NAToUnknown() come from gdata

# Numeric vector with 0 as the unknown code
xInt <- c(0, 1, 0, 5, 6, 7, 8, 9, NA)
isUnknown(x=xInt, unknown=0)
isUnknown(x=xInt, unknown=c(0, NA))
(xInt <- unknownToNA(x=xInt, unknown=0))
(xInt <- NAToUnknown(x=xInt, unknown=0))

# Factor holding both a real NA and the string "NA"
xFac <- factor(c("0", 1, 2, 3, NA, "NA"))
isUnknown(x=xFac, unknown=0)
isUnknown(x=xFac, unknown=c(0, NA))
isUnknown(x=xFac, unknown=c(0, "NA"))
isUnknown(x=xFac, unknown=c(0, "NA", NA))
(xFac <- unknownToNA(x=xFac, unknown="NA"))
(xFac <- NAToUnknown(x=xFac, unknown="NA"))

# List: unknown values given positionally, by name, or via ".default"
xList <- list(xFac=xFac, xInt=xInt)
isUnknown(xList, unknown=c("NA", 0))
isUnknown(xList, unknown=list("NA", 0))
tmp <- c(0, "NA")
names(tmp) <- c(".default", "xFac")
isUnknown(xList, unknown=tmp)
tmp <- list(.default=0, xFac="NA")
isUnknown(xList, unknown=tmp)
(xList <- unknownToNA(xList, unknown=tmp))
(xList <- NAToUnknown(xList, unknown=999))