faux-naïf (/ˌfoʊ.naɪˈif/): a person who pretends to be simple or innocent
fauxnaif: an R package for simplifying data by pretending values are NA
fauxnaif provides an extension to dplyr::na_if(). Unlike dplyr’s na_if(), na_if_in() allows you to specify multiple values to be replaced with NA using a single function. fauxnaif also includes a complementary function, na_if_not(), to specify values to keep.
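To make the contrast with dplyr::na_if() concrete, here is a minimal sketch. The vector `x` and the sentinel values 999 and -99 are made up for illustration and are not part of the package.

```r
library(dplyr)
library(fauxnaif)

# Hypothetical vector with two sentinel values standing in for missing data
x <- c(1, 999, 5, -99)

# dplyr::na_if() replaces one value per call, so the calls must be nested
nested <- na_if(na_if(x, 999), -99)

# na_if_in() accepts both values in a single call
one_call <- na_if_in(x, 999, -99)
```

Both approaches should leave only 1 and 5 as non-missing values; the second simply avoids the nesting.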
You can install fauxnaif from CRAN:
install.packages("fauxnaif")
Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("rossellhayes/fauxnaif")
library(dplyr)
library(fauxnaif)
Let’s say we want to remove an unwanted negative value from a vector of numbers:
-1:10
#> [1] -1 0 1 2 3 4 5 6 7 8 9 10
We can replace -1…
… explicitly:
na_if_in(-1:10, -1)
#> [1] NA 0 1 2 3 4 5 6 7 8 9 10
… by specifying values to keep:
na_if_not(-1:10, 0:10)
#> [1] NA 0 1 2 3 4 5 6 7 8 9 10
… using a formula:
na_if_in(-1:10, ~ . < 0)
#> [1] NA 0 1 2 3 4 5 6 7 8 9 10
messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")
We can replace unwanted values…
… one at a time:
na_if_in(messy_string, "")
#> [1] "abc" NA "def" "NA" "ghi" "42" "jkl" "NULL" "mno"
… or all at once:
na_if_in(messy_string, "", "NA", "NULL", 1:100)
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
na_if_in(messy_string, c("", "NA", "NULL", 1:100))
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
na_if_in(messy_string, list("", "NA", "NULL", 1:100))
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
… or using a clever formula:
grepl("[a-z]{3,}", messy_string)
#> [1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
na_if_not(messy_string, ~ grepl("[a-z]{3,}", .))
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
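For intuition, the string examples above can be approximated in base R with `%in%` and logical indexing. This is only a rough sketch of the idea, not the package's actual implementation; the names `unwanted`, `cleaned`, and `by_pattern` are introduced here for illustration.

```r
messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")

# Values to discard; c() coerces 1:100 to character, so "42" is matched
unwanted <- c("", "NA", "NULL", 1:100)

# Rough equivalent of na_if_in(messy_string, "", "NA", "NULL", 1:100)
cleaned <- messy_string
cleaned[cleaned %in% unwanted] <- NA

# Rough equivalent of na_if_not(messy_string, ~ grepl("[a-z]{3,}", .))
by_pattern <- messy_string
by_pattern[!grepl("[a-z]{3,}", by_pattern)] <- NA

cleaned
#> [1] "abc" NA    "def" NA    "ghi" NA    "jkl" NA    "mno"
```

The package functions wrap this pattern so that values, vectors, lists, and formulas can all be passed in one call.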
faux_census
#> # A tibble: 5 × 4
#> state age income gender
#> <chr> <dbl> <dbl> <chr>
#> 1 TX 57 9999999 Gender is a social construct
#> 2 Canada 49 149000 Male
#> 3 NY 557 90750 f
#> 4 LA 2 61000 Male
#> 5 TN 64 9999999 M
na_if_in() is particularly useful inside dplyr::mutate():
faux_census %>%
  mutate(
    income = na_if_in(income, 9999999),
    age    = na_if_in(age, ~ . < 18, ~ . > 120),
    state  = na_if_not(state, ~ grepl("^[A-Z]{2,}$", .)),
    gender = na_if_in(gender, ~ nchar(.) > 20)
  )
#> # A tibble: 5 × 4
#> state age income gender
#> <chr> <dbl> <dbl> <chr>
#> 1 TX 57 NA <NA>
#> 2 <NA> 49 149000 Male
#> 3 NY NA 90750 f
#> 4 LA NA 61000 Male
#> 5 TN 64 NA M
Or you can use dplyr::across() on data frames:
faux_census %>%
  mutate(
    across(age, na_if_in, ~ . < 18, ~ . > 120),
    across(state, na_if_not, ~ grepl("^[A-Z]{2,}$", .)),
    across(where(is.character), na_if_in, ~ nchar(.) > 20),
    across(everything(), na_if_in, 9999999)
  )
#> # A tibble: 5 × 4
#> state age income gender
#> <chr> <dbl> <dbl> <chr>
#> 1 TX 57 NA <NA>
#> 2 <NA> 49 149000 Male
#> 3 NY NA 90750 f
#> 4 LA NA 61000 Male
#> 5 TN 64 NA M
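dplyr 1.1.0 and later deprecate passing extra arguments through the `...` of across(), so the call above may emit a deprecation warning on recent versions. A possible alternative, sketched here on the assumption that na_if_in() and na_if_not() behave the same when wrapped in an anonymous function, is to supply a lambda instead. The `\(x)` shorthand requires R 4.1 or later, and `cleaned_census` is a name introduced for illustration.

```r
library(dplyr)
library(fauxnaif)

# Assumes faux_census is available once fauxnaif is loaded, as in the examples above
cleaned_census <- faux_census %>%
  mutate(
    # Each lambda forwards the column to the fauxnaif function explicitly
    across(age, \(x) na_if_in(x, ~ . < 18, ~ . > 120)),
    across(state, \(x) na_if_not(x, ~ grepl("^[A-Z]{2,}$", .))),
    across(where(is.character), \(x) na_if_in(x, ~ nchar(.) > 20)),
    across(everything(), \(x) na_if_in(x, 9999999))
  )
```

This is intended to give the same table as the version that passes arguments through `...`.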
Hex sticker fonts are Bodoni* by indestructible type* and Source Code Pro by Adobe.
Image adapted from icon made by Freepik from flaticon.com.
Please note that fauxnaif is released with a Contributor Code of Conduct.