knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

options(tibble.print_min = 5, tibble.print_max = 5)

library(ipa)
# remotes::install_github("GuangchuangYu/badger")
library(badger)

fauxnaif

r badge_cran_release(color = "brightgreen") r badge_lifecycle("stable") r badge_license(color = "blueviolet") r badge_github_actions(action = "R-CMD-check") r badge_codecov() r badge_dependencies()

faux-naïf (r ipa::sampa("/%foU.naI\"if/")): a person who pretends to be simple or innocent

fauxnaif: an R package for simplifying data by pretending values are NA

Overview

fauxnaif extends dplyr::na_if(). Unlike dplyr::na_if(), na_if_in() lets you specify multiple values to replace with NA in a single call. fauxnaif also includes a complementary function, na_if_not(), which takes the values to keep and replaces everything else with NA.
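As a rough mental model of the two functions, here is a minimal base R sketch. The helper names na_if_in_sketch() and na_if_not_sketch() are hypothetical and exist only for illustration; they are not fauxnaif's implementation and cover only the plain "vector of values" case, not formulas or functions.

```r
# Hypothetical helpers, written in base R only to illustrate the idea;
# they are not fauxnaif's actual implementation.
na_if_in_sketch <- function(x, values) {
  # Replace every element of x that appears in `values` with NA
  replace(x, x %in% values, NA)
}

na_if_not_sketch <- function(x, values) {
  # Replace every element of x that does NOT appear in `values` with NA
  replace(x, !(x %in% values), NA)
}

x <- c(1, 99, 3, -1, 5)

# Two ways to reach the same result: name what to drop, or name what to keep
dropped <- na_if_in_sketch(x, c(99, -1))
kept    <- na_if_not_sketch(x, c(1, 3, 5))
```

Both calls turn 99 and -1 into NA and leave the other values untouched, which is the sense in which the two functions complement each other.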

Installation

You can install fauxnaif from CRAN:

install.packages("fauxnaif")

Or the development version from GitHub:

# install.packages("remotes")
remotes::install_github("rossellhayes/fauxnaif")

Usage

library(dplyr)
library(fauxnaif)

The basics

Let's say we want to remove an unwanted negative value from a vector of numbers:

-1:10

We can replace -1...

... explicitly:

na_if_in(-1:10, -1)

... by specifying values to keep:

na_if_not(-1:10, 0:10)

... using a formula:

na_if_in(-1:10, ~ . < 0)
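One way to read the formula form is as shorthand for a one-argument predicate, where `.` stands for the input vector and every element for which the predicate is TRUE becomes NA. The base R sketch below spells that reading out; is_negative() is a hypothetical name used only for illustration, and this is an assumption about how to think of the formula, not a description of the package's internals.

```r
x <- -1:10

# Hypothetical predicate standing in for the formula `~ . < 0`
is_negative <- function(v) v < 0

# Elements where the predicate is TRUE are replaced with NA
cleaned <- replace(x, is_negative(x), NA)
```

Here only the first element (-1) satisfies the predicate, so it is the only one replaced.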

A little more complex

messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")

We can replace unwanted values...

... one at a time:

na_if_in(messy_string, "")

... or all at once:

na_if_in(messy_string, "", "NA", "NULL", 1:100)
na_if_in(messy_string, c("", "NA", "NULL", 1:100))
na_if_in(messy_string, list("", "NA", "NULL", 1:100))

... or using a clever formula:

grepl("[a-z]{3,}", messy_string)
na_if_not(messy_string, ~ grepl("[a-z]{3,}", .))

With data frames

faux_census <- fauxnaif::faux_census %>% 
  select(state, age, income, gender) %>% 
  filter(
    state == "Canada" |
      age < 18 |
      age > 120 |
      income == 9999999
  )
faux_census

na_if_in() is particularly useful inside dplyr::mutate():

faux_census %>%
  mutate(
    income = na_if_in(income, 9999999),
    age    = na_if_in(age, ~ . < 18, ~ . > 120),
    state  = na_if_not(state, ~ grepl("^[A-Z]{2,}$", .)),
    gender = na_if_in(gender, ~ nchar(.) > 20)
  )

Or you can use dplyr::across() on data frames:

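faux_census %>%
  mutate(
    # Anonymous functions avoid passing extra arguments through across()'s `...`,
    # which is deprecated as of dplyr 1.1.0
    across(age, \(x) na_if_in(x, ~ . < 18, ~ . > 120)),
    across(state, \(x) na_if_not(x, ~ grepl("^[A-Z]{2,}$", .))),
    across(where(is.character), \(x) na_if_in(x, ~ nchar(.) > 20)),
    across(everything(), \(x) na_if_in(x, 9999999))
  )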
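The last call above applies one replacement to every column. For readers who want to see that idea without dplyr, here is a minimal base R sketch on a small made-up data frame (toy_census is invented for illustration and is not the package's faux_census data). It uses plain value matching with %in%, so it only approximates what na_if_in() does.

```r
# Made-up toy data for illustration only; 9999999 plays the role of a
# "missing" sentinel value
toy_census <- data.frame(
  state  = c("CA", "NY", "TX"),
  age    = c(34, 9999999, 51),
  income = c(52000, 48000, 9999999)
)

# Apply the same replacement to every column, keeping the data frame shape
toy_census[] <- lapply(
  toy_census,
  function(column) replace(column, column %in% 9999999, NA)
)
```

Assigning into `toy_census[]` keeps the result a data frame with its original column names, rather than the plain list that lapply() returns.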

Hex sticker fonts are Bodoni by indestructible type and Source Code Pro by Adobe.

Image adapted from icon made by Freepik from flaticon.com.

Please note that fauxnaif is released with a Contributor Code of Conduct.



rossellhayes/fauxnaif documentation built on Aug. 12, 2022, 8:11 p.m.