knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  warning = FALSE,
  message = FALSE,
  fig.width = 7,
  fig.height = 5,
  fig.align = 'center'
)
The demonstration of yawp usage in this vignette depends on functions and data sets from ggplot2, dplyr, and tidyr:
library(yawp)
library(ggplot2)
library(dplyr)
library(tidyr)
theme_set(theme_minimal())
The package includes first_char(), a simple function that pulls the first character from a string:
first_char("eye")
The function is case-sensitive:
first_char("Eye")
It can also extract first characters that are not letters:
first_char("3ye")
first_char("!eye")
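The behavior shown above can be approximated in base R. The following is a minimal sketch, not the package's implementation; the name first_char_sketch() is hypothetical and is used only for illustration:

```r
# Hypothetical base-R approximation of the behavior described above.
# This is NOT yawp's implementation; it only illustrates the idea.
first_char_sketch <- function(x) {
  # substr() extracts characters start..stop; 1..1 is the first character
  substr(as.character(x), 1, 1)
}

first_char_sketch("eye")   # "e"
first_char_sketch("Eye")   # "E" (case is preserved)
first_char_sketch("!eye")  # "!" (non-letters are returned as-is)
```

Because substr() is vectorized, the sketch also works on a character vector with more than one element.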
The bound() function constrains a vector to lie between minimum and maximum values:
x <- c(-10,-5,0,2,5,10,100)
bound(x, min = 0, max = 10)
x <- rnorm(1000, mean = 0.5, sd = 1)
data.frame(x = x) %>%
  ggplot(aes(x)) +
  geom_histogram()
x <- bound(x, min = 0.1, max = 0.9)
data.frame(x = x) %>%
  ggplot(aes(x)) +
  geom_histogram()
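Constraining a vector in this way is often called clamping. As a rough sketch of the idea, assuming that values outside the range are replaced by the nearest boundary, base R's pmin() and pmax() are sufficient. The function name bound_sketch() and its arguments lo and hi are hypothetical, not part of yawp:

```r
# Hypothetical clamping helper built on base R; not yawp's implementation.
# Assumption: out-of-range values are replaced by the nearest boundary.
bound_sketch <- function(x, lo, hi) {
  stopifnot(lo <= hi)
  # pmin() caps values at hi, then pmax() raises values below lo up to lo
  pmax(pmin(x, hi), lo)
}

bound_sketch(c(-10, -5, 0, 2, 5, 10, 100), lo = 0, hi = 10)
# 0 0 0 2 5 10 10
```

Under this assumption the second histogram above would show spikes at 0.1 and 0.9, because every value outside the range piles up on a boundary.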
betwixt() searches for values that fall between lower and upper boundaries:
x <- 50:60
betwixt(x, lower = 52, upper = 58)
By default the function returns a logical vector that flags the elements falling in the specified range. Setting value = TRUE returns the matching values instead:
betwixt(x, lower = 52, upper = 58, value = TRUE)
The function can also accommodate arbitrary increments in the sequence:
betwixt(x, lower = 52, upper = 58, value = TRUE, by = 2)
And it can ignore certain values as well:
betwixt(x, lower = 52, upper = 58, value = TRUE, by = 2, ignore = 54)
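One way to picture the combination of by and ignore is as a membership test against a candidate sequence. The sketch below assumes that the candidates are seq(lower, upper, by = by) with any ignored values dropped; that is an assumption about the semantics, not a description of how betwixt() is implemented, and the name between_sketch() is hypothetical:

```r
# Hypothetical illustration of range matching; not yawp's implementation.
# Assumption: candidates are seq(lower, upper, by = by) minus `ignore`.
between_sketch <- function(x, lower, upper, by = 1, ignore = NULL, value = FALSE) {
  candidates <- seq(lower, upper, by = by)
  # Drop ignored values (a NULL `ignore` drops nothing)
  candidates <- candidates[!candidates %in% ignore]
  idx <- x %in% candidates
  # Return matching values or the logical index, depending on `value`
  if (value) x[idx] else idx
}

x <- 50:60
between_sketch(x, lower = 52, upper = 58, value = TRUE, by = 2, ignore = 54)
# 52 56 58
```

Returning the logical index by default keeps the result the same length as the input, which is what makes it usable inside subsetting operations.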
The betwixt() utility can also operate on a vector of dates:
dates <- seq(as.Date("2000/1/1"), by = "day", length.out = 365)
betwixt(dates,
        lower = as.Date("2000/1/10"),
        upper = as.Date("2000/2/10"),
        by = "day",
        ignore = as.Date("2000/1/25"),
        value = TRUE)
The logical index option (value = FALSE) can be useful in subsetting operations:
tibble(date = dates, y = rnorm(365)) %>%
  filter(betwixt(date,
                 lower = as.Date("2000/1/10"),
                 upper = as.Date("2000/1/17"),
                 by = "day",
                 ignore = as.Date(c("2000/1/12", "2000/1/13")),
                 value = FALSE))
The get_mode() function returns the most frequent value in a vector:
x <- c(1,1,2,3,3,3,3,3,4)
get_mode(x)
If there are ties, the function can return all tied modes with ties = TRUE:
x <- c(1,1,2,3,3,3,3,3,4,4,4,4,4)
get_mode(x, ties = TRUE)
The function can also return a "mode" for a character vector:
x <- c("dog","dog","cat","rat","rat","rat","rat")
get_mode(x)
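For readers curious how a mode can be computed for both numeric and character input, here is a small base-R sketch. It is an illustration under the assumption that ties are broken by first appearance unless ties = TRUE; the name mode_sketch() is hypothetical and the code is not taken from yawp:

```r
# Hypothetical mode helper in base R; not yawp's implementation.
# Assumption: without ties = TRUE, the first value reaching the top count wins.
mode_sketch <- function(x, ties = FALSE) {
  ux <- unique(x)
  # match() maps each element to its position in ux; tabulate() counts them
  counts <- tabulate(match(x, ux), nbins = length(ux))
  if (ties) ux[counts == max(counts)] else ux[which.max(counts)]
}

mode_sketch(c(1, 1, 2, 3, 3, 3, 3, 3, 4))                           # 3
mode_sketch(c(1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4), ties = TRUE)  # 3 4
mode_sketch(c("dog", "dog", "cat", "rat", "rat", "rat", "rat"))     # "rat"
```

Counting through unique() and match() rather than table() keeps the result in the same type as the input, so a numeric vector yields a numeric mode.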
The propf() utility calculates the count and proportion of values in a vector that match the value given in the "level" argument. By default, the output is formatted as a string containing the count and percentage:
dogs <- c("dog","dog","dog")
cats <- c("cat","cat")
rats <- c("rat","rat","rat","rat","rat")
animals <- c(dogs,cats,rats)
propf(animals, level = "rat")
The output may be customized to show proportion rather than percentage:
propf(animals, level = "rat", percent = FALSE)
The function also accepts arguments that are passed to base::formatC():
propf(animals, level = "rat", percent = FALSE, decimal.mark = ",")
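The count-and-percentage idea can be sketched with base R alone. The exact string layout that propf() produces is not reproduced here; the "n (pct%)" layout, the digits argument, and the name prop_sketch() are all assumptions made for illustration:

```r
# Hypothetical count/percentage formatter; not yawp's implementation.
# Assumption: output layout is "count (percentage%)" with fixed decimals.
prop_sketch <- function(x, level, digits = 2, ...) {
  n <- sum(x == level, na.rm = TRUE)
  pct <- 100 * n / length(x)
  # Extra arguments (for example decimal.mark) are forwarded to formatC()
  paste0(n, " (", formatC(pct, format = "f", digits = digits, ...), "%)")
}

animals <- c(rep("dog", 3), rep("cat", 2), rep("rat", 5))
prop_sketch(animals, level = "rat")                      # "5 (50.00%)"
prop_sketch(animals, level = "rat", decimal.mark = ",")  # "5 (50,00%)"
```

Forwarding the dots to formatC() is one simple way to expose formatting options without listing each of them in the wrapper's signature.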
Formatting counts and proportions can be useful when summarizing data in a table:
starwars %>%
  select(films,sex) %>%
  unnest(films) %>%
  group_by(films) %>%
  summarise(female = propf(sex, level = "female"))
The medf() function is very similar to propf(), except that it calculates the median and the 25th and 75th percentiles (the first and third quartiles) of a numeric vector:
x <- rpois(1000, lambda = 3)
medf(x)
Like propf(), this function can take additional arguments that are passed to base::formatC():
medf(x, drop0trailing = TRUE)
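A formatted median with quartiles can be assembled from stats::quantile() and formatC(). The sketch below assumes a "median (Q1, Q3)" layout and R's default quantile type; both are assumptions, as is the name med_sketch(), and the string that medf() actually returns may differ:

```r
# Hypothetical median/quartile formatter; not yawp's implementation.
# Assumptions: "median (Q1, Q3)" layout and quantile()'s default type 7.
med_sketch <- function(x, digits = 2, ...) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  # Extra arguments (for example drop0trailing) are forwarded to formatC()
  f <- formatC(q, format = "f", digits = digits, ...)
  paste0(f[2], " (", f[1], ", ", f[3], ")")
}

med_sketch(1:9)                        # "5.00 (3.00, 7.00)"
med_sketch(1:9, drop0trailing = TRUE)  # "5 (3, 7)"
```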
The function can be useful for summarizing data in tables too:
state_info <- tibble(state = state.name, region = state.division)
us_rent_income %>%
  select(NAME, variable, estimate) %>%
  spread(variable, estimate) %>%
  filter(!NAME %in% c("District of Columbia", "Puerto Rico")) %>%
  rename(state = NAME) %>%
  left_join(state_info) %>%
  group_by(region) %>%
  summarise(income = medf(income), rent = medf(rent))
summary_se() summarizes data by calculating the mean, standard error, and a confidence interval across groups. The function accepts bare column names for the variable to summarize ("measure_var") and the optional grouping columns:
ToothGrowth %>% summary_se(len, supp, dose, .ci = 0.95)
ToothGrowth %>%
  summary_se(len, supp, dose, .ci = 0.95) %>%
  mutate(dose = paste0("Dose: ", dose, " (mg/day)")) %>%
  ggplot(aes(supp,mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean - ci, ymax = mean + ci), width = 0.2) +
  labs(x = "Vitamin C delivery method", y = "Mean length of odontoblasts (95% CI)") +
  coord_flip() +
  facet_wrap(~ dose, ncol = 1)
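The quantities plotted above (a mean with mean - ci and mean + ci error bars) can be reproduced by hand to see what such a summary involves. The sketch below assumes the interval half-width is a t quantile multiplied by the standard error; whether summary_se() uses a t-based or normal-based interval is not stated in this vignette, so treat that choice as an assumption. The name se_sketch() is hypothetical:

```r
# Hypothetical per-group summary in base R; not yawp's implementation.
# Assumption: ci is the half-width of a t-based confidence interval.
se_sketch <- function(x, ci = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  half_width <- qt(1 - (1 - ci) / 2, df = n - 1) * se
  c(n = n, mean = mean(x), se = se, ci = half_width)
}

# Split tooth length by delivery method and dose, then summarize each group
groups <- split(ToothGrowth$len, ToothGrowth[c("supp", "dose")])
res <- do.call(rbind, lapply(groups, se_sketch))
res
```

Each row of res corresponds to one supp and dose combination, which matches the grouping used in the summary_se() call above.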