library(ggplot2)
library(dplyr)
library(graydon.package)

Regular string formatting

Some of Excel's simpler string manipulation functions are missing from the stringr package (at least as far as I know). A few of them are added here.

You can 'regularize' a string by converting all letters to lower case, except the first letter, which becomes upper case:

str_firstup("ThIs iS One uGLy sTRIng!")
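For readers without the package at hand, the same idea can be sketched in base R. The helper name first_up_sketch is hypothetical and this is not necessarily how str_firstup is implemented:

```r
# Hypothetical helper (not the package's implementation):
# upper-case the first character, lower-case the rest.
first_up_sketch <- function(x) {
  paste0(toupper(substr(x, 1, 1)), tolower(substr(x, 2, nchar(x))))
}

first_up_sketch("ThIs iS One uGLy sTRIng!")
# "This is one ugly string!"
```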

Who hasn't needed to extract the middle part of a text at one time or another?

str_mid("ThIs iS One uGLy sTRIng!", idx_start = 13, qty_characters = 4)

Or the right part of a string:

str_right("ThIs iS One uGLy sTRIng!", qty_characters = 7) 
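Both helpers mirror Excel's MID and RIGHT. A rough base R equivalent, assuming 1-based character positions as in Excel, could look like the sketch below. The names mid_sketch and right_sketch are hypothetical and do not come from the package:

```r
# Hypothetical base R counterparts of str_mid and str_right
# (assumption: positions are 1-based, as in Excel's MID).
mid_sketch <- function(x, idx_start, qty_characters) {
  substr(x, idx_start, idx_start + qty_characters - 1)
}

right_sketch <- function(x, qty_characters) {
  n <- nchar(x)
  substr(x, n - qty_characters + 1, n)
}

mid_sketch("ThIs iS One uGLy sTRIng!", idx_start = 13, qty_characters = 4)
# "uGLy"
right_sketch("ThIs iS One uGLy sTRIng!", qty_characters = 7)
# "sTRIng!"
```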

Wrapping strings with hyphenation

The stringr package provides a str_wrap function, but in some languages words are so long that wrapping on whole words is almost useless (looking at you, Dutch). In those cases hyphenation comes in handy. The function str_wrap_hyphenate combines hyphenation and string wrapping. It uses a Dutch hyphenation dictionary by default, but it can be switched to other languages using language ISO codes.

Here hyphenation wrapping is performed on a Dutch SBI code description:

tbl_SBI %>% 
  filter(code_SBI == "02") %>% 
  mutate(description_SBI_wrapped = str_wrap_hyphenate(as.character(description_SBI), 25, html_format = TRUE)) %>% 
  select(code_SBI, description_SBI_wrapped) %>% 
  knitr::kable()
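To see why wrapping on whole words falls short, the sketch below uses base R's strwrap on a made-up Dutch phrase (the sample text is an assumption for illustration and is not taken from tbl_SBI). Because strwrap never splits a word, the long compound ends up on a line that exceeds the requested width:

```r
# Illustrative sample text (assumption, not from the package data).
sample_text <- "Dienstverlening voor de arbeidsongeschiktheidsverzekering"

# strwrap only breaks at whitespace, so a word longer than `width`
# is placed on its own line and still overflows.
wrapped <- strwrap(sample_text, width = 25)

wrapped
nchar(wrapped)  # at least one line is longer than 25 characters
```

Hyphenation addresses exactly this case by allowing a break inside the long word.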

Formatting numbers

Plain numbers

You can turn numeric results into a presentable format with the format_number function. Besides the number itself, it takes further input variables, such as number_decimals, format_EN and scale.

Here are some examples:

format_number(sum(mtcars$disp), number_decimals = 2)
format_number(sum(mtcars$disp), number_decimals = 2, format_EN = TRUE, scale = "k")
format_number(sum(mtcars$disp), scale = "k")
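What such a formatter does can be approximated with base R's formatC. The sketch below is a hypothetical stand-in named format_number_sketch, not the package's code. It assumes that the default is a European notation (point as thousands separator, comma as decimal mark), that format_EN = TRUE swaps the two, and that scale divides the value and appends the scale letter:

```r
# Hypothetical stand-in for format_number (assumptions stated above).
format_number_sketch <- function(x, number_decimals = 0,
                                 format_EN = FALSE, scale = "") {
  divisors <- c(k = 1e3, M = 1e6)
  divisor <- if (scale %in% names(divisors)) divisors[[scale]] else 1
  formatted <- formatC(x / divisor,
                       format = "f",
                       digits = number_decimals,
                       big.mark = if (format_EN) "," else ".",
                       decimal.mark = if (format_EN) "." else ",")
  # Append the scale letter only when a known scale was applied.
  paste0(formatted, if (divisor != 1) scale else "")
}

format_number_sketch(7383.1, number_decimals = 2)
# "7.383,10"
format_number_sketch(7383.1, number_decimals = 2, format_EN = TRUE, scale = "k")
# "7.38k"
```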

Currency

You can use the format_currency function to create currency-formatted values. Besides the number itself, it takes further input variables, such as number_decimals, currency and scale.

Some examples:

format_currency(mean(diamonds$price), number_decimals = 2)
format_currency(sum(diamonds$price), number_decimals = 1, currency = "GBP", scale = "M")

Percentages

You can use the format_percent function to create percentage-formatted values. Besides the number itself, it takes further input variables, such as number_decimals and format_EN.

Some examples:

format_percent(mean(diamonds$carat), number_decimals = 1)
format_percent(mean(diamonds$carat), number_decimals = 2, format_EN = TRUE)
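A percentage formatter of this kind can be sketched in base R as follows. The helper format_percent_sketch is hypothetical, and it assumes the input is a fraction that is multiplied by 100 before formatting. Whether format_percent itself rescales its input is not stated here, so treat that as an assumption:

```r
# Hypothetical stand-in for format_percent.
# Assumption: the input is a fraction (0.5 means 50%).
format_percent_sketch <- function(x, number_decimals = 0, format_EN = FALSE) {
  formatted <- formatC(x * 100,
                       format = "f",
                       digits = number_decimals,
                       decimal.mark = if (format_EN) "." else ",")
  paste0(formatted, "%")
}

format_percent_sketch(0.798, number_decimals = 1)
# "79,8%"
format_percent_sketch(0.798, number_decimals = 2, format_EN = TRUE)
# "79.80%"
```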

Making intervals

R has several methods for creating intervals from continuous data, but their downside is that the resulting interval labels are hard to read. The functions in this section are replacements that create more readable labels. They work together with the format_number, format_currency and format_percent functions to format the interval boundaries.

Quantiles

With the function ntiles_labeled you can group values so that each group contains a roughly equal number of observations.

diamonds %>% 
  mutate(ntile_values = ntiles_labeled(carat,
                                       qty_ntiles = 5,
                                       use_intervals = TRUE,
                                       FUN = format_percent,
                                       number_decimals = 2,
                                       format_EN = TRUE)) %>% 
  group_by(ntile_values) %>% 
  summarise(qty = n()) %>% 
  knitr::kable()
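The underlying idea can be sketched with base R on the built-in mtcars data: take quantile breaks, then pass hand-built labels to cut instead of its default "(a,b]" style labels. This is an illustration under those assumptions, not the implementation of ntiles_labeled:

```r
# Sketch: quantile-based groups with readable labels (base R only).
x <- mtcars$mpg
qty_ntiles <- 5

# Breaks at the 0%, 20%, ..., 100% quantiles.
breaks <- quantile(x, probs = seq(0, 1, length.out = qty_ntiles + 1),
                   names = FALSE)

# Readable labels built from the lower and upper bound of each group.
labels <- paste(formatC(head(breaks, -1), format = "f", digits = 1),
                "to",
                formatC(tail(breaks, -1), format = "f", digits = 1))

groups <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
table(groups)
```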

Intervals with equal width

With the function bin_width_labeled you can group values into intervals of equal width.

diamonds %>%
  mutate(binned_values = bin_width_labeled(price,
                                           width = 2500,
                                           FUN = format_currency,
                                           number_decimals = 2,
                                           currency = "GBP")) %>%
  group_by(binned_values) %>%
  summarise(qty = n()) %>%
  knitr::kable()
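A minimal base R sketch of fixed-width binning, again on mtcars and not taken from the package: the breaks are multiples of the chosen width that cover the data range, and the labels are built by hand.

```r
# Sketch: bins of a fixed width with readable labels (base R only).
x <- mtcars$hp
width <- 50

# Breaks are multiples of `width` that span the full range of x.
breaks <- seq(floor(min(x) / width) * width,
              ceiling(max(x) / width) * width,
              by = width)

labels <- paste(head(breaks, -1), "to", tail(breaks, -1))

# right = FALSE makes each interval include its lower bound.
binned <- cut(x, breaks = breaks, labels = labels,
              right = FALSE, include.lowest = TRUE)
table(binned)
```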

Intervals based on the number of desired groups

With the function bin_quantity_labeled you can group values into a number of bins that you choose. The bins are given equal widths.

diamonds %>%
  mutate(binned_values = bin_quantity_labeled(price,
                                              qty_bins = 5,
                                              FUN = format_currency,
                                              number_decimals = 2,
                                              currency = "EUR")) %>%
  group_by(binned_values) %>%
  summarise(qty = n()) %>%
  knitr::kable()
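The same approach works when the number of bins is fixed instead of the width. The sketch below, in base R on mtcars and not the package's implementation, spreads the breaks evenly between the minimum and the maximum:

```r
# Sketch: a chosen number of equal-width bins (base R only).
x <- mtcars$hp
qty_bins <- 5

# qty_bins + 1 evenly spaced breaks from min(x) to max(x).
breaks <- seq(min(x), max(x), length.out = qty_bins + 1)

labels <- paste(round(head(breaks, -1)), "to", round(tail(breaks, -1)))

# include.lowest = TRUE keeps the minimum value in the first bin.
binned <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
table(binned)
```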


mark-me/graydon.package documentation built on Nov. 14, 2023, 5:31 p.m.