In mark-me/graydon.package: Graydon Toolkit

Regular string formatting
Wrapping strings with hyphenation
Formatting numbers
Making intervals

library(ggplot2)
library(dplyr)

library(graydon.package)

Regular string formatting

Some of the handy Excel functions for string manipulation have no direct equivalent in the stringr package (at least as far as I know). A few of them are added here.

You can 'regularize' a string by converting all of its letters to lower case, except the first letter, which is converted to upper case:

str_firstup("ThIs iS One uGLy sTRIng!")

Who hasn't needed to extract the middle part of a text at one time or another?

str_mid("ThIs iS One uGLy sTRIng!", idx_start = 13, qty_characters = 4)

Or the right part of a string:

str_right("ThIs iS One uGLy sTRIng!", qty_characters = 7)
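For readers who do not know the Excel functions these mirror, the behaviour can be approximated in base R. This is only a sketch: the helper names firstup_sketch, mid_sketch and right_sketch are hypothetical, positions are assumed to be 1-based as in Excel's MID, and none of this is the package's actual implementation.

```r
# Hypothetical base R stand-ins; NOT the graydon.package implementation.

# Lower-case everything, then upper-case the first character
firstup_sketch <- function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}

# Take qty_characters characters starting at the 1-based position idx_start
mid_sketch <- function(x, idx_start, qty_characters) {
  substring(x, idx_start, idx_start + qty_characters - 1)
}

# Take the last qty_characters characters
right_sketch <- function(x, qty_characters) {
  n <- nchar(x)
  substring(x, n - qty_characters + 1, n)
}

ugly <- "ThIs iS One uGLy sTRIng!"
firstup_sketch(ugly)                                  # "This is one ugly string!"
mid_sketch(ugly, idx_start = 13, qty_characters = 4)  # "uGLy"
right_sketch(ugly, qty_characters = 7)                # "sTRIng!"
```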

Wrapping strings with hyphenation

The stringr package provides a str_wrap function, but some words are so long that wrapping on whitespace alone is almost useless (looking at you, Dutch). In those cases hyphenation comes in handy. The function str_wrap_hyphenate combines hyphenation and string wrapping. It uses a Dutch hyphenation dictionary by default, but it can be switched to other languages using language ISO codes.

Here hyphenation wrapping is performed on a Dutch SBI code description:

tbl_SBI %>% 
  filter(code_SBI == "02") %>% 
  mutate(description_SBI_wrapped = str_wrap_hyphenate(as.character(description_SBI), 25, html_format = TRUE)) %>% 
  select(code_SBI, description_SBI_wrapped) %>% 
  knitr::kable()
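To see why hyphenation helps, it may be useful to look at what happens without it. The sketch below uses base R's strwrap, not str_wrap_hyphenate, and the example phrase is chosen here purely for illustration. Wrapping on whitespace cannot split the long compound word, so its line ends up wider than the requested width.

```r
# Illustration with base R only; the phrase is an assumed example,
# not taken from tbl_SBI.
description <- "aansprakelijkheidsverzekering voor bedrijven"

# strwrap only breaks at whitespace, so the long word stays in one piece
wrapped <- strwrap(description, width = 15)
wrapped

# At least one line is wider than the requested 15 characters
nchar(wrapped) > 15
```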

Formatting numbers

Plain numbers

You can turn numeric results into presentable formats with the format_number function. Besides the number itself, it takes the following arguments:

number_decimals - the number of decimals to show. The default value is 2.
format_EN - specifies whether the thousands and decimal separators should be UK style (TRUE) or European style (FALSE). The default value is FALSE (EU style).
scale - the scale used to abbreviate numbers. You can display numbers in thousands ("k") or millions ("M"). The default value is "normal".

Here are some examples:

format_number(sum(mtcars$disp), number_decimals = 2)

format_number(sum(mtcars$disp), number_decimals = 2, format_EN = TRUE, scale = "k")

format_number(sum(mtcars$disp), scale = "k")
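The effect of these arguments can be approximated with base R's formatC. This is a rough sketch under stated assumptions: format_number_sketch is a hypothetical name, appending the scale letter directly after the number is a guess at the output layout, and the package's real function may format things differently.

```r
# Hypothetical approximation; NOT the graydon.package implementation.
format_number_sketch <- function(x, number_decimals = 2, format_EN = FALSE,
                                 scale = c("normal", "k", "M")) {
  scale <- match.arg(scale)
  divisor <- switch(scale, normal = 1, k = 1e3, M = 1e6)
  suffix <- switch(scale, normal = "", k = "k", M = "M")
  formatted <- formatC(x / divisor,
                       format = "f",
                       digits = number_decimals,
                       # UK style: 1,234.56; EU style: 1.234,56
                       big.mark = if (format_EN) "," else ".",
                       decimal.mark = if (format_EN) "." else ",")
  paste0(formatted, suffix)
}

format_number_sketch(1234567.891)                                  # "1.234.567,89"
format_number_sketch(1234567.891, format_EN = TRUE)                # "1,234,567.89"
format_number_sketch(1234567.891, format_EN = TRUE, scale = "k")   # "1,234.57k"
```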

Currency

You can use the format_currency function to create currency-formatted values. Besides the number itself, it takes the following arguments:

currency - specifies whether the currency is euros ("EUR") or pounds ("GBP"). It also determines whether the thousands and decimal separators are UK style or European style. The default value is "EUR".
number_decimals - the number of decimals to show. The default value is 2.
scale - the scale used to abbreviate numbers. You can display numbers in thousands ("k") or millions ("M"). The default value is "normal".

Some examples:

format_currency(mean(diamonds$price), number_decimals = 2)

format_currency(sum(diamonds$price), number_decimals = 1, currency = "GBP", scale = "M")

Percentages

You can use the format_percent function to create percentage-formatted values. Besides the number itself, it takes the following arguments:

number_decimals - the number of decimals to show. The default value is 1.
format_EN - specifies whether the thousands and decimal separators should be UK style (TRUE) or European style (FALSE). The default value is FALSE (EU style).

Some examples:

format_percent(mean(diamonds$carat), number_decimals = 1)

format_percent(mean(diamonds$carat), number_decimals = 2, format_EN = TRUE)

Making intervals

R has several methods for creating intervals from continuous data, but they produce interval labels that are hard to read. The functions in this section are replacements that create more readable labels. They work together with the format_number, format_currency and format_percent functions, which are used to format the interval boundaries.

Quantiles

With the function ntiles_labeled you can divide the values of a variable into groups that each contain an (approximately) equal number of observations.

diamonds %>% 
  mutate(ntile_values = ntiles_labeled(carat,
                                       qty_ntiles = 5,
                                       use_intervals = TRUE,
                                       FUN = format_percent,
                                       number_decimals = 2,
                                       format_EN = TRUE)) %>% 
  group_by(ntile_values) %>% 
  summarise(qty = n()) %>% 
  knitr::kable()
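The idea behind quantile grouping with readable labels can be sketched in base R with quantile and cut. The name ntile_sketch, the "a to b" label layout and the default formatter are assumptions made for this illustration; this is not how ntiles_labeled is necessarily implemented.

```r
# Hypothetical approximation; NOT the graydon.package implementation.
ntile_sketch <- function(x, qty_ntiles, FUN = as.character, ...) {
  # Boundaries at equally spaced probabilities give equally filled groups
  probs <- seq(0, 1, length.out = qty_ntiles + 1)
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  lower <- breaks[-length(breaks)]
  upper <- breaks[-1]
  # FUN formats each boundary; extra arguments are passed on to it
  labels <- paste(FUN(lower, ...), "to", FUN(upper, ...))
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

values <- 1:100
groups <- ntile_sketch(values, qty_ntiles = 4,
                       FUN = formatC, format = "f", digits = 2)
table(groups)  # four groups of 25 observations each
```

Passing the formatter as FUN, with its arguments forwarded through `...`, mirrors the calling convention used in the example above.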

Intervals with equal width

With the function bin_width_labeled you can divide the values of a variable into intervals of equal width.

diamonds %>%
  mutate(binned_values = bin_width_labeled(price,
                                           width = 2500,
                                           FUN = format_currency,
                                           number_decimals = 2,
                                           currency = "GBP")) %>%
  group_by(binned_values) %>%
  summarise(qty = n()) %>%
  knitr::kable()
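Fixed-width binning can be approximated in base R by building breaks with seq and handing them to cut. In this sketch the name bin_width_sketch, the choice to align bins to multiples of width, the left-closed intervals and the "a to b" label layout are all assumptions; bin_width_labeled may differ on each of these points.

```r
# Hypothetical approximation; NOT the graydon.package implementation.
bin_width_sketch <- function(x, width, FUN = as.character, ...) {
  # Align the outer boundaries to multiples of width (an assumption)
  lower_bound <- floor(min(x) / width) * width
  upper_bound <- ceiling(max(x) / width) * width
  # Guard: make sure there is at least one bin
  if (upper_bound == lower_bound) upper_bound <- lower_bound + width
  breaks <- seq(lower_bound, upper_bound, by = width)
  labels <- paste(FUN(breaks[-length(breaks)], ...), "to", FUN(breaks[-1], ...))
  # Left-closed intervals; the maximum falls into the last bin
  cut(x, breaks = breaks, labels = labels, right = FALSE, include.lowest = TRUE)
}

prices <- c(0, 1200, 2500, 4999, 5000, 7400)
bins <- bin_width_sketch(prices, width = 2500,
                         FUN = formatC, format = "f", digits = 0, big.mark = ",")
table(bins)  # three bins with two observations each
```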

Intervals based on the number of desired groups

With the function bin_quantity_labeled you can divide the values of a variable into a number of bins of your choosing. The bins are given equal widths.

diamonds %>%
  mutate(binned_values = bin_quantity_labeled(price,
                                              qty_bins = 5,
                                              FUN = format_currency,
                                              number_decimals = 2,
                                              currency = "EUR")) %>%
  group_by(binned_values) %>%
  summarise(qty = n()) %>%
  knitr::kable()