String Manipulation Package for Those Familiar with Microsoft Excel"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The goal of 'forstringr' is to enable complex string manipulation in R, especially for those more familiar with the LEFT(), RIGHT(), and MID() functions in Microsoft Excel. The package combines the power of 'stringr' with other manipulation packages such as 'dplyr' and 'tidyr'. Just like in the 'stringr' package, most functions in 'forstringr' begin with str_.

Installation

You can install forstringr package from CRAN with:

install.packages("forstringr")

or the development version from GitHub with:

if(!require("devtools")){
 install.packages("devtools")
}

devtools::install_github("gbganalyst/forstringr")

length_omit_na()

length_omitna() counts only non-missing elements of a vector.

library(forstringr)

ethnicity <- c("Hausa", NA, "Yoruba", "Ijaw", "Igbo", NA, "Ibibio", "Tiv", "Fulani", "Kanuri", "Others")

length(ethnicity) # Count all the observations, including the NAs.

length_omit_na(ethnicity)

str_left()

Given a character vector, str_left() returns the left side of a string. For examples:

str_left("Nigeria")

str_left("Nigeria", n = 3)

str_left(c("Female", "Male", "Male", "Female"))

str_right()

Given a character vector, str_right() returns the right side of a string. For examples:

str_right("July 20, 2022", 4)

str_right("Sale Price", n = 5)

str_mid()

Like in Microsoft Excel, the str_mid()returns a specific number of characters from a text string, starting at the position you specify, based on the number of characters you select.

str_mid("Super Eagle", 7, 5)

str_mid("Oyo Ibadan", 5, 6)

str_split_extract()

If you want to split up a string into pieces and extract the results using a specific index position, then, you will use str_split_extract(). You can interpret it as interpret it as follows:

Given a character string, S, extract the element at a given position, k, from the result of splitting S by a given pattern, m. For example:

top_10_richest_nig <- c("Aliko Dangote", "Mike Adenuga", "Femi Otedola", "Arthur Eze", "Abdulsamad Rabiu", "Cletus Ibeto", "Orji Uzor Kalu", "ABC Orjiakor", "Jimoh Ibrahim", "Tony Elumelu")

first_name <- str_split_extract(top_10_richest_nig, " ", 1)

first_name

str_extract_part()

Extract strings before or after a given pattern. For example:

first_name <- str_extract_part(top_10_richest_nig,  pattern = " ", before = TRUE)

first_name

revenue <- c("$159", "$587", "$891", "$207", "$793")

str_extract_part(revenue, pattern = "$", before = FALSE)

str_englue()

You can dynamically label ggplot2 plots with the glue operators {} or {{}} using str_englue(). This function allows for easier string interpolation within curly braces for plot labelling in ggplot2. For example, any value wrapped in { } will be inserted into the string and it can also understand embracing, {{ }}, which automatically inserts a given variable name.

library(ggplot2)

histogram_plot <- function(df, var, binwidth) {
 df |>
   ggplot(aes(x = {{ var }})) +
   geom_histogram(binwidth = binwidth) +
   labs(title = str_englue("A histogram of {{var}} with binwidth {binwidth}"))
}

iris |> histogram_plot(Sepal.Length, binwidth = 0.1)

str_rm_whitespace_df()

Extra spaces are accidentally entered when working with survey data, and problems can arise when evaluating such data because of extra spaces. Therefore, the function str_rm_whitespace_df() eliminates your data frame unnecessary leading, trailing, or other whitespaces.

# A dataframe with whitespaces

richest_in_nigeria
# A dataframe with no whitespaces

str_rm_whitespace_df(richest_in_nigeria)


Try the forstringr package in your browser

Any scripts or data that you put into this service are public.

forstringr documentation built on Aug. 7, 2023, 5:08 p.m.