Counting strings {.build}


str_count() is a function we can use to count how many times a particular pattern occurs in each string of a column, giving one number per row.

Here, the output for each row will be either 1 (match) or 0 (no match).

Example

In this code:
- the string we want to evaluate is heart_joined$health_status
- the pattern we want to count is "High Cholesterol"

str_count(heart_joined$health_status, "High Cholesterol")
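To see what this call returns without loading the data, here is a minimal sketch on a made-up vector. The vector `health_status` and its values are assumptions standing in for heart_joined$health_status.

```r
library(stringr)

# Hypothetical stand-in for heart_joined$health_status
health_status <- c("High Cholesterol", "Diabetic", "High Cholesterol",
                   "Normal blood sugar and cholesterol")

# One count per element: 1 where the pattern occurs, 0 where it does not
counts <- str_count(health_status, "High Cholesterol")
counts
# 1 0 1 0
```

The last element returns 0 because its lowercase "cholesterol" is not preceded by "High".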

Counting

str_count(heart_joined$health_status, "High Cholesterol")

str_count(df_input2[[df_2_col]], "High Cholesterol")

Summarizing our counts

A long vector of 0s and 1s is not very useful on its own.

But since R is good at adding, we can simply wrap the previous expression in sum() to get the total number of matches.

Try it below:
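As a rough sketch of the idea, using a made-up vector (`health_status` is an assumption, not the tutorial data), wrapping the call in sum() collapses the per-row counts into a single total:

```r
library(stringr)

# Hypothetical stand-in for heart_joined$health_status
health_status <- c("High Cholesterol", "Diabetic", "High Cholesterol",
                   "Normal blood sugar and cholesterol")

# Add up the per-row counts to get the total number of matches
total <- sum(str_count(health_status, "High Cholesterol"))
total
# 2
```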


Matching subsets of strings {.build}

We previously matched the entire string "High Cholesterol".
But we can use the same function to detect patterns within longer strings.

Let's look for how many patients take a statin of any kind using str_count(heart_joined$medication_hx, "statin")

What about subsets of strings?

str_count(heart_joined$medication_hx, "statin")

What does the output mean?
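One way to read the output, sketched on a made-up vector (`medication_hx` and its values are assumptions standing in for heart_joined$medication_hx): each number is how many times "statin" appears in that patient's entry, so a patient listing two statins gets a 2.

```r
library(stringr)

# Hypothetical medication histories, one entry per patient
medication_hx <- c("atorvastatin", "metformin",
                   "simvastatin, aspirin", "atorvastatin, rosuvastatin")

# "statin" is matched inside longer drug names
statin_counts <- str_count(medication_hx, "statin")
statin_counts
# 1 0 1 2

# Number of patients with at least one statin (not the number of matches)
patients_on_statin <- sum(statin_counts > 0)
patients_on_statin
# 3
```

Because a single entry can contain the pattern more than once, counting patients means counting the rows with a count above zero rather than summing the counts.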


A note about string patterns

When using a stringr function, you may get output suggesting that a string pattern doesn't exist. If you know for sure it does, double check capitalization.

Patterns are case-sensitive: the string must match exactly, or it will not be found!
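The sketch below illustrates the capitalization issue on a made-up vector (`status` is an assumption). It also shows two possible workarounds: stringr's regex() helper with ignore_case = TRUE, and lowercasing the text with str_to_lower() before matching.

```r
library(stringr)

# Hypothetical values with inconsistent capitalization
status <- c("High Cholesterol", "high cholesterol", "Diabetic")

# Exact, case-sensitive match: only the first element is found
exact <- str_count(status, "High Cholesterol")
exact
# 1 0 0

# Option 1: ask stringr to ignore case when matching
ignore_case <- str_count(status, regex("high cholesterol", ignore_case = TRUE))
ignore_case
# 1 1 0

# Option 2: convert the text to lowercase first, then match a lowercase pattern
lowered <- str_count(str_to_lower(status), "high cholesterol")
lowered
# 1 1 0
```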

stringr Exercise

How many people have an "auntie" or "aunt" in their health history?

Go to code/
Open 09_stringr.Rmd
Complete the exercise.

Detecting strings {.build}

In addition to counting, we can use another function, str_detect(), to logically evaluate a character string.

Because this logically evaluates an expression, the output is either TRUE or FALSE.

Practically, str_detect() is used to detect the presence or absence of a pattern in a string.

Logic Evaluation

Example

Find the patients with diabetes using the following code:

str_detect(heart_joined$health_status, "Diabetic")
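A minimal sketch of how the logical output can be used, on a made-up vector (`health_status` is an assumption standing in for the real column). Since TRUE counts as 1 when summed, sum() gives the number of matching patients, and the logical vector can also be used to subset.

```r
library(stringr)

# Hypothetical stand-in for heart_joined$health_status
health_status <- c("Diabetic", "High Cholesterol", "Diabetic",
                   "Normal blood sugar and cholesterol")

# TRUE where the pattern is present, FALSE where it is absent
is_diabetic <- str_detect(health_status, "Diabetic")
is_diabetic
# TRUE FALSE TRUE FALSE

# TRUE is treated as 1, so sum() counts the matching rows
n_diabetic <- sum(is_diabetic)
n_diabetic
# 2

# Keep only the matching entries
health_status[is_diabetic]
```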


Modifying strings with str_replace() {.build}

In the health_status column we have:
- "Diabetic"
- "High Cholesterol"
- "Normal blood sugar and cholesterol"

But let's say we want to simplify healthy individuals to "normal"

str_replace(heart_joined$health_status, "Normal blood sugar and cholesterol", "normal")
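As an illustration on a made-up vector (`health_status` is an assumption), note that str_replace() returns a new vector and leaves the original untouched unless the result is assigned. The replacement text used here, "normal", is just the label chosen for this sketch.

```r
library(stringr)

# Hypothetical stand-in for heart_joined$health_status
health_status <- c("Diabetic", "Normal blood sugar and cholesterol",
                   "High Cholesterol")

# Replace the long label with a short one; other values pass through unchanged
simplified <- str_replace(health_status,
                          "Normal blood sugar and cholesterol", "normal")
simplified
# "Diabetic" "normal" "High Cholesterol"

# The original vector has not been modified
health_status[2]
# "Normal blood sugar and cholesterol"
```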


Modifying strings with str_replace() {.build}

We can use this same code to modify the health_status column by assigning the result back to the same column

heart_joined$health_status <-
str_replace(heart_joined$health_status, "Normal blood sugar and cholesterol", "normal")

df_input2[[df_2_col]] <-
  str_replace(df_input2[[df_2_col]], "Normal blood sugar and cholesterol", "normal")

head(df_input2[df_2_col], n = 10)
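The following sketch shows the assign-back pattern on a small made-up data frame (`heart_toy` and `col_name` are hypothetical names, not part of the tutorial data). It uses `[[ ]]` when the column name is stored in a variable, because `[[ ]]` extracts the column as a character vector, which is what stringr functions expect, whereas `[ ]` returns a one-column data frame.

```r
library(stringr)

# Hypothetical toy data frame standing in for heart_joined
heart_toy <- data.frame(
  id = 1:3,
  health_status = c("Diabetic", "Normal blood sugar and cholesterol",
                    "High Cholesterol"),
  stringsAsFactors = FALSE
)

# Column name held in a variable (an assumption mirroring the templated code)
col_name <- "health_status"

# Overwrite the column with the simplified labels
heart_toy[[col_name]] <- str_replace(heart_toy[[col_name]],
                                     "Normal blood sugar and cholesterol",
                                     "normal")

head(heart_toy[[col_name]], n = 10)
# "Diabetic" "normal" "High Cholesterol"
```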

Using stringr with dplyr {.build}

We can use stringr functions in tandem with dplyr functions.

Example

We want to make a logical variable (TRUE/FALSE) that tells us if a patient has a normal health history using:

heart_joined2 <- heart_joined %>% mutate(healthy = str_detect(health_status, "normal"))

df_input22 <- mutate(df_input2, condition = str_detect(df_input2[[df_2_col]], "normal"))

head(df_input22$condition, n = 10)
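A small self-contained sketch of the same idea, using a made-up tibble (`heart_toy` and its values are assumptions). It adds the logical column with mutate() and, as one possible follow-up, uses str_detect() directly inside filter() to keep only the matching rows.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for heart_joined (labels already simplified)
heart_toy <- tibble(
  id = 1:4,
  health_status = c("normal", "Diabetic", "High Cholesterol", "normal")
)

# Add a TRUE/FALSE column flagging patients with a normal health history
heart_toy2 <- heart_toy %>%
  mutate(healthy = str_detect(health_status, "normal"))

heart_toy2$healthy
# TRUE FALSE FALSE TRUE

# str_detect() also works inside filter() to keep only matching rows
healthy_only <- heart_toy %>%
  filter(str_detect(health_status, "normal"))

nrow(healthy_only)
# 2
```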


matthewhirschey/bespokelearnr documentation built on Oct. 11, 2020, 12:57 a.m.