df_input2 %>% count(string(df_2_col), sort = TRUE) %>% slice(2) %>% select(1) %>% pull() #restart here
str_count()
is a function we can use to count the number of rows that match a particular pattern.
The output with either be 1 (match), or 0 (no match)
In this code:
- string we want to evaluate is r dataframe_join_name`$`r df_2_col
- pattern we want to count "High Cholesterol"
str_count(heart_joined$health_status, "High Cholesterol")
str_count(heart_joined$health_status, "High Cholesterol")
str_count(df_input2[df_2_col], "High Cholesterol")
A bunch of 0 and 1 are not incredibly useful.
But since R is good at adding, we can simply wrap the previous expression in sum()
Try it below:
We previously matched the entire string "High Cholesterol"
But we can use the same function to detect patterns within longer strings.
Let's look for how many patients take a statin of any kind using
str_count(heart_joined$medication_hx, "statin")
str_count(heart_joined$medication_hx, "statin")
What does the ouput mean?
When using a stringr function, you may get an output saying a string pattern doesn't exist. If you know for sure it does,
The string must match exactly, or it will not be found!
How many people having an "auntie" or "aunt"
in their health history?
Go to code/
Open 09_stringr.Rmd
Complete the exercise.
In addition to counting, we can use another function str_detect()
to logically evaluate a character string.
Because this logically evaluates an expression, the output is either TRUE or FALSE
Practially, str_detect
is used to detect the presence or absence of a pattern in a string
str_detect(heart_joined$health_status, "Diabetic")
str_replace()
{.build}In the health_status column we have:
-"Diabetic"
-"High Cholesterol"
-"Normal blood sugar and cholesterol"
But let's say we want to simplify healthy individuals to "Normal"
str_replace(heart_joined$health_status, "Normal blood sugar and cholesterol", "Normal")
str_replace()
{.build}We use this same code to modify the health_status
column by assigning it to the same variable
heart_joined$health_status <-
str_replace(heart_joined$health_status, "Normal blood sugar and cholesterol", "normal")
df_input2[df_2_col] <- str_replace(df_input2[df_2_col], "Normal blood sugar and cholesterol", "normal") head(df_input2[df_2_col], n = 10)
stringr
with dplyr
{.build}We can use stringr
functions in tandem with dplyr
functions.
We want to make a logical variable (TRUE
/FALSE
) that tells us if a patient has a normal health history using
heart_joined2 <- heart_joined %>% mutate(healthy = str_detect(health_status, "normal"))
df_input22 <- mutate(df_input2, condition = str_detect(df_2_col, "normal")) head(df_input22$condition, n = 10)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.