get_centroids: Get centroids



Runs a summarizing function on each specified column within each specified group. This is intended for plotting centroids inside ellipses in ggplot2 without having to create a new object or write a lot of in-line code. See the examples below.


get_centroids(df, .cols, ..., .fns = median)



df: a dataframe.


.cols: columns that should be summarized. For sociophonetic data, this is usually the names of your vowel columns, e.g. c(F1, F2). This is passed directly into an 'across' call within 'summarize'.


...: grouping variables. For sociophonetic data, this might be speaker and allophone. These are passed directly into 'group_by'.


.fns: one or more functions. By default, median. This is passed into 'across'.


Returns an ungrouped dataframe.


Okay technically this function name is a misnomer because we're not truly getting centroids in a mathematical sense. But that's what I think of when I run this so that's what we're going with.
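The argument descriptions above suggest a pipeline along the following lines. This is only a minimal sketch under that assumption, not the package's actual source: the name get_centroids_sketch and the toy data are hypothetical, and it assumes dplyr 1.0.0 or later for across() and the .groups argument.

```r
library(dplyr)

# Hypothetical stand-in for get_centroids(), built only from the
# argument descriptions: group, summarize across columns, ungroup.
get_centroids_sketch <- function(df, .cols, ..., .fns = median) {
  df %>%
    group_by(...) %>%
    # {{ .cols }} forwards the tidyselect expression to across()
    summarize(across({{ .cols }}, .fns), .groups = "drop")
}

# Toy data (made-up values, for illustration only)
toy <- tibble(
  vowel = c("IY", "IY", "IY", "AA", "AA", "AA"),
  F1    = c(300, 320, 310, 700, 720, 740),
  F2    = c(2300, 2200, 2250, 1200, 1100, 1150)
)

# One row per vowel, holding the median F1 and the median F2
get_centroids_sketch(toy, c(F1, F2), vowel)
```

Because each column is summarized independently, the result is a per-column median (or whatever .fns is) rather than a true multivariate centroid, which is the misnomer noted above.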


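# The examples below also need get_centroids() itself to be available.
library(dplyr)    # %>%, mutate(), matches()
library(ggplot2)  # ggplot(), stat_ellipse(), geom_text()
library(forcats)  # fct_collapse()
df <- joeysvowels::idahoans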

# Basic usage as a summarizing function
df %>%
  get_centroids(c(F1, F2), vowel)

# Within a ggplot2 block. Note that the data argument has to start with the dot and pipe it into get_centroids, rather than passing the dot in as an argument (i.e. get_centroids(., c(F1, F2), vowel)). This is because a ggplot2 layer's data argument accepts a function, which it applies to the plot's data, and `. %>% ...` creates such a function.
ggplot(df, aes(F2, F1, color = vowel)) +
  stat_ellipse(level = 0.67) +
  geom_text(data = . %>% get_centroids(c(F1, F2), vowel), aes(label = vowel)) +
  scale_x_reverse() +
  scale_y_reverse() +
  theme(legend.position = "none")

# You can add multiple groups to the code too.
ggplot(df, aes(F2, F1, color = vowel)) +
  stat_ellipse(level = 0.67) +
  geom_text(data = . %>% get_centroids(c(F1, F2), speaker, vowel), aes(label = vowel)) +
  scale_x_reverse() +
  scale_y_reverse() +
  facet_wrap(~speaker, scales = "free") +
  theme(legend.position = "none")

# As with any use of group_by(), additional, possibly redundant columns can be specified simply to pass them through. In this example, adding tense_lax doesn't change the calculations, but it keeps that column available for coloring the labels. This block also highlights one strength of get_centroids: a modified dataframe can be piped directly into ggplot and then modified further to get the labels, without creating any new objects.
df %>%
  mutate(tense_lax = fct_collapse(vowel,
                                  "tense" = c("IY", "EY", "AO", "OW", "UW"),
                                  "lax"   = c("IH", "EH", "AE", "AA", "AH", "UH"))) %>%
  ggplot(aes(F2, F1, color = tense_lax, group = vowel)) +
  stat_ellipse(level = 0.67) +
  geom_text(data = . %>% get_centroids(c(F1, F2), speaker, tense_lax, vowel),
            aes(label = vowel)) +
  scale_x_reverse() +
  scale_y_reverse() +
  facet_wrap(~speaker, scales = "free") +
  theme(legend.position = "none")

# For column selection, any tidyselect output works, such as matches().
df %>%
  get_centroids(matches("F\\d"), speaker, vowel)

# For functions, you can add more than one. Just wrap them up into c().
df %>%
  get_centroids(c(F1, F2), .fns = c(median, mean), speaker, vowel)

# However, unless the functions are named, the output column names won't be informative, because across() falls back to positions (e.g. F1_1, F1_2).
df %>%
  get_centroids(c(F1, F2), .fns = c(`med` = median, `average` = mean), speaker, vowel)
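
The effect of naming the functions can be seen with plain dplyr. The sketch below assumes, as the .fns description states, that the functions are handed to across() unchanged. The toy data are made up for illustration and are not taken from joeysvowels.

```r
library(dplyr)

# Toy data (made-up values, for illustration only)
toy <- tibble(
  vowel = c("IY", "IY", "AA", "AA"),
  F1    = c(300, 320, 700, 740),
  F2    = c(2300, 2200, 1200, 1100)
)

# Unnamed functions: across() labels the outputs by position
unnamed <- toy %>%
  group_by(vowel) %>%
  summarize(across(c(F1, F2), c(median, mean)), .groups = "drop")
names(unnamed)
# "vowel" "F1_1" "F1_2" "F2_1" "F2_2"

# Named functions: the names become column suffixes
named <- toy %>%
  group_by(vowel) %>%
  summarize(across(c(F1, F2), c(med = median, average = mean)),
            .groups = "drop")
names(named)
# "vowel" "F1_med" "F1_average" "F2_med" "F2_average"
```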

