syllable is a small collection of tools for counting syllables and polysyllables. The tools rely primarily on data.table hash table lookups, resulting in fast syllable counting.
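The speed comes from looking words up in a precomputed word-to-syllable table rather than working out each count from scratch. As a rough illustration of what such a lookup involves, the base R sketch below uses a hashed environment as the table. It is not the package's implementation (which uses data.table), and the four-word dictionary is a hypothetical stand-in for the package's real data.

```r
# Minimal sketch of dictionary-based syllable counting in base R.
# ASSUMPTION: `dict` is a tiny hypothetical dictionary, not syllable's data,
# and a hashed environment stands in for a data.table lookup.
dict <- list2env(
  list(i = 1L, like = 1L, chicken = 2L, sandwiches = 3L),
  hash = TRUE
)

# Split one string into lowercase words, dropping punctuation.
tokenize <- function(x) {
  strsplit(trimws(gsub("[^a-z ]", "", tolower(x))), "\\s+")[[1]]
}

# Look each word up in the hashed environment; unknown words return NA.
lookup_syllables <- function(words, dict) {
  vapply(words, function(w) {
    if (exists(w, envir = dict, inherits = FALSE)) {
      get(w, envir = dict, inherits = FALSE)
    } else {
      NA_integer_
    }
  }, integer(1), USE.NAMES = FALSE)
}

lookup_syllables(tokenize("I like chicken sandwiches."), dict)
#> [1] 1 1 2 3
```

Words missing from the dictionary come back as NA here; a practical counter needs some fallback for them, which this sketch leaves out.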

Main Functions

The main functions follow the format of action_object.


The following table outlines the actions. The Example Output column corresponds to this string: "I like chicken sandwiches."

| Action | Description                | Returns               | Example Output           |
|--------|----------------------------|-----------------------|--------------------------|
| count  | One integer per word       | A vector per string   | 1, 1, 2, 3               |
| sum    | Sum of syllable counts     | An integer per string | 7                        |
| tally* | Sum of syllable attributes | An integer per string | polysyllable tallies = 1 |

* Appending _mono, _di, _poly, _short (monosyllabic + disyllabic), or _both (short & polysyllabic) to tally lets the user specify which syllable attribute is tallied.
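Given the per-word counts that count produces for the example string (1, 1, 2, 3), the sum and tally actions reduce to simple arithmetic. The base R sketch below reproduces the sum (7) and polysyllable tally (1) from the table. It assumes that polysyllabic means three or more syllables, which is consistent with "sandwiches" being the only polysyllabic word, and it illustrates the arithmetic only, not the package's code.

```r
# Per-word syllable counts for "I like chicken sandwiches."
counts <- c(I = 1L, like = 1L, chicken = 2L, sandwiches = 3L)

# sum: one integer per string.
total <- sum(counts)

# tally_*: number of words with each attribute.
# ASSUMPTION: polysyllabic = 3 or more syllables; short = mono + di.
tallies <- c(
  mono  = sum(counts == 1L),
  di    = sum(counts == 2L),
  poly  = sum(counts >= 3L),
  short = sum(counts <= 2L)
)

# _both reports the short and polysyllabic tallies together.
both <- tallies[c("short", "poly")]

total    # 7
tallies  # mono 2, di 1, poly 1, short 3
```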


The following table outlines the objects acted upon:

| Object  | Description                   | Example                      |
|---------|-------------------------------|------------------------------|
| string  | A character string            | "I like chicken sandwiches." |
| vector* | A vector of character strings | c("I like it.", "Look out!") |

* The addition of _by to vector allows the user to aggregate by one or more vectors of grouping variables.
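Conceptually, a _by function computes the per-string result and then aggregates it within each level of the grouping variable(s). The base R sketch below shows that aggregation step with tapply() for one grouping variable and aggregate() for two. The per-string sums and groups are hypothetical values chosen for illustration, and the package's own data.table-based implementation may differ.

```r
# Hypothetical per-string syllable sums for five strings (illustrative only).
sums   <- c(4L, 9L, 2L, 6L, 3L)
person <- c("A", "B", "A", "B", "A")
time   <- c("t1", "t1", "t1", "t2", "t2")

# One grouping variable: a named vector of totals per person.
by_person <- tapply(sums, person, sum)

# Several grouping variables: one row per person/time pair that occurs.
by_person_time <- aggregate(sums, by = list(person = person, time = time), FUN = sum)

by_person  # A = 9, B = 15
```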

Putting It Together

The function count_vector returns a vector of integer counts (one per word) for each string in a character vector. For this reason count_vector returns a list of integer vectors, one per string.

count_vector(c("I like it.", "Look out!"))

Each of the main functions is optimized to do its task efficiently. While one could use sapply(count_vector(x), sum) to get the same results as sum_vector(x), it would be less efficient.
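To make the comparison concrete, the base R sketch below turns count_vector-style output (a list of per-word counts, here hypothetical) into one total per string in two ways: calling sum() once per list element, and flattening once and doing a single grouped sum with rowsum(). It only illustrates why a dedicated vectorized sum can avoid per-string overhead; it is not how sum_vector() is actually implemented.

```r
# Hypothetical count_vector()-style output: one integer vector per string.
counts <- list(c(1L, 1L, 2L), c(1L, 1L, 1L, 3L, 1L, 2L), c(2L))

# Approach 1: call sum() separately on each element of the list.
per_string_loop <- vapply(counts, sum, integer(1))

# Approach 2: flatten once, then do a single grouped sum by string id.
ids <- rep(seq_along(counts), lengths(counts))
per_string_flat <- as.vector(rowsum(unlist(counts), ids))

per_string_loop  # 4 9 2
per_string_flat  # 4 9 2
```

One caveat of the flattened version: a string with zero words contributes no id, so rowsum() silently drops it instead of returning 0.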

The available syllable functions that follow the format of action_object are:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(xtable, dplyr)

avaible_syllable_funs() %>%
    xtable() %>%
    print(type = 'html', include.colnames = FALSE, include.rownames = FALSE,
        html.table.attributes = '')

To download the development version of syllable, download the zip ball or tar ball, decompress it and run R CMD INSTALL on it, or use the pacman package to install it:

if (!require("pacman")) install.packages("pacman")


You are welcome to:
* submit suggestions and bug reports at:
* send a pull request on:
* compose a friendly e-mail to:


The following examples demonstrate the functionality of a select sample of syllable functions.

Count Syllables In a String

Counts the number of syllables for each word in a string.

count_string("I like chicken and eggs for breakfast")

Count Syllables In a Vector of Strings

sents <- c("I like chicken.", "I want eggs benedict for breakfast.")

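# Label each string's syllable counts with its lowercased, punctuation-free words
Map(function(x, y) setNames(x, y),
   count_vector(sents),
   strsplit(gsub("[^a-z ]", "", tolower(sents)), "\\s+")
)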

Sum the Syllables In a Vector of Strings by Grouping Variable(s)

dat <- data.frame(
   text = c("I like chicken.", "I want eggs benedict for breakfast.", "Really?"),
   group = c("A", "B", "A"),
   stringsAsFactors = FALSE
)
sum_vector_by(dat$text, dat$group)

Tally the Short/Poly-Syllabic Words by Group(s)

dat <- data.frame(
   text = c("I like excellent chicken.", "I want eggs benedict now.", "Really?"),
   group = c("A", "B", "A"),
   stringsAsFactors = FALSE
)
tally_both_vector_by(dat$text, dat$group)

with(presidential_debates_2012, tally_both_vector_by(dialogue, person))

Readability Word Statistics by Grouping Variable(s)

with(presidential_debates_2012, readability_word_stats_by(dialogue, list(person, time)))

Visualize Poly Syllable Distributions

if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, scales)

tally_both_vector(presidential_debates_2012$dialogue) %>%
    # index each row so it can be plotted in order
    mutate(Duration = seq_along(poly)) %>%
    # keep rows with more than 4 tallied words
    filter((short + poly) > 4) %>%
    # convert tallies to proportions and flag highly poly-syllabic rows
    mutate(
        short = short/(short+poly),
        poly = 1 - short,
        size = poly > .3
    ) %>%
    ggplot(aes(Duration, poly)) +
        geom_text(aes(label = Duration, size = size, color = size)) +
        coord_flip() +
        scale_size_manual(values = c(1.5, 2.5), guide = "none") +
        scale_color_manual(values = c("grey75", "black"), guide = "none") +
        scale_x_reverse() +
        scale_y_continuous(labels = scales::percent) +
        ylab("Poly-syllabic") +
        xlab("Duration (sentences)")

Visualize Poly Syllable Distributions by Group

if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, tidyr, scales)

with(presidential_debates_2012, tally_both_vector_by(dialogue, list(person, time))) %>%
    # combine the grouping variables and convert tallies to proportions
    mutate(
        person_time = paste(person, time, sep = "-"),
        short = short/(short+poly),
        poly = 1 - short
    ) %>%
    arrange(poly) %>%
    # keep the sorted order in the plot
    mutate(person_time = factor(person_time, levels = person_time)) %>%
    gather(type, prop, c(short, poly)) %>%
    ggplot(aes(person_time, weight = prop, fill = type)) +
        geom_bar() +
        coord_flip() +
        scale_y_continuous(labels = scales::percent) +
        scale_fill_discrete(name = "Syllable\nType") +
        xlab("Person & Time") +
        ylab("Usage")

