count_words: Word count


View source: R/helpers.R

Description

Count the number of words in a string.

Usage

count_words(x, word_pattern = "[A-Za-z0-9&]", break_pattern = " |\n")

Arguments

x

character, a string containing words to be counted. May be a vector.

word_pattern

character, regular expression used to identify words. An element produced by splitting x with break_pattern is counted only if it matches this pattern.

break_pattern

character, regular expression to split a string between words.

Details

This function estimates the number of words in each string. Each string is first split into elements using break_pattern. The resulting elements are then counted, including only those matched by word_pattern. The approach is meant to be simple and flexible.
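The split-then-filter approach described above can be sketched in base R as follows. This is a minimal illustration assuming only base R's strsplit() and grepl(). The function name count_words_sketch is hypothetical, and the code is not taken from epubr's source.

```r
# Hypothetical re-implementation for illustration only; not epubr's source.
count_words_sketch <- function(x, word_pattern = "[A-Za-z0-9&]",
                               break_pattern = " |\n") {
  # Split every string into candidate elements on the break pattern.
  pieces <- strsplit(x, break_pattern)
  # Per string, count the elements containing at least one word_pattern match.
  vapply(pieces, function(p) sum(grepl(word_pattern, p)), integer(1))
}

x <- " This   sentence will be counted to have:\n\n10 (ten) words."
count_words_sketch(x)
#> [1] 10

# Vector input gives one count per string.
count_words_sketch(c("one two", "a-b c", ""))
#> [1] 2 2 0
```

Repeated spaces and newlines produce empty elements during the split; these contain no matching character, so they are never counted.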

epub uses this function internally to estimate the number of words in each e-book section, alongside nchar for counting individual characters. It can also be used directly on character strings, and it is convenient to call with different regular expression pattern arguments as needed.
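To illustrate the kind of per-section summary described in the paragraph above, the sketch below computes a word estimate with the default patterns and a character count with nchar for two section strings. It is not epub's code. The section texts and the variable names nword and nchar_total are invented for illustration, and the word estimate is written inline with base R rather than by calling epubr.

```r
# Hypothetical section texts; the second contains a bare "-" and an "&".
sections <- c("Chapter one starts here.\nIt has 2 lines.",
              "Chapter two - short & sweet")

# Word estimate per section, using the default patterns from Usage.
nword <- vapply(strsplit(sections, " |\n"),
                function(p) sum(grepl("[A-Za-z0-9&]", p)),
                integer(1))

# Character count per section (the newline counts as one character).
nchar_total <- nchar(sections)

data.frame(nword = nword, nchar = nchar_total)
```

This yields 8 and 5 words and 40 and 27 characters. The bare "-" in the second section is skipped as a word but still contributes to the character count.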

The word_pattern and break_pattern arguments are provided for control, but the defaults should suffice in most cases. By default, strings are split only on spaces and newline characters. The elements counted as "words" are those that contain at least one letter (A-Z, a-z), digit, or ampersand. For example, hyphenated words, acronyms, and numbers written with digits are all counted as words. An element that also contains other characters, such as punctuation, still counts as a word as long as it contains at least one of these characters.
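To see which elements the default patterns accept, the tokens of a sample string can be inspected directly with base R's strsplit() and grepl(). The sample string is invented for illustration. The alternative break pattern "[[:space:]]+" is an assumption shown only to contrast with the default, which does not split on tabs.

```r
# Default patterns from the Usage section.
word_pattern <- "[A-Za-z0-9&]"
break_pattern <- " |\n"

# Invented sample: hyphenated word, acronym, digits, ampersand, bare punctuation.
s <- "A well-known NASA report (2021) & more ... -- end"
tokens <- strsplit(s, break_pattern)[[1]]
counted <- grepl(word_pattern, tokens)  # TRUE = counted as a word
print(data.frame(token = tokens, counted = counted))
sum(counted)
#> [1] 8

# Tabs are not in the default break_pattern, so "one\ttwo" stays one element.
tab_default <- sum(grepl(word_pattern, strsplit("one\ttwo", break_pattern)[[1]]))
# An assumed broader break pattern that splits on any run of whitespace.
tab_space <- sum(grepl(word_pattern, strsplit("one\ttwo", "[[:space:]]+")[[1]]))
c(tab_default, tab_space)
#> [1] 1 2
```

Only the bare punctuation elements "..." and "--" are rejected here; "(2021)" counts because it contains digits.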

Value

an integer vector with one word count per element of x.

Examples

x <- " This   sentence will be counted to have:\n\n10 (ten) words."
count_words(x)

Example output

[1] 10

epubr documentation built on June 19, 2021, 1:07 a.m.