Obtain a Word Count from a PDF
document | A file path specifying a PDF document.
pages | Optionally, an integer vector specifying a subset of pages to count from. Negative values exclude the corresponding pages.
count_numbers | A logical specifying whether to count numbers as words.
count_captions | A logical specifying whether to include lines beginning with “Table” or “Figure” in the word count.
count_equations | A logical specifying whether to include lines ending with “([Number])” in the word count.
split_hyphenated | A logical specifying whether to split hyphenated words or expressions into separate words.
split_urls | A logical specifying whether to split URLs into multiple words when counting.
verbose | A logical specifying whether to be verbose. If
This is useful for obtaining a word count for a LaTeX-compiled PDF. Counting words in the tex source is likely an undercount (due to missing citations, cross-references, and parenthetical citations). Counting words from the PDF is likely an overcount (due to hyphenation issues, URLs, ligatures, tables and figures, and various other things). This function tries to obtain a word count from the PDF while accounting for some of the sources of overcounting.
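As a rough illustration of why these options matter, the following base R sketch counts words in a few lines of text under different rules. It is not the package's implementation: `sketch_count` and its regular expressions are hypothetical and only mimic the behaviour described for `count_numbers`, `count_captions`, and `split_hyphenated`.

```r
# Hypothetical helper, not part of the package: counts whitespace-separated
# tokens in a character vector of lines, with a few of the adjustments
# described above.
sketch_count <- function(lines,
                         count_numbers = TRUE,
                         count_captions = TRUE,
                         split_hyphenated = FALSE) {
  if (!count_captions) {
    # drop lines beginning with "Table" or "Figure"
    lines <- lines[!grepl("^\\s*(Table|Figure)", lines)]
  }
  if (split_hyphenated) {
    # treat a hyphen between word characters as a word boundary
    lines <- gsub("(?<=\\w)-(?=\\w)", " ", lines, perl = TRUE)
  }
  tokens <- unlist(strsplit(trimws(lines), "\\s+"))
  tokens <- tokens[nzchar(tokens)]
  if (!count_numbers) {
    # drop tokens made up only of digits and numeric punctuation
    tokens <- tokens[!grepl("^[0-9.,]+$", tokens)]
  }
  length(tokens)
}

txt <- c("Figure 1: A well-known result",
         "We sampled 250 state-level records.")

n_all        <- sketch_count(txt)
n_no_caption <- sketch_count(txt, count_captions = FALSE)
n_no_numbers <- sketch_count(txt, count_captions = FALSE, count_numbers = FALSE)
n_split      <- sketch_count(txt, split_hyphenated = TRUE)
```

Even on two lines the count ranges from 4 to 12 depending on the rules chosen, which is why the choice of options should be reported alongside any word count.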
It is often desirable to have word counts excluding tables and figures. A solution on TeX StackExchange (https://tex.stackexchange.com/a/352394/30039) provides guidance on how to exclude tables and figures (or any arbitrary LaTeX environment) from a compiled document, which may be useful before counting words in the PDF.
A data frame with two columns, one specifying page and the other specifying word count for that page.
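Because the result is an ordinary data frame, a document total is a column sum away. The sketch below uses a mock data frame standing in for the return value; the column names `page` and `words` are assumptions for illustration, since the actual names are not stated here.

```r
# Mock stand-in for the value returned by word_count(); the column names
# `page` and `words` are assumed for illustration only.
counts <- data.frame(page = 1:4, words = c(120L, 410L, 388L, 95L))

# Total across all pages
total <- sum(counts$words)

# Total excluding the first page, e.g. a title page
total_body <- sum(counts$words[counts$page != 1])

# Pages with fewer than 100 words
short_pages <- counts$page[counts$words < 100]
```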
Thomas J. Leeper <thosjleeper@gmail.com>
## Not run:
# "R-intro.pdf" manual
rintro <- file.path(Sys.getenv("R_HOME"), "doc", "manual", "R-intro.pdf")
# Online service at http://www.montereylanguages.com/pdf-word-count-online-free-tool.html
# claims the word count to be 36,530 words
# Microsoft Word (PDF conversion) word count is 36,869 words
word_count(rintro) # all pages (105 pages, 37870 words)
word_count(rintro, 1:3) # pages 1-3
word_count(rintro, -1) # skip first page
## End(Not run)