These functions return all the words in the text of an XML document, or just the number of words. They discard all the XML markup and consider only the text nodes.
doc
the XML document to be processed. This can be either the file name (or URL) or the already parsed XML document returned from
split
a string giving a regular expression that is used to split a text string into words.
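To make the role of split concrete, here is a minimal sketch in base R of how a regular expression can break a text string into words. The pattern and the sample string are illustrative assumptions, not the default used by getWords or wordCount.

```r
# Hypothetical text standing in for the content of one text node.
text <- "Parsing XML, in R: quite easy."

# Assumed pattern: one or more whitespace or punctuation characters.
split <- "[[:space:][:punct:]]+"

# strsplit() returns a list with one element per input string.
words <- strsplit(text, split)[[1]]

# Drop empty strings, which appear when the text starts with a separator.
words <- words[words != ""]

length(words)
```

A pattern that treats punctuation as a separator will also split contractions such as "it's" into two pieces, so the choice of split affects the resulting word count.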
getWords
returns a character vector of all the words in the document.
wordCount
returns an integer giving the total number of words in the document.
library(lattice)  # provides barchart(), used below
# This computes the number of words in a mid-size book.
wordCount("~/Books/XMLTechnologies/book.xml")
# This reads the contents of the XML and Web technologies book and finds all the included files.
# Then it computes the word count for each of those separate files.
inc = getXIncludeFiles("~/Books/XMLTechnologies/book.xml", recursive = TRUE, full.names = TRUE)
counts = sapply(names(inc), wordCount)
barchart(counts)
# Here we read the actual words in the book. Then we look at the distribution of these words
# and look at the words that are not excessively common, but occur numerous times.
words = getWords("~/Books/XMLTechnologies/book.xml")
sort(table(words), decreasing = TRUE)
# Work with log counts: 2 < log(n) < 4 keeps words that occur roughly 8 to 54 times.
x = log(table(words))
barchart(x)
names(x)[x > 2 & x < 4]
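The filtering step in the last example can be tried without the book file. The following sketch uses a small made-up word vector (an assumption for illustration only) to show which words the log-count window keeps.

```r
# Hypothetical word vector: one very common word, one moderately
# frequent word, and one rare word.
words <- c(rep("the", 60), rep("xml", 10), rep("rare", 2))

# Log of the frequency of each distinct word.
x <- log(table(words))

# Keep words whose log count lies strictly between 2 and 4.
keep <- as.vector(x > 2 & x < 4)
common <- names(x)[keep]
common
```

Here "the" is dropped as too common and "rare" as too infrequent, which leaves only "xml".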