word_imp: Importance of words (terms) embedded in a text document
In opitools: Analyzing the Opinions in a Big Text Document


Produces a wordcloud which represents the level of importance of each word (across different text groups) within a text document, according to a specified measure.

word_imp(textdoc, metric = "tf", words_to_filter = NULL)

`textdoc`	An `n` x `1` list (dataframe) of individual text records, where `n` is the total number of individual records. An `n` x `2` dataframe can also be supplied, in which the second column holds the labels of the pre-defined groupings of the text records, e.g. labels of the geographical areas where each text record originates. For an `n` x `1` dataframe, an arbitrary grouping is automatically imposed.
`metric`	(character) The measure for determining the level of importance of each word within the text document. Options are `'tf'`, representing `term frequency`, and `'tf-idf'`, representing `term frequency-inverse document frequency` (Silge & Robinson, 2016).
`words_to_filter`	A pre-defined vector of words (terms) to filter out of the digital text document (DTD) before word importance is computed. Default: `NULL`. This parameter helps to eliminate unnecessary words that may be too dominant in the results.
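As an illustration of the two accepted input shapes, the sketch below builds a one-column and a two-column dataframe by hand. The records, the column names (`text`, `group`) and the area labels are hypothetical; whether `word_imp` requires particular column names is not stated here, so the call itself is left commented out.

```r
# One-column input: each row is a single text record
textdoc_1col <- data.frame(
  text = c("police patrol the city", "parks improve the city"),
  stringsAsFactors = FALSE
)

# Two-column input: the second column labels the group of each record
textdoc_2col <- data.frame(
  text  = textdoc_1col$text,
  group = c("north", "south"),  # hypothetical area labels
  stringsAsFactors = FALSE
)

# Words to drop before importance is computed
wf <- c("police", "policing")

# The call needs opitools installed, so it is shown as a comment only:
# output <- word_imp(textdoc = textdoc_2col, metric = "tf-idf",
#                    words_to_filter = wf)
```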

The function determines the most important words across the groupings of a text document. The available measures are tf and tf-idf. The idea of tf is to rank words by their number of occurrences across the text document, whereas tf-idf finds the words that characterise each group: it decreases the weight of words that are common across all groups and increases the weight of words that are not used very much across the document as a whole.
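The difference between the two measures can be seen on a toy example. The following is a minimal base R sketch, not the package's internal code: the records and group labels are made up, tokenisation is a plain whitespace split, and idf is taken as the natural log of the number of groups divided by the number of groups containing the word.

```r
# Toy data: four short text records in two groups (hypothetical)
records <- data.frame(
  text  = c("police patrol the city", "police help the community",
            "city council funds parks", "parks improve the city"),
  group = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Tokenise on whitespace and keep the group label of each token
tokens <- strsplit(tolower(records$text), "\\s+")
long <- data.frame(
  group = rep(records$group, lengths(tokens)),
  word  = unlist(tokens),
  stringsAsFactors = FALSE
)

# Word counts per group as a plain matrix (rows: words, columns: groups)
counts <- unclass(table(long$word, long$group))

# tf: share of a group's tokens accounted for by each word
tf <- sweep(counts, 2, colSums(counts), "/")

# idf: log(number of groups / number of groups containing the word)
idf <- log(ncol(counts) / rowSums(counts > 0))

# tf-idf: idf is recycled down the rows, so each word gets its own weight
tf_idf <- tf * idf

# Words ranked by tf-idf within group A
sort(tf_idf[, "A"], decreasing = TRUE)
```

Words such as "the" and "city" occur in both groups, so their idf (and hence tf-idf) is zero even though their tf is high, while "police" occurs only in group A and therefore ranks first there.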

A graphical representation of word importance according to the specified metric. A wordcloud is used if tf is specified, while facet-wrapped histograms are used if tf-idf is specified. A wordcloud represents each word with a size corresponding to its level of importance. In the facet-wrapped histograms, words are ranked within each group (histogram) in order of importance.

Silge, J. and Robinson, D. (2016) tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1, 37.

# load opitools, which provides word_imp()
library(opitools)
# words to filter out
wf <- c("police", "policing")
output <- word_imp(textdoc = policing_dtd, metric = "tf",
                   words_to_filter = wf)

opitools documentation built on July 29, 2021, 5:06 p.m.
