Detect the rate of profanity at the sentence level. This method uses a simple dictionary lookup to find profane words and then computes the rate per sentence. The profanity score ranges between 0 (no profanity used) and 1 (all words used were profane). Note that a single profane phrase counts as just one in the profanity_count column, but a two-word phrase counts as two words in the word_count column.
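The dictionary-lookup idea described above can be sketched in base R. This is a toy illustration only, not the code sentimentr runs: the helper `toy_profanity_rate` and the placeholder word list are hypothetical, the tokenizer is a simple regular-expression split, only single-word dictionary entries are matched, and the rate is assumed to be the profane-word count divided by the word count.

```r
# Hypothetical helper, not part of sentimentr: a toy dictionary-lookup rate.
toy_profanity_rate <- function(sentence, profanity_list) {
  # Lowercase, then split on anything that is not a letter or apostrophe
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  word_count <- length(words)
  # Count the words that appear in the dictionary (single-word entries only)
  profanity_count <- sum(words %in% profanity_list)
  data.frame(
    word_count = word_count,
    profanity_count = profanity_count,
    # Assumed definition of the rate: profane words / all words
    profanity = if (word_count > 0) profanity_count / word_count else NA_real_
  )
}

# Placeholder "profane" words chosen only for illustration
toy_list <- c("darn", "heck")
res <- toy_profanity_rate("Well heck, this darn thing broke", toy_list)
res
```

Here two of the six words are in the toy list, so the rate is one third. Matching multi-word phrases, as the note about phrases implies the real function does, would need more than this word-by-word lookup.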
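text.var - The text variable. Can be a raw character vector or, preferably, the output of get_sentences.
profanity_list - An atomic character vector of profane words. The lexicon package has lists that can be used, including lexicon::profanity_alvarez and lexicon::profanity_racist.
... - Ignored.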
Returns a data.table of:
element_id - The id number of the original vector passed to profanity
sentence_id - The id number of the sentences within each element_id
word_count - Word count
profanity_count - Count of the number of profane words
profanity - The proportion of profane words, from 0 to 1
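As a rough sketch of how these columns relate, the sentence-level rows can be rolled up to one row per element_id in base R. The rows below are made up to mimic the documented columns and are not real profanity() output; the rate is assumed to be profanity_count divided by word_count. For real grouped summaries, profanity_by() is the package's own tool and may aggregate differently.

```r
# Made-up rows that mimic the documented columns; not real profanity() output
toy <- data.frame(
  element_id = c(1, 1, 2),
  sentence_id = c(1, 2, 1),
  word_count = c(6, 4, 5),
  profanity_count = c(1, 0, 2)
)
# Assumed definition of the sentence-level rate
toy$profanity <- toy$profanity_count / toy$word_count

# Sum the counts within each element, then recompute the rate per element
by_element <- aggregate(
  cbind(word_count, profanity_count) ~ element_id,
  data = toy,
  FUN = sum
)
by_element$profanity <- by_element$profanity_count / by_element$word_count
by_element
```

Recomputing the rate from summed counts weights each sentence by its length. Averaging the sentence-level rates instead would give short sentences the same weight as long ones.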
Other profanity functions:
profanity_by()
## Not run:
bw <- sample(unique(tolower(lexicon::profanity_alvarez)), 4)
mytext <- c(
    sprintf('do you like this %s? It is %s. But I hate really bad dogs', bw[1], bw[2]),
    'I am the best friend.',
    NA,
    sprintf('I %s hate this %s', bw[3], bw[4]),
    "Do you really like it? I'm not happy"
)
## Works on a character vector, but this is not the preferred method because
## it repeats the cost of sentence boundary disambiguation every time
## `profanity` is run
profanity(mytext)
## Preferred method: split into sentences once and reuse the result
mytext2 <- get_sentences(mytext)
profanity(mytext2)
plot(profanity(mytext2))
brady <- get_sentences(crowdflower_deflategate)
brady_swears <- profanity(brady)
brady_swears
## Distribution of profanity proportion for all comments
hist(brady_swears$profanity)
sum(brady_swears$profanity > 0)
## Distribution of proportions for those profane comments
hist(brady_swears$profanity[brady_swears$profanity > 0])
combo <- combine_data()
combo_sentences <- get_sentences(crowdflower_deflategate)
racist <- profanity(combo_sentences, profanity_list = lexicon::profanity_racist)
combo_sentences[racist$profanity > 0, ]$text
extract_profanity_terms(
    combo_sentences[racist$profanity > 0, ]$text,
    profanity_list = lexicon::profanity_racist
)
## Remove jerry, que, and illegal from the list
library(textclean)
racist2 <- profanity(
    combo_sentences,
    profanity_list = textclean::drop_element_fixed(
        lexicon::profanity_racist,
        c('jerry', 'illegal', 'que')
    )
)
combo_sentences[racist2$profanity > 0, ]$text
## End(Not run)