Detect the rate of profanity at the sentence level. This method uses a simple
dictionary lookup to find profane words and then computes the rate per sentence.
The profanity score ranges between 0 (no profanity used) and 1 (all words used
were profane). Note that a single profane phrase, such as a two-word entry in the
list, counts as just one in the profanity_count column but as two words in the
word_count column.
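To make the scoring concrete, here is a minimal base R sketch of the per-sentence computation, assuming tokenization on anything that is not a letter or apostrophe and whole-word matching. `score_sentence()` is a hypothetical helper, not part of sentimentr, and the package's own tokenizer and matching rules may differ. Each list entry (word or phrase) is counted once per occurrence, which reproduces the phrase behavior described above.

```r
# Hypothetical helper, NOT the sentimentr implementation: score one sentence
# by dictionary lookup. List entries may be single words or multi-word phrases.
score_sentence <- function(sentence, profanity_list) {
  # Mirror the NA row in the example output: NA words, zero profanity
  if (is.na(sentence)) {
    return(data.frame(word_count = NA_integer_, profanity_count = 0L, profanity = 0))
  }
  # Lowercase and tokenize on anything that is not a letter or apostrophe
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  word_count <- length(words)
  # Rejoin with single spaces so multi-word phrases can be matched
  normalized <- paste(words, collapse = " ")
  terms <- unique(tolower(profanity_list))
  # Count whole-word / whole-phrase occurrences of each term: \Q...\E quotes
  # the term literally and the lookarounds reject partial-word matches
  hits <- vapply(terms, function(term) {
    pattern <- paste0("(?<![a-z'])\\Q", term, "\\E(?![a-z'])")
    m <- gregexpr(pattern, normalized, perl = TRUE)[[1]]
    sum(m > 0)
  }, integer(1))
  profanity_count <- sum(hits)
  data.frame(
    word_count = word_count,
    profanity_count = profanity_count,
    profanity = if (word_count > 0) profanity_count / word_count else 0
  )
}

# The two-word phrase "heck no" adds 1 to profanity_count but 2 to word_count
score_sentence("Heck no, that is darn silly", c("darn", "heck no"))
```

Here the sentence has 6 words and 2 matches, so the score is 2/6.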
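text.var
The text variable. Can be a get_sentences object or a raw character vector,
though get_sentences is preferred because it avoids repeating sentence boundary
disambiguation every time profanity is run.
profanity_list
An atomic character vector of profane words. The lexicon package has lists that
can be used, including lexicon::profanity_alvarez and lexicon::profanity_racist
(both used in the examples).
...
Ignored.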
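Because profanity_list is just a character vector, a custom list can be prepared with base R. The sketch below lowercases and de-duplicates a hypothetical `raw_terms` vector (the same `unique(tolower(...))` pattern the examples apply to lexicon::profanity_alvarez) and then drops known false positives with base R `setdiff()`, which plays a role similar to the `textclean::drop_element_fixed()` call in the examples.

```r
# Hypothetical raw term list, e.g. read from a file: mixed case and duplicates
raw_terms <- c("Darn", "darn", "HECK", "heck no", "Jerry", "illegal")

# Lowercase and de-duplicate so each term is looked up once
profanity_list <- unique(tolower(raw_terms))

# Drop terms known to cause false positives (the examples drop 'jerry',
# 'que', and 'illegal' for the same reason); setdiff() keeps the order
profanity_list <- setdiff(profanity_list, c("jerry", "que", "illegal"))

# The result is a plain atomic character vector, suitable for profanity_list
stopifnot(is.atomic(profanity_list), is.character(profanity_list))
profanity_list
```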
Returns a data.table of:
element_id - The id number of the original vector passed to profanity
sentence_id - The id number of the sentences within each element_id
word_count - Word count
profanity_count - Count of the number of profane words
profanity - The proportion of words that are profane (profanity_count / word_count), from 0 to 1
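The Value columns can be illustrated with a base R sketch that builds the same five columns for a raw character vector. `profanity_table()` is hypothetical: it returns a data.frame rather than a data.table, uses a naive regex sentence splitter in place of get_sentences, and assumes single-word list entries. It mirrors the NA handling seen in the example output (NA word_count, zero profanity).

```r
# Hypothetical sketch, NOT sentimentr's code: one row per sentence with
# element_id, sentence_id, word_count, profanity_count, and profanity.
profanity_table <- function(text.var, profanity_list) {
  terms <- unique(tolower(profanity_list))
  rows <- lapply(seq_along(text.var), function(i) {
    txt <- text.var[i]
    if (is.na(txt)) {
      sentences <- NA_character_
    } else {
      # Naive split: a run of non-terminators plus any trailing . ! or ?
      sentences <- trimws(regmatches(txt, gregexpr("[^.!?]+[.!?]*", txt))[[1]])
      sentences <- sentences[nzchar(sentences)]
      if (length(sentences) == 0) sentences <- ""
    }
    do.call(rbind, lapply(seq_along(sentences), function(j) {
      s <- sentences[j]
      # Tokenize on anything that is not a letter or apostrophe
      words <- if (is.na(s)) character(0) else strsplit(tolower(s), "[^a-z']+")[[1]]
      words <- words[nzchar(words)]
      n <- if (is.na(s)) NA_integer_ else length(words)
      k <- sum(words %in% terms)
      data.frame(element_id = i, sentence_id = j, word_count = n,
                 profanity_count = k,
                 profanity = if (is.na(n) || n == 0L) 0 else k / n)
    }))
  })
  do.call(rbind, rows)
}

profanity_table(c("Darn it. I am fine!", NA), c("darn"))
```

The first element splits into two sentences ("Darn it." scores 1/2, "I am fine!" scores 0), and the NA element yields a single row with an NA word count.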
Other profanity functions:
profanity_by()
## Not run:
library(sentimentr)
bw <- sample(unique(tolower(lexicon::profanity_alvarez)), 4)
mytext <- c(
sprintf('do you like this %s? It is %s. But I hate really bad dogs', bw[1], bw[2]),
'I am the best friend.',
NA,
sprintf('I %s hate this %s', bw[3], bw[4]),
"Do you really like it? I'm not happy"
)
## works on a character vector, but this is not the preferred method because
## sentence boundary disambiguation is repeated every time `profanity` is run
profanity(mytext)
## preferred method: split into sentences once with get_sentences and reuse it
mytext2 <- get_sentences(mytext)
profanity(mytext2)
plot(profanity(mytext2))
brady <- get_sentences(crowdflower_deflategate)
brady_swears <- profanity(brady)
brady_swears
## Distribution of profanity proportion for all comments
hist(brady_swears$profanity)
sum(brady_swears$profanity > 0)
## Distribution of proportions for those profane comments
hist(brady_swears$profanity[brady_swears$profanity > 0])
combo_sentences <- get_sentences(crowdflower_deflategate)
racist <- profanity(combo_sentences, profanity_list = lexicon::profanity_racist)
combo_sentences[racist$profanity > 0, ]$text
extract_profanity_terms(
combo_sentences[racist$profanity > 0, ]$text,
profanity_list = lexicon::profanity_racist
)
## Remove the false positives jerry, que, and illegal from the list
library(textclean)
racist2 <- profanity(
combo_sentences,
profanity_list = textclean::drop_element_fixed(
lexicon::profanity_racist,
c('jerry', 'illegal', 'que')
)
)
combo_sentences[racist2$profanity > 0, ]$text
## End(Not run)
element_id sentence_id word_count profanity_count profanity
1: 1 1 5 1 0.20
2: 1 2 4 1 0.25
3: 1 3 6 0 0.00
4: 2 1 5 0 0.00
5: 3 1 NA 0 0.00
6: 4 1 5 2 0.40
7: 5 1 5 0 0.00
8: 5 2 3 0 0.00
element_id sentence_id word_count profanity_count profanity
1: 1 1 5 1 0.20
2: 1 2 4 1 0.25
3: 1 3 6 0 0.00
4: 2 1 5 0 0.00
5: 3 1 NA 0 0.00
6: 4 1 5 2 0.40
7: 5 1 5 0 0.00
8: 5 2 3 0 0.00
sentiment
1: 0.5
2: 0.5
3: 0.5
4: -1.0
5: -1.0
---
21837: -0.5
21838: -0.5
21839: 0.5
21840: 0.5
21841: 0.5
text
1: RT @DeSmogBlog: Bill Nye @TheScienceGuy: Screw #Deflategate.
2: You Should "Give a Fuck" About #ClimateChange Instead.
3: http://t.co/cKCXLgn9Xf _
4: RT @CBSPittsburgh: Former Belichick Scout: /Bill has always worked to that edge, crossed that edge.'
5: #Deflategate http://t.co/vd9Lhjmtsy
---
21837: #DeflateGate If it was weather/atmosphere, would it not have also affected Colts footballs too?
21838: #GiveUsABreak #BlatantLies #Patriots
21839: I'll probably get slammed & yes I'm a @Patriots fan, but I'm tired of #deflategate.
21840: Erase 1st half scores & Pats STILL win 28-7.
21841: #blowout
element_id sentence_id word_count profanity_count profanity
1: 1 1 7 1 0.1428571
2: 1 2 8 0 0.0000000
3: 1 3 4 0 0.0000000
4: 2 1 15 0 0.0000000
5: 2 2 5 0 0.0000000
---
21837: 11785 1 15 0 0.0000000
21838: 11785 2 3 0 0.0000000
21839: 11786 1 15 0 0.0000000
21840: 11786 2 10 0 0.0000000
21841: 11786 3 1 0 0.0000000
[1] 267
[1] "Jerry Rice weighs in on #deflategate and Super Bowl XLIX. http://t.co/Ssh5aH4UU8"
[2] "But eh, que sera sera."
[3] "RT @SportsTalkFLA: Listen to @MikeTuck1080 & Jerry O'Neill talk #NFL #SuperBowlXLIX #DeflateGate #Colts with @NateDunlevy at 530pm."
[4] "And used an illegal substance to rehab from an injury?"
[5] "Listen to @MikeTuck1080 & Jerry O'Neill talk #NFL #SuperBowlXLIX #DeflateGate #Colts w @NateDunlevy at 530pm."
[6] "\"Four Ring Circus\": Jerry ballgame weighs in on #Ballghazi and predicts #SuperBowlXLIX. #DeflateGate #Patriots http://t.co/2LWxjDmnUl"
[7] "RT @TMZ: Terrell Owens: Tom Brady KNEW those balls were illegal!"
[8] "@alwise8 @nochillsport @russwest__ fuck #DeflateGate tom Brady Is a nigga it's not the first time a QB changed the feel off the ball"
[9] "Yeah illegal but not huge deal."
[10] "Ray at 8pm and Jerry at 9!"
[11] "Ray at 8pm and Jerry at 9!"
[12] "Ray at 8pm and Jerry at 9!"
[13] "Dedicated to the #DeflateGate Tom Brady & the Ingrates '15 Super Bowl scandal jerry lee lewis great balls of fire: http://t.co/DIdvtQPU3s"
[14] "RT @TEVO_SPRITE: Niggas still talking about the #DeflateGate stfu _/__/_"
[15] "Niggas still talking about the #DeflateGate stfu _/__/_"
[16] "Dallas Cowboys Owner, Jerry Jones having his balls deflated before the game."
[17] "Yes Is it illegal?"
[18] "YANK THE TITLE"
[19] "RT @SteveO_H_I_O: There hasn't been this much talk about balls being inappropriately handled since the Jerry Sandusky scandal."
[20] "Wk8: they lucky Wk12: Brady still gay tho Playoffs1: ILLEGAL FORMATION?"
[21] "There hasn't been this much talk about balls being inappropriately handled since the Jerry Sandusky scandal."
[22] "RT @bigSPEELz: Breaking out my illegal 12.5\" width goalie pads tonight in honor of #DeflateGate."
[23] "I wouldn't trust any man that wears a Beanie with a Pom top."
[24] "#DeflateGate\"Storm grows over Patriots' alleged use of illegal balls\" I wish folks got this outraged over violations of the #Constitution"
[25] "RT @RichardFitchNYC: #Deflategate That's Talk Radio gold, Jerry!"
[26] "RT @Oveurungerdunn: #DeflateGate Why would u Yanks have a game with different balls for different teams?"
[27] "#DeflateGate Why would u Yanks have a game with different balls for different teams?"
[28] "RT @mwgfla: Reporters are checking the name Jerry Gallo, and a possible link to #DeflateGate."
[29] "No wait, it's Callo, Jerry Callo."
[30] "#DeflateGate #harbaughdiapers The Pats used illegal formations!"
[31] "There balls were illegal."
attr(,"class")
[1] "get_sentences" "get_sentences_character"
[3] "character"
element_id sentence_id profanity
1: 1 1 jerry
2: 2 1 que
3: 3 1 jerry
4: 4 1 illegal
5: 5 1 jerry
6: 6 1 jerry
7: 7 1 illegal
8: 8 1 nigga
9: 9 1 illegal
10: 10 1 jerry
11: 11 1 jerry
12: 12 1 jerry
13: 13 1 jerry
14: 14 1 niggas
15: 15 1 niggas
16: 16 1 jerry
17: 17 1 illegal
18: 18 1 yank
19: 19 1 jerry
20: 20 1 illegal
21: 21 1 jerry
22: 22 1 illegal
23: 23 1 pom
24: 24 1 illegal
25: 25 1 jerry
26: 26 1 yanks
27: 27 1 yanks
28: 28 1 jerry
29: 29 1 jerry
30: 30 1 illegal
31: 31 1 illegal
element_id sentence_id profanity
[1] "@alwise8 @nochillsport @russwest__ fuck #DeflateGate tom Brady Is a nigga it's not the first time a QB changed the feel off the ball"
[2] "RT @TEVO_SPRITE: Niggas still talking about the #DeflateGate stfu _/__/_"
[3] "Niggas still talking about the #DeflateGate stfu _/__/_"
[4] "YANK THE TITLE"
[5] "I wouldn't trust any man that wears a Beanie with a Pom top."
[6] "RT @Oveurungerdunn: #DeflateGate Why would u Yanks have a game with different balls for different teams?"
[7] "#DeflateGate Why would u Yanks have a game with different balls for different teams?"
attr(,"class")
[1] "get_sentences" "get_sentences_character"
[3] "character"
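Because scoring is a plain dictionary lookup, benign words in a list can dominate the matches; in the output above, jerry, que, and illegal from lexicon::profanity_racist flag many harmless sentences. A quick way to find such terms before pruning a list is to tabulate which entries matched. The sketch below uses a hypothetical `matched_terms()` helper in base R as a rough stand-in for extract_profanity_terms(), assuming single-word list entries and made-up sentences.

```r
# Hypothetical helper: which dictionary terms matched in each sentence
matched_terms <- function(sentences, profanity_list) {
  terms <- unique(tolower(profanity_list))
  lapply(sentences, function(s) {
    # NA sentences yield no terms
    if (is.na(s)) return(character(0))
    # Tokenize on anything that is not a letter or apostrophe
    words <- strsplit(tolower(s), "[^a-z']+")[[1]]
    intersect(words, terms)
  })
}

sentences <- c("Jerry weighs in.", "Is it illegal?", "Great game.", "Ray and Jerry at 9!")
hits <- matched_terms(sentences, c("jerry", "illegal"))

# Number of sentences each term flagged; high counts on benign words point
# to candidates for removal from the list
term_counts <- sort(table(unlist(hits)), decreasing = TRUE)
term_counts
```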