profanity: Compute Profanity Rate

View source: R/profanity.R

Description

Detects the rate of profanity at the sentence level. The method uses a simple dictionary lookup to find profane words and then computes the rate per sentence. The profanity score ranges from 0 (no profanity used) to 1 (every word used was profane). Note that a profane phrase made up of several words counts only once in the profanity_count column, while each of its words is counted in the word_count column (a two-word phrase adds one to profanity_count and two to word_count).
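The dictionary-lookup idea can be illustrated with a few lines of base R. This is only a sketch of the arithmetic, not the package's implementation: `profanity_rate_sketch` is a hypothetical helper, the tokenizer is a naive regular-expression split, the terms "darn" and "heck" are placeholders for a real list, and only single-word terms are matched (multi-word phrases are not handled).

```r
# Hypothetical helper: illustrates the rate calculation only,
# it is NOT how sentimentr implements profanity().
profanity_rate_sketch <- function(sentences, profanity_list) {
  profanity_list <- unique(tolower(profanity_list))
  vapply(sentences, function(s) {
    # Lowercase, then split on anything that is not a letter or apostrophe
    words <- strsplit(tolower(s), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) == 0) return(NA_real_)
    # Rate = number of dictionary hits / number of words in the sentence
    sum(words %in% profanity_list) / length(words)
  }, numeric(1), USE.NAMES = FALSE)
}

# Placeholder terms stand in for a real profanity list
rates <- profanity_rate_sketch(
  c("this is darn bad", "all good here", "heck heck"),
  profanity_list = c("darn", "heck")
)
rates
# [1] 0.25 0.00 1.00
```

The three results cover the range described above: a partial rate, 0 when no listed word occurs, and 1 when every word is in the list.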

Usage

profanity(
  text.var,
  profanity_list = unique(tolower(lexicon::profanity_alvarez)),
  ...
)

Arguments

text.var

The text variable. Can be a get_sentences object or a raw character vector, though get_sentences is preferred because it avoids the repeated cost of sentence boundary disambiguation every time profanity is run.

profanity_list

An atomic character vector of profane words. The lexicon package has lists that can be used, including:

  • unique(tolower(lexicon::profanity_alvarez))

  • lexicon::profanity_arr_bad

  • lexicon::profanity_banned

  • lexicon::profanity_zac_anger

  • lexicon::profanity_racist

...

Ignored.

Value

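Returns a data.table with the following columns (as shown in the example output):

  • element_id - The id number of the element in the original vector passed to profanity

  • sentence_id - The id number of the sentence within each element_id

  • word_count - The number of words in the sentence

  • profanity_count - The number of profane words or phrases in the sentence

  • profanity - The profanity rate: profanity_count divided by word_count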
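Because the result has one row per sentence, a per-element rate has to be derived from it. The base R sketch below uses the sentence-level values from the first table of the example output and pools them by element_id. The word-weighted pooling (total profane hits over total words) is an assumption made for illustration; profanity_by() is the package's own grouping function and may aggregate differently.

```r
# Sentence-level values copied from the first table in the example output
res <- data.frame(
  element_id      = c(1, 1, 1, 2, 3, 4, 5, 5),
  sentence_id     = c(1, 2, 3, 1, 1, 1, 1, 2),
  word_count      = c(5, 4, 6, 5, NA, 5, 5, 3),
  profanity_count = c(1, 1, 0, 0, 0, 2, 0, 0)
)

# Pool words and profane hits per element; the NA word count of the
# missing element (element_id 3) is treated as zero words.
words_by_el <- tapply(res$word_count, res$element_id, sum, na.rm = TRUE)
prof_by_el  <- tapply(res$profanity_count, res$element_id, sum)

# Assumed convention: an element with no words gets a rate of 0
rate_by_el <- ifelse(words_by_el > 0, prof_by_el / words_by_el, 0)
rate_by_el
```

Element 1 pools to 2 profane hits over 15 words, and element 4 keeps its single-sentence rate of 0.4.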

See Also

Other profanity functions: profanity_by()

Examples

## Not run: 
bw <- sample(unique(tolower(lexicon::profanity_alvarez)), 4)
mytext <- c(
   sprintf('do you like this %s?  It is %s. But I hate really bad dogs', bw[1], bw[2]),
   'I am the best friend.',
   NA,
   sprintf('I %s hate this %s', bw[3], bw[4]),
   "Do you really like it?  I'm not happy"
)

## Works on a character vector, but this is not the preferred method because
## it repeats the cost of sentence boundary disambiguation every time
## `profanity` is run
profanity(mytext)

## Preferred method: split into sentences once, then reuse the result
mytext2 <- get_sentences(mytext)
profanity(mytext2)

plot(profanity(mytext2))

brady <- get_sentences(crowdflower_deflategate)
brady_swears <- profanity(brady)
brady_swears

## Distribution of profanity proportion for all comments
hist(brady_swears$profanity)
sum(brady_swears$profanity > 0)

## Distribution of proportions for those profane comments
hist(brady_swears$profanity[brady_swears$profanity > 0])

combo <- combine_data()
combo_sentences <- get_sentences(crowdflower_deflategate)
racist <- profanity(combo_sentences, profanity_list = lexicon::profanity_racist)
combo_sentences[racist$profanity > 0, ]$text
extract_profanity_terms(
    combo_sentences[racist$profanity > 0, ]$text, 
    profanity_list = lexicon::profanity_racist
)

## Remove jerry, que, and illegal from the list
library(textclean)

racist2 <- profanity(
    combo_sentences, 
    profanity_list = textclean::drop_element_fixed(
        lexicon::profanity_racist, 
        c('jerry', 'illegal', 'que')
    )
)
combo_sentences[racist2$profanity > 0, ]$text

## End(Not run)

Example output

   element_id sentence_id word_count profanity_count profanity
1:          1           1          5               1      0.20
2:          1           2          4               1      0.25
3:          1           3          6               0      0.00
4:          2           1          5               0      0.00
5:          3           1         NA               0      0.00
6:          4           1          5               2      0.40
7:          5           1          5               0      0.00
8:          5           2          3               0      0.00
   element_id sentence_id word_count profanity_count profanity
1:          1           1          5               1      0.20
2:          1           2          4               1      0.25
3:          1           3          6               0      0.00
4:          2           1          5               0      0.00
5:          3           1         NA               0      0.00
6:          4           1          5               2      0.40
7:          5           1          5               0      0.00
8:          5           2          3               0      0.00
       sentiment
    1:       0.5
    2:       0.5
    3:       0.5
    4:      -1.0
    5:      -1.0
   ---          
21837:      -0.5
21838:      -0.5
21839:       0.5
21840:       0.5
21841:       0.5
                                                                                                       text
    1:                                         RT @DeSmogBlog: Bill Nye @TheScienceGuy: Screw #Deflategate.
    2:                                               You Should "Give a Fuck" About #ClimateChange Instead.
    3:                                                                             http://t.co/cKCXLgn9Xf _
    4: RT @CBSPittsburgh: Former Belichick Scout: /Bill has always worked to that edge, crossed that edge.'
    5:                                                                  #Deflategate http://t.co/vd9Lhjmtsy
   ---                                                                                                     
21837:      #DeflateGate If it was weather/atmosphere, would it not have also affected Colts footballs too?
21838:                                                                 #GiveUsABreak #BlatantLies #Patriots
21839:              I'll probably get slammed &amp; yes I'm a @Patriots fan, but I'm tired of #deflategate.
21840:                                                     Erase 1st half scores &amp; Pats STILL win 28-7.
21841:                                                                                             #blowout
       element_id sentence_id word_count profanity_count profanity
    1:          1           1          7               1 0.1428571
    2:          1           2          8               0 0.0000000
    3:          1           3          4               0 0.0000000
    4:          2           1         15               0 0.0000000
    5:          2           2          5               0 0.0000000
   ---                                                            
21837:      11785           1         15               0 0.0000000
21838:      11785           2          3               0 0.0000000
21839:      11786           1         15               0 0.0000000
21840:      11786           2         10               0 0.0000000
21841:      11786           3          1               0 0.0000000
[1] 267
 [1] "Jerry Rice weighs in on #deflategate and Super Bowl XLIX. http://t.co/Ssh5aH4UU8"                                                             
 [2] "But eh, que sera sera."                                                                                                                       
 [3] "RT @SportsTalkFLA: Listen to @MikeTuck1080 &amp; Jerry O'Neill talk #NFL #SuperBowlXLIX #DeflateGate #Colts with @NateDunlevy at 530pm."      
 [4] "And used an illegal substance to rehab from an injury?"                                                                                       
 [5] "Listen to @MikeTuck1080 &amp; Jerry O'Neill talk #NFL #SuperBowlXLIX #DeflateGate #Colts w @NateDunlevy at 530pm."                            
 [6] "\"Four Ring Circus\": Jerry ballgame weighs in on #Ballghazi and predicts #SuperBowlXLIX. #DeflateGate #Patriots http://t.co/2LWxjDmnUl"      
 [7] "RT @TMZ: Terrell Owens: Tom Brady KNEW those balls were illegal!"                                                                             
 [8] "@alwise8 @nochillsport @russwest__ fuck #DeflateGate tom Brady Is a nigga it's not the first time a QB changed the feel off the ball"         
 [9] "Yeah illegal but not huge deal."                                                                                                              
[10] "Ray at 8pm and Jerry at 9!"                                                                                                                   
[11] "Ray at 8pm and Jerry at 9!"                                                                                                                   
[12] "Ray at 8pm and Jerry at 9!"                                                                                                                   
[13] "Dedicated to the #DeflateGate Tom Brady &amp; the Ingrates '15 Super Bowl scandal jerry lee lewis great balls of fire: http://t.co/DIdvtQPU3s"
[14] "RT @TEVO_SPRITE: Niggas still talking about the #DeflateGate stfu _/__/_"                                                                     
[15] "Niggas still talking about the #DeflateGate stfu _/__/_"                                                                                      
[16] "Dallas Cowboys Owner, Jerry Jones having his balls deflated before the game."                                                                 
[17] "Yes Is it illegal?"                                                                                                                           
[18] "YANK THE TITLE"                                                                                                                               
[19] "RT @SteveO_H_I_O: There hasn't been this much talk about balls being inappropriately handled since the Jerry Sandusky scandal."               
[20] "Wk8: they lucky Wk12: Brady still gay tho Playoffs1: ILLEGAL FORMATION?"                                                                      
[21] "There hasn't been this much talk about balls being inappropriately handled since the Jerry Sandusky scandal."                                 
[22] "RT @bigSPEELz: Breaking out my illegal 12.5\" width goalie pads tonight in honor of #DeflateGate."                                            
[23] "I wouldn't trust any man that wears a Beanie with a Pom top."                                                                                 
[24] "#DeflateGate\"Storm grows over Patriots' alleged use of illegal balls\" I wish folks got this outraged over violations of the #Constitution"  
[25] "RT @RichardFitchNYC: #Deflategate That's Talk Radio gold, Jerry!"                                                                             
[26] "RT @Oveurungerdunn: #DeflateGate Why would u Yanks have a game with different balls for different teams?"                                     
[27] "#DeflateGate Why would u Yanks have a game with different balls for different teams?"                                                         
[28] "RT @mwgfla: Reporters are checking the name Jerry Gallo, and a possible link to #DeflateGate."                                                
[29] "No wait, it's Callo, Jerry Callo."                                                                                                            
[30] "#DeflateGate #harbaughdiapers The Pats used illegal formations!"                                                                              
[31] "There balls were illegal."                                                                                                                    
attr(,"class")
[1] "get_sentences"           "get_sentences_character"
[3] "character"              
    element_id sentence_id profanity
 1:          1           1     jerry
 2:          2           1       que
 3:          3           1     jerry
 4:          4           1   illegal
 5:          5           1     jerry
 6:          6           1     jerry
 7:          7           1   illegal
 8:          8           1     nigga
 9:          9           1   illegal
10:         10           1     jerry
11:         11           1     jerry
12:         12           1     jerry
13:         13           1     jerry
14:         14           1    niggas
15:         15           1    niggas
16:         16           1     jerry
17:         17           1   illegal
18:         18           1      yank
19:         19           1     jerry
20:         20           1   illegal
21:         21           1     jerry
22:         22           1   illegal
23:         23           1       pom
24:         24           1   illegal
25:         25           1     jerry
26:         26           1     yanks
27:         27           1     yanks
28:         28           1     jerry
29:         29           1     jerry
30:         30           1   illegal
31:         31           1   illegal
    element_id sentence_id profanity
[1] "@alwise8 @nochillsport @russwest__ fuck #DeflateGate tom Brady Is a nigga it's not the first time a QB changed the feel off the ball"
[2] "RT @TEVO_SPRITE: Niggas still talking about the #DeflateGate stfu _/__/_"                                                            
[3] "Niggas still talking about the #DeflateGate stfu _/__/_"                                                                             
[4] "YANK THE TITLE"                                                                                                                      
[5] "I wouldn't trust any man that wears a Beanie with a Pom top."                                                                        
[6] "RT @Oveurungerdunn: #DeflateGate Why would u Yanks have a game with different balls for different teams?"                            
[7] "#DeflateGate Why would u Yanks have a game with different balls for different teams?"                                                
attr(,"class")
[1] "get_sentences"           "get_sentences_character"
[3] "character"              

sentimentr documentation built on Oct. 12, 2021, 9:06 a.m.