extract_profanity_terms: Extract Profanity Words

View source: R/extract_profanity_terms.R

Description

Extract the profanity words from a text.

Usage

extract_profanity_terms(
  text.var,
  profanity_list = unique(tolower(lexicon::profanity_alvarez)),
  ...
)

Arguments

text.var

The text variable. Can be a get_sentences object or a raw character vector, though get_sentences is preferred because it avoids the repeated cost of sentence boundary disambiguation every time profanity is run.

profanity_list

An atomic character vector of profane words. The lexicon package has lists that can be used, including:

  • lexicon::profanity_alvarez

  • lexicon::profanity_arr_bad

  • lexicon::profanity_banned

  • lexicon::profanity_zac_anger
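A custom list can also be supplied. The following is a minimal sketch, using base R only, of normalizing a word vector the same way the default argument does with unique(tolower(...)). The names raw_terms, clean_terms, and custom_list, and the placeholder words, are hypothetical and do not come from the lexicon package.

```r
# Hypothetical raw word list (placeholder terms, not taken from lexicon)
raw_terms <- c("Darn", "darn", "HECK", "heck ", "drat", NA)

# Drop missing values and trim stray whitespace around each term
clean_terms <- trimws(raw_terms[!is.na(raw_terms)])

# Lowercase and de-duplicate, mirroring the default's unique(tolower(...))
custom_list <- unique(tolower(clean_terms))

custom_list
```

The resulting vector could then be passed as profanity_list = custom_list in the call to extract_profanity_terms.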

...

Ignored.

Value

Returns a data.table with a column of profane terms.

Examples

## Not run: 
bw <- sample(lexicon::profanity_alvarez, 4)
mytext <- c(
   sprintf('do you %s like this %s?  It is %s. But I hate really bad dogs', bw[1], bw[2], bw[3]),
   'I am the best friend.',
   NA,
   sprintf('I %s hate this %s', bw[3], bw[4]),
   "Do you really like it?  I'm not happy"
)


x <- get_sentences(mytext)
profanity(x)

prof_words <- extract_profanity_terms(x)
prof_words
prof_words$sentence
prof_words$neutral
prof_words$profanity
data.table::as.data.table(prof_words)

attributes(extract_profanity_terms(x))$counts
attributes(extract_profanity_terms(x))$elements


brady <- get_sentences(crowdflower_deflategate)
brady_swears <- extract_profanity_terms(brady)

attributes(extract_profanity_terms(brady))$counts
attributes(extract_profanity_terms(brady))$elements

## End(Not run)

Example output

   element_id sentence_id word_count profanity_count profanity
1:          1           1          6               2 0.3333333
2:          1           2          3               1 0.3333333
3:          1           3          6               0 0.0000000
4:          2           1          5               0 0.0000000
5:          3           1         NA               0 0.0000000
6:          4           1          5               2 0.4000000
7:          5           1          5               0 0.0000000
8:          5           2          3               0 0.0000000
   element_id sentence_id    profanity
1:          1           1   sh1t,sh1tz
2:          1           2          feg
3:          1           3             
4:          2           1             
5:          3           1             
6:          4           1 feg,schaffer
7:          5           1             
8:          5           2             
[1] "do you sh1t like this sh1tz?" "It is feg."                  
[3] "But I hate really bad dogs"   "I am the best friend."       
[5] NA                             "I feg hate this schaffer"    
[7] "Do you really like it?"       "I'm not happy"               
[[1]]
[1] "do"   "like" "this" "you" 

[[2]]
[1] "is" "it"

[[3]]
[1] "bad"    "but"    "dogs"   "hate"   "i"      "really"

[[4]]
[1] "am"     "best"   "friend" "i"      "the"   

[[5]]
[1] NA

[[6]]
[1] "hate" "i"    "this"

[[7]]
[1] "do"     "it"     "like"   "really" "you"   

[[8]]
[1] "happy" "i'm"   "not"  

[[1]]
[1] "sh1t"  "sh1tz"

[[2]]
[1] "feg"

[[3]]
character(0)

[[4]]
character(0)

[[5]]
character(0)

[[6]]
[1] "feg"      "schaffer"

[[7]]
character(0)

[[8]]
character(0)

   element_id sentence_id                    neutral    profanity
1:          1           1           do,like,this,you   sh1t,sh1tz
2:          1           2                      is,it          feg
3:          1           3 bad,but,dogs,hate,i,really             
4:          2           1       am,best,friend,i,the             
5:          3           1                         NA             
6:          4           1                hate,i,this feg,schaffer
7:          5           1      do,it,like,really,you             
8:          5           2              happy,i'm,not             
                       sentence
1: do you sh1t like this sh1tz?
2:                   It is feg.
3:   But I hate really bad dogs
4:        I am the best friend.
5:                         <NA>
6:     I feg hate this schaffer
7:       Do you really like it?
8:                I'm not happy
       words profanity n
 1:      feg         1 2
 2: schaffer         1 1
 3:     sh1t         1 1
 4:    sh1tz         1 1
 5:        i         0 3
 6:       do         0 2
 7:     hate         0 2
 8:       it         0 2
 9:     like         0 2
10:   really         0 2
11:     this         0 2
12:      you         0 2
13:     <NA>         0 1
14:       am         0 1
15:      bad         0 1
16:     best         0 1
17:      but         0 1
18:     dogs         0 1
19:   friend         0 1
20:    happy         0 1
21:      i'm         0 1
22:       is         0 1
23:      not         0 1
24:      the         0 1
       words profanity n
    element_id sentence_id    words profanity
 1:          3           1     <NA>         0
 2:          2           1       am         0
 3:          1           3      bad         0
 4:          2           1     best         0
 5:          1           3      but         0
 6:          1           1       do         0
 7:          5           1       do         0
 8:          1           3     dogs         0
 9:          1           2      feg         1
10:          4           1      feg         1
11:          2           1   friend         0
12:          5           2    happy         0
13:          1           3     hate         0
14:          4           1     hate         0
15:          1           3        i         0
16:          2           1        i         0
17:          4           1        i         0
18:          5           2      i'm         0
19:          1           2       is         0
20:          1           2       it         0
21:          5           1       it         0
22:          1           1     like         0
23:          5           1     like         0
24:          5           2      not         0
25:          1           3   really         0
26:          5           1   really         0
27:          4           1 schaffer         1
28:          1           1     sh1t         1
29:          1           1    sh1tz         1
30:          2           1      the         0
31:          1           1     this         0
32:          4           1     this         0
33:          1           1      you         0
34:          5           1      you         0
    element_id sentence_id    words profanity
         words profanity  n
    1:    shit         1 51
    2:    fuck         1 45
    3:     ass         1 39
    4: fucking         1 23
    5:    crap         1 16
   ---                     
26467:      ~2         0  1
26468:  ~jesse         0  1
26469:   ~reed         0  1
26470:     ~tb         0  1
26471:    ~tom         0  1
        element_id sentence_id  words profanity
     1:        405           1                0
     2:        426           1                0
     3:        663           1                0
     4:        672           1                0
     5:        760           1                0
    ---                                        
187038:      17612           1     ~2         0
187039:      11943           1 ~jesse         0
187040:       6400           1  ~reed         0
187041:       7815           1    ~tb         0
187042:      12905           1   ~tom         0

sentimentr documentation built on Oct. 12, 2021, 9:06 a.m.