
# Automated Hate Speech Detection

Paper Github

You must create a model which predicts the probability of each type of toxicity for each comment.

```r
pacman::p_load(tidyverse)

hate_dat <- read_csv("labeled_data.csv") %>%
  rename(id = X1) %>%   # read_csv fills the unnamed first column in as X1
  glimpse()
```
```
## Warning: Missing column names filled in: 'X1' [1]

## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   count = col_double(),
##   hate_speech = col_double(),
##   offensive_language = col_double(),
##   neither = col_double(),
##   class = col_double(),
##   tweet = col_character()
## )

## Observations: 24,783
## Variables: 7
## $ id                 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
## $ count              <dbl> 3, 3, 3, 3, 6, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ hate_speech        <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, …
## $ offensive_language <dbl> 0, 3, 3, 2, 6, 2, 3, 3, 3, 2, 3, 3, 2, 3, 2, …
## $ neither            <dbl> 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ class              <dbl> 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ tweet              <chr> "!!! RT @mayasolovely: As a woman you shouldn…
```
```r
# write_rds(hate_dat, path = "hate_dat.rds")
# save(hate_dat, file = "hate_dat.RData")
```
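Since `count` is the number of annotators per tweet and the three label columns hold their votes, the per-class probabilities the task asks for can be derived directly from these counts (`class` appears to be the majority vote: 1 = offensive, 2 = neither). A minimal sketch, using a few toy rows in place of `labeled_data.csv`; the `p_*` column names are illustrative, not part of the dataset:

```r
library(tibble)
library(dplyr)

# Toy rows mimicking the glimpse() output above (assumption: stand-in
# for the real labeled_data.csv)
hate_dat <- tibble(
  id                 = 0:2,
  count              = c(3, 3, 6),
  hate_speech        = c(0, 0, 0),
  offensive_language = c(0, 3, 6),
  neither            = c(3, 0, 0),
  class              = c(2, 1, 1)
)

# Each per-class vote count divided by the number of annotators gives
# an empirical probability for that toxicity type
hate_probs <- hate_dat %>%
  mutate(
    p_hate      = hate_speech / count,
    p_offensive = offensive_language / count,
    p_neither   = neither / count
  )

hate_probs %>% select(id, p_hate, p_offensive, p_neither)
```

These empirical probabilities can serve either as soft targets for a model or as a sanity check against the hard `class` labels.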


systats/textlearnR documentation built on May 6, 2019, 8:31 p.m.