knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE, message = FALSE
)

Text data, especially data from social media websites like Twitter, Facebook or Instagram, often contains emojis: small images of facial expressions, places, foods, animals, etc. Many text mining tools cannot process these non-alphanumeric characters, so they are typically dropped from the text during preprocessing. However, removing emojis also removes important insight into the meaning of the text. For example, "I am going to Advanced R class" takes on a different meaning when followed by a smiley face versus a frowning face.

The emoji2text package allows users to replace emojis with descriptive phrases as an alternative to dropping these characters from text.

The Function

The primary function in this package is emoji_to_text.

   emoji_to_text(character_vector, accents = FALSE, delete = FALSE)

The function detects and replaces emojis, and optionally accented characters, within text data. Given a vector of character strings, it returns a vector of character strings in which each emoji is replaced with a corresponding descriptive phrase (e.g., "smiling face with heart-eyes") and, when requested, each accented character is replaced with a corresponding ASCII character.

The function works by iterating through Emojis, an emoji reference data frame. For each emoji, it uses a regular expression to search the text for the byte sequence associated with that emoji and replaces every match with the corresponding English phrase.
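The lookup-and-replace loop can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's source: the three-row table `emoji_ref` and its columns `emoji` and `description` are hypothetical stand-ins for the much larger Emojis data frame, and `replace_emojis()` is a hypothetical helper. The sketch also swaps the regular-expression match for literal matching (`fixed = TRUE`), which avoids escaping characters that a regular expression would treat specially.

```r
# Hypothetical miniature reference table; the package's Emojis data frame
# is much larger and its column names may differ.
emoji_ref <- data.frame(
  emoji = c("\U0001F60D", "\U0001F634", "\U0001F620"),
  description = c("smiling face with heart-eyes", "sleeping face", "angry face"),
  stringsAsFactors = FALSE
)

# Replace every occurrence of each reference emoji with its phrase.
# fixed = TRUE matches the emoji literally rather than as a regex pattern.
replace_emojis <- function(x, ref) {
  for (i in seq_len(nrow(ref))) {
    x <- gsub(ref$emoji[i], paste0(" ", ref$description[i], " "), x, fixed = TRUE)
  }
  # Collapse the doubled spaces introduced by padding, then trim the ends
  trimws(gsub(" {2,}", " ", x))
}

replace_emojis("Good night!\U0001F634", emoji_ref)
#> [1] "Good night! sleeping face"
```

Padding each phrase with spaces keeps adjacent emojis (such as three angry faces in a row) from running together into one long token, which matters for word-based methods like sentiment scoring.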

If desired, the option accents = TRUE can be specified to replace characters with accents with the corresponding ASCII characters. This process works similarly, by iterating through Accents, an accent reference data frame.
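The accent step follows the same iterate-and-replace pattern. In this sketch, `accent_ref` (with hypothetical columns `accented` and `ascii`) and `strip_accents()` are hypothetical; the five-character table only stands in for the package's Accents data frame and is not its actual content.

```r
# Hypothetical miniature accent table; the package's Accents data frame
# may use different columns and cover many more characters.
accent_ref <- data.frame(
  accented = c("\u00e1", "\u00e9", "\u00ed", "\u00f1", "\u00fc"),
  ascii = c("a", "e", "i", "n", "u"),
  stringsAsFactors = FALSE
)

# Swap each accented character for its ASCII counterpart, one row at a time
strip_accents <- function(x, ref) {
  for (i in seq_len(nrow(ref))) {
    x <- gsub(ref$accented[i], ref$ascii[i], x, fixed = TRUE)
  }
  x
}

strip_accents("Jalape\u00f1os at the caf\u00e9", accent_ref)
#> [1] "Jalapenos at the cafe"
```

Base R's `iconv(x, to = "ASCII//TRANSLIT")` is a one-line alternative, but its output depends on the platform's iconv implementation, which is one reason an explicit lookup table can be preferable.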

If delete = TRUE, bytes that are not matched to an emoji or an accent are deleted.
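One simple way to express this final clean-up in base R is to convert the text to ASCII and discard anything that cannot be converted. This is a sketch, not the package's implementation: `drop_unmatched()` is a hypothetical helper, and it assumes the emoji and accent replacements have already run, so that only unrecognized characters remain.

```r
# Drop every byte that cannot be represented in ASCII.
# Assumes known emojis and accents were already replaced upstream.
drop_unmatched <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

drop_unmatched("left over: \U0001F984!")
#> [1] "left over: !"
```

With `sub = ""`, `iconv()` removes each non-convertible byte, so all four bytes of the unmatched emoji disappear together.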

Reference Data Frames

Example: Sentiment Analysis

To demonstrate a potential use for emoji_to_text, we will perform sentiment analysis on text after replacing emojis with English phrases.

Suppose we have the following data frame:

library(stringr)
library(dplyr)

emoji_text <- data.frame(text = c("I’m so hungry 😩 and my stomach hurts!",
                                  "I am super excited to go on vacation with my family 🌸🌊🏄🏽",
                                  "My mom is making me eat my vegetables. 🥦🤒😡😭 I wish I could eat ice cream all day! 🍦",
                                  "I have to go to bed now! 😴 Good night!😘😍💕",
                                  "I have to go to bed now! 😔 Good night!😠😠😠"),
                         stringsAsFactors = FALSE)

Note:

When printed, the emojis may be displayed as Unicode escape codes rather than as images, which makes them difficult to interpret.

# examine text column
head(emoji_text$text)
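Independent of how the console renders an emoji, base R can report its Unicode code point and the UTF-8 byte sequence that emoji matching works on. The example uses the sleeping-face emoji from the fourth row, written as an escape sequence so the code survives any file encoding; the variable name `sleeping` is just for illustration.

```r
# The sleeping-face emoji, written as a Unicode escape
sleeping <- "\U0001F634"

# Its code point in the usual U+XXXX notation
sprintf("U+%X", utf8ToInt(sleeping))
#> [1] "U+1F634"

# The four UTF-8 bytes that encode it
charToRaw(sleeping)
#> [1] f0 9f 98 b4
```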

Before performing our sentiment analysis, let's first convert the emojis to English phrases to extract additional meaning.

library(emoji2text)
emoji_text$clean_text <- emoji_to_text(emoji_text$text)

To perform sentiment analysis, we will use the get_sentiment function from the syuzhet package. For this example, sentiment is extracted with the AFINN method; however, the syuzhet package includes several other methods for scoring the sentiment of a phrase.

library(syuzhet)
emoji_text$sentiment <- get_sentiment(emoji_text$clean_text, method = "afinn")

Now let's look at the resulting data frame.

knitr::include_graphics("sentiment_table.png")

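The fourth and fifth rows contain exactly the same words and differ only in their emojis, so dropping the emojis would give them identical sentiment scores. Notice that after replacing the emojis with their corresponding English phrases, the scores for these two rows reflect the difference in sentiment we expected.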
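Why the emoji phrases matter can be seen with a toy version of lexicon-based scoring, which sums a score for every word found in a lexicon. This is a sketch, not syuzhet's implementation: `toy_lexicon` holds three hypothetical word scores in the style of AFINN (integers from -5 to 5), `toy_score()` is a hypothetical helper, and the phrases in `emojis_replaced` are illustrative Unicode emoji names rather than verified output of emoji_to_text.

```r
# Hypothetical AFINN-style word scores; the real lexicon is far larger
toy_lexicon <- c(good = 3, kiss = 2, angry = -3)

# Split each string into lowercase words and sum the scores of known words
toy_score <- function(x, lexicon) {
  vapply(x, function(s) {
    words <- tolower(unlist(strsplit(s, "[^A-Za-z]+")))
    sum(lexicon[words], na.rm = TRUE)  # unknown words index to NA
  }, numeric(1), USE.NAMES = FALSE)
}

# Rows 4 and 5 with emojis dropped: identical text, identical score
emojis_dropped <- c("I have to go to bed now! Good night!",
                    "I have to go to bed now! Good night!")
# The same rows with emojis replaced by illustrative phrases
emojis_replaced <- c(
  "I have to go to bed now! sleeping face Good night! face blowing a kiss two hearts",
  "I have to go to bed now! pensive face Good night! angry face angry face angry face"
)

toy_score(emojis_dropped, toy_lexicon)
#> [1] 3 3
toy_score(emojis_replaced, toy_lexicon)
#> [1]  5 -6
```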



Rkabacoff/emoji2text documentation built on May 3, 2019, 5:23 p.m.