idiosync_response_words: Idiosyncratic Response Words
In nlanderson9/languagePredictR: Predict Outcomes from Natural Language


Description

This function identifies idiosyncratic words in each response, i.e., words that appear more than once within a single response and in no other response. It can also remove these words from the text.
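The rule above can be sketched in base R as follows. This is not the package's implementation. `find_idiosyncratic_words` is a hypothetical helper, and its tokenization (lowercasing, then splitting on anything other than letters, digits, and apostrophes) is an assumption, since this page does not say how languagePredictR splits text into words. The output columns mirror the `response_number`, `feature`, and `frequency` columns shown in the examples.

```r
# Hypothetical helper (NOT part of languagePredictR): a base-R sketch of the
# "idiosyncratic word" rule. Tokenization here is an assumption.
find_idiosyncratic_words <- function(texts) {
  # Assumed tokenization: lowercase, split on anything that is not a letter,
  # digit, or apostrophe, and drop empty tokens.
  tokens <- lapply(tolower(texts), function(t) {
    words <- strsplit(t, "[^a-z0-9']+")[[1]]
    words[words != ""]
  })
  # Number of responses in which each word appears at least once
  doc_freq <- table(unlist(lapply(tokens, unique)))
  rows <- list()
  for (i in seq_along(tokens)) {
    counts <- table(tokens[[i]])
    # Idiosyncratic: repeated within this response, absent from all others
    keep <- names(counts)[counts >= 2 & doc_freq[names(counts)] == 1]
    if (length(keep) > 0) {
      rows[[length(rows) + 1]] <- data.frame(
        response_number = i,
        feature = keep,
        frequency = as.integer(counts[keep]),
        stringsAsFactors = FALSE
      )
    }
  }
  # Returns NULL when no response contains an idiosyncratic word
  do.call(rbind, rows)
}

myStrings <- c("I like going to the park. The park is one of my favorite places to visit.",
               "Today is really rainy, but I'm a fan of this kind of weather to be honest.",
               "Yesterday, a bright red car with shiny red wheels drove past the house.")
res <- find_idiosyncratic_words(myStrings)
print(res)
```

On these sentences the sketch flags "park" in response 1 and "red" in response 3. Words such as "the" (twice in response 1) and "of" (twice in response 2) are not flagged because they also occur in another response.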

Usage

idiosync_response_words(
  inputDataframe,
  mode,
  textColumnName,
  participantColumnName
)

Arguments

`inputDataframe`	A dataframe containing a column of text data (character strings).
`mode`	The mode of operation: "output", "remove", or "both". See Details below.
`textColumnName`	A string giving the name of the column in `inputDataframe` that contains the text data.
`participantColumnName`	Optional. A string giving the name of the column in `inputDataframe` that contains participant IDs.

Details

This function has three modes. In "output" mode, it returns a dataframe with three columns: the number of the response containing idiosyncratic words, the words themselves, and how often each word appears in that response. If `participantColumnName` is provided, a fourth column with participant IDs is added. In "remove" mode, it returns a character string (or a vector of character strings) with all idiosyncratic words removed. In "both" mode, it returns a list containing both results: the dataframe of idiosyncratic words and the text with those words removed.
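For "remove" mode, one way to drop a known set of words from each response is sketched below. `remove_words` is a hypothetical helper, not the package's function. Deleting the whitespace that precedes each removed word is an assumption, chosen because it reproduces the "remove" output shown in the examples (for instance, "the park." becomes "the.").

```r
# Hypothetical helper (NOT part of languagePredictR): remove whole-word,
# case-insensitive matches of each word, plus any whitespace before it.
remove_words <- function(texts, words) {
  for (w in words) {
    # \\Q...\\E treats the word literally; \\b ensures whole-word matches,
    # so removing "red" leaves "bored" intact.
    pattern <- paste0("\\s*\\b\\Q", w, "\\E\\b")
    texts <- gsub(pattern, "", texts, ignore.case = TRUE, perl = TRUE)
  }
  # Tidy any leading/trailing whitespace left by a removal at the edges
  trimws(texts)
}

myStrings <- c("I like going to the park. The park is one of my favorite places to visit.",
               "Today is really rainy, but I'm a fan of this kind of weather to be honest.",
               "Yesterday, a bright red car with shiny red wheels drove past the house.")
out <- remove_words(myStrings, c("park", "red"))
print(out)
```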

Value

A dataframe (mode = "output"), a character string or vector of character strings (mode = "remove"), or a two-element list containing both results (mode = "both").

See Also

idiosync_participant_words

Examples

library(languagePredictR)

myStrings = c("I like going to the park. The park is one of my favorite places to visit.",
              "Today is really rainy, but I'm a fan of this kind of weather to be honest.",
              "Yesterday, a bright red car with shiny red wheels drove past the house.")
mydataframe = data.frame(text=myStrings, stringsAsFactors = FALSE)
idiosync_output = idiosync_response_words(mydataframe, textColumnName = "text", mode = "output")
idiosync_output
# response_number     feature       frequency
# 1                   park          2
# 3                   red           2

idiosync_removed = idiosync_response_words(mydataframe, textColumnName = "text", mode = "remove")
idiosync_removed
# "I like going to the. The is one of my favorite places to visit."
# "Today is really rainy, but I'm a fan of this kind of weather to be honest."
# "Yesterday, a bright car with shiny wheels drove past the house."