Takes a text variable from a dataframe and runs a number of standard text preprocessing procedures on it, such as removing HTML tags, removing stopwords, and converting to lowercase. Preprocessing techniques and tokenization are applied in an interactive yes/no console session with the user. A list of the procedures used is saved to a local .txt file in a directory specified by the user.
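As a rough illustration of three of the steps named above (HTML tag removal, lowercasing, stopword removal), here is a minimal non-interactive sketch in base R. The helper `clean_text_sketch` and its short stopword list are hypothetical and not part of the package. The documented function instead asks the user about each step, and its `language` parameter selects a tm stopword dictionary.

```r
# Hypothetical helper (an assumption, not the package's code): applies
# three common cleaning steps to a character vector using base R only.
clean_text_sketch <- function(x,
                              stop_words = c("the", "a", "an", "is", "of", "and")) {
  x <- gsub("<[^>]+>", " ", x)   # replace HTML tags with a space
  x <- tolower(x)                # convert to lowercase
  # Remove stopwords as whole words only
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)      # collapse repeated whitespace
  trimws(x)                      # drop leading/trailing spaces
}

# Toy dataframe with a text column, mirroring the textdata/textvar inputs
df <- data.frame(
  text = c("<p>The cat is <b>BLACK</b></p>", "An apple and a pear"),
  stringsAsFactors = FALSE
)
df$text <- clean_text_sketch(df$text)
print(df$text)
```

The tiny stopword vector keeps the sketch free of dependencies. In practice a fuller list such as the one returned by `tm::stopwords("english")` would replace it.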
@param textdata a dataframe containing a text variable
@param textvar the name of the column in textdata that contains the text
@param type the input type; currently the only supported type is "docs"
@param language user-specified language; determines which tm::stopwords dictionary is used
@param outdir the directory in which the output .txt file is saved
@param outname the name of the transformations .txt summary file; defaults to transformations.txt but can be renamed
@return the dataframe with a cleaned and/or tokenized text variable
@export
@import textclean
@import dplyr
@import tidytext
@import textstem
@import SnowballC
@import tibble
@import tm
@import magrittr
@import crayon
@examples
## Not run:
results <- textprep(df, "text", language = "english", outdir = "~/Desktop/files")
## End(Not run)