textprep: clean and/or tokenize your text data in a single function


View source: R/textprep.R

Description

Takes a text variable from a dataframe and runs a number of standard text preprocessing procedures on it, such as removing HTML tags, removing stopwords, and converting to lowercase. Preprocessing techniques and tokenization are applied in an interactive yes/no console session with the user. A list of the procedures used is saved in a local .txt file in a directory specified by the user.
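
To make the kinds of steps named above concrete, here is a minimal base-R sketch of HTML tag removal, lowercasing, and stopword removal on a dataframe column. It is not the implementation used by textprep, which is interactive and relies on the imported packages; `simple_clean` and `toy_stopwords` are hypothetical names invented for this illustration, and the stopword list is a toy stand-in for a real dictionary.

```r
# Illustrative only: base-R approximations of the preprocessing steps.
# `simple_clean` and `toy_stopwords` are hypothetical, not part of compositr.
toy_stopwords <- c("the", "a", "is", "of")

simple_clean <- function(x, stopwords = toy_stopwords) {
  x <- gsub("<[^>]+>", " ", x)           # replace HTML tags with a space
  x <- tolower(x)                         # convert to lowercase
  tokens <- strsplit(x, "[^a-z0-9']+")    # split on runs of other characters
  vapply(tokens, function(tok) {
    # drop empty strings and stopwords, then rejoin into one string
    tok <- tok[nzchar(tok) & !(tok %in% stopwords)]
    paste(tok, collapse = " ")
  }, character(1))
}

df <- data.frame(
  text = c("<p>The cat is on a mat</p>", "A <b>bold</b> claim"),
  stringsAsFactors = FALSE
)
df$text <- simple_clean(df$text)
```

Unlike this sketch, textprep asks the user in the console which steps to apply, so the set of transformations can differ from run to run.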

Usage

textprep(
  textdata,
  textvar,
  type = "docs",
  language = "english",
  outdir = NA,
  outname = "/transformations.txt"
)

Details

@param textdata a dataframe containing a text variable
@param textvar the name of the column in textdata that contains the text
@param type the type of input; currently only "docs" is supported
@param language user-specified language; determines which tm::stopwords dictionary is used
@param outdir the directory in which the output .txt file is saved
@param outname the name of the transformations .txt summary file; defaults to "/transformations.txt" but can be renamed

@return the dataframe with a cleaned and/or tokenized text variable

@export

@import textclean
@import dplyr
@import tidytext
@import textstem
@import SnowballC
@import tibble
@import tm
@import magrittr
@import crayon

@examples
## Not run:
results <- textprep(df, "text", language = "english", outdir = "~/Desktop/files")
## End(Not run)


alexlusco/compositr documentation built on Jan. 19, 2021, 8:33 p.m.