bow_from_doclist: Transform the documents stored in a dataframe into a bag of...


Description

Transforms the documents stored in a data frame into a bag of words.

Usage

bow_from_doclist(x, language = "english", rm_words = NULL,
  min_nchar = 3, match_term = NULL)

Arguments

x

character. Content to transform into a bag of words.

language

character string. Language of the stopwords to remove.

rm_words

character vector. Words (or n-tokens) to remove.

min_nchar

integer. Minimum number of characters; words shorter than this are removed.

match_term

data frame. Table with two variables: "word", the string as it appears in the content, and "term", the category assigned to it. Used to match words to terms and force the desired completions.
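
As an illustration of the two-column shape described for match_term, a lookup table could be built as follows. The words and terms are hypothetical and chosen only for this sketch; the package may expect a tibble or impose further constraints that are not documented here.

```r
# Hypothetical lookup table: each row maps a string as it appears in the
# content ("word") to the category it should be assigned ("term").
match_term <- data.frame(
  word = c("cost", "costs", "costing"),
  term = c("cost", "cost", "cost"),
  stringsAsFactors = FALSE
)
```

Such a table would then be supplied through the match_term argument shown under Usage.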

Value

A tibble with the id of the document, words, stems, lemmas, terms, counts, and proportions.
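
To make the idea of a bag of words with counts and proportions concrete, here is a minimal base R sketch. It is not the implementation used by bow_from_doclist: the function name bow_sketch and the input columns "id" and "text" are assumptions made for this example, it handles only ASCII letters, and it omits stopword removal by language, stems, lemmas, and terms.

```r
# Toy corpus; the column names "id" and "text" are assumptions for this sketch.
docs <- data.frame(
  id = c("d1", "d2"),
  text = c("The cat sat on the mat", "A cat is a cat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: counts words per document after filtering.
bow_sketch <- function(docs, min_nchar = 3, rm_words = NULL) {
  rows <- lapply(seq_len(nrow(docs)), function(i) {
    # Lowercase the text and split on anything that is not a letter.
    words <- strsplit(tolower(docs$text[i]), "[^a-z]+")[[1]]
    # Drop empty strings, words shorter than min_nchar, and removed words.
    keep <- nzchar(words) & nchar(words) >= min_nchar & !(words %in% rm_words)
    words <- words[keep]
    if (length(words) == 0) return(NULL)
    counts <- table(words)
    data.frame(
      id = docs$id[i],
      word = names(counts),
      count = as.integer(counts),
      # Share of each word among the words kept for this document.
      proportion = as.numeric(counts) / sum(counts),
      stringsAsFactors = FALSE
    )
  })
  # Skip documents that ended up with no words, then stack the rest.
  do.call(rbind, Filter(Negate(is.null), rows))
}

bow <- bow_sketch(docs, min_nchar = 3, rm_words = c("the"))
print(bow)
```

The result has one row per document and distinct word, which mirrors the id, words, counts, and proportions columns listed above.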


NicolasJBM/lexana documentation built on July 3, 2019, 10 a.m.