summary_corpus: Summary of Corpus


Description

A function that calculates word frequency and document frequency for every word in the corpus. The output can then be analyzed to remove outlier words or stop words. Files are processed in parallel with parLapply across the number of cores given by ncores, and the summary_file function is run on each file in ipath.
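The per-file, parallel pattern described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: count_file and summarize_corpus_sketch are hypothetical names, count_file stands in for summary_file and assumes newline-delimited documents (flag = 0) with simple lowercase, whitespace tokenization, and the merge step assumes per-file results are combined by summing freq and doccount for each term.

```r
library(parallel)

# Hypothetical stand-in for summary_file: each line of the file is treated
# as one document, and tokens are split on runs of whitespace.
count_file <- function(path) {
  docs <- readLines(path, warn = FALSE)
  tokens <- lapply(strsplit(tolower(docs), "[[:space:]]+"),
                   function(x) x[nzchar(x)])   # drop empty tokens
  all_terms <- unlist(tokens)
  if (length(all_terms) == 0) {
    return(data.frame(term = character(0), freq = integer(0),
                      doccount = integer(0)))
  }
  freq <- table(all_terms)                           # total occurrences
  doccount <- table(unlist(lapply(tokens, unique)))  # documents containing term
  data.frame(term = names(freq),
             freq = as.integer(freq),
             doccount = as.integer(doccount[names(freq)]),
             stringsAsFactors = FALSE)
}

# Hypothetical driver: one task per file, run on an ncores-worker cluster,
# then per-file data frames are merged by summing the counts per term.
summarize_corpus_sketch <- function(ipath, ncores) {
  files <- list.files(ipath, full.names = TRUE)
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))
  per_file <- parLapply(cl, files, count_file)
  combined <- do.call(rbind, per_file)
  aggregate(cbind(freq, doccount) ~ term, data = combined, FUN = sum)
}
```

Summing doccount across files is only valid because, under the newline-delimited assumption, every document lives in exactly one file, so no document is counted twice.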

Usage

summary_corpus(ipath, ncores, flag = 0)

Arguments

ipath

A string specifying the path to the input files.

ncores

A number specifying the number of cores to use.

flag

**optional** A number specifying whether documents are delimited by newlines within each file (set to 0, the default) or each document is in a separate text file.
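The two document conventions selected by flag can be illustrated with a small reader. This is a hypothetical helper (read_file_documents is not part of the package), and it assumes that any nonzero flag means "the whole file is one document"; the package may check flag differently.

```r
# Hypothetical helper: return the documents contained in one file.
read_file_documents <- function(path, flag = 0) {
  lines <- readLines(path, warn = FALSE)
  if (flag == 0) {
    lines                          # flag = 0: each line is a separate document
  } else {
    paste(lines, collapse = " ")   # assumed: the whole file is a single document
  }
}
```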

Value

A data frame formed by merging the per-file data frames, with columns term, freq, and doccount for each term.
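As a sketch of the follow-up analysis mentioned in the description, the returned data frame can be filtered on doccount and freq to drop likely stop words (terms found in most documents) and rare outliers. filter_terms and its thresholds are hypothetical, the total number of documents n_docs is assumed to be known separately (it is not among the documented output columns), and the toy values are made up for illustration.

```r
# Hypothetical filter over a data frame with columns term, freq, doccount.
filter_terms <- function(summary_df, n_docs, max_doc_prop = 0.5, min_freq = 2) {
  keep <- summary_df$doccount / n_docs <= max_doc_prop &  # drop stop-word-like terms
          summary_df$freq >= min_freq                      # drop rare outliers
  summary_df[keep, , drop = FALSE]
}

# Made-up example values, for illustration only
toy <- data.frame(term = c("the", "cat", "zyzzyva"),
                  freq = c(120L, 15L, 1L),
                  doccount = c(98L, 9L, 1L),
                  stringsAsFactors = FALSE)
filter_terms(toy, n_docs = 100)
```

With these values only the "cat" row survives: "the" appears in 98% of documents and "zyzzyva" occurs only once.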

Examples

## Not run: 
summary_corpus("/path/to/corpus/", 0)

## End(Not run)

avkoehl/textprocessingDSI documentation built on June 5, 2019, 7:41 p.m.