loadFiles: Load Corpus Files


View source: R/topic_modelling.R

Description

Loads the corpus files into memory. Assumes the layout produced by the new crawler, in which each year of the mailing list is a folder and each month is a sub-folder within it. See rawToLDA for an example of its usage.

Usage

loadFiles(parsed.corpus.folder.path,
  corpus_setup = "/**/*.reply.title_body.txt")

Arguments

parsed.corpus.folder.path

The path to the corpus folder (e.g. 2012.parsed)

Returns a folder used by rawToLDA.

Details

TODO: Parameterize the file extension (currently assumes reply.body.txt)
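
As an illustration of the folder layout and the default corpus_setup pattern shown under Usage, the following base R sketch builds a small mock corpus and collects the matching files. It is not the package's implementation: the month folder names, the file names, and the use of list.files in place of a glob are all assumptions made for the example.

```r
# Mock corpus under a temporary directory. The layout is an assumption:
# <year>.parsed/<month>/<message>.reply.title_body.txt
corpus_dir <- file.path(tempdir(), "2012.parsed")
for (month in c("Jan", "Feb")) {
  dir.create(file.path(corpus_dir, month), recursive = TRUE, showWarnings = FALSE)
}

# Two hypothetical replies plus one file that should be ignored.
writeLines(c("First subject", "Body of the first reply"),
           file.path(corpus_dir, "Jan", "1.reply.title_body.txt"))
writeLines(c("Second subject", "Body of the second reply"),
           file.path(corpus_dir, "Feb", "2.reply.title_body.txt"))
writeLines("not part of the corpus",
           file.path(corpus_dir, "Feb", "notes.txt"))

# Recursive search by suffix, standing in for the glob
# "/**/*.reply.title_body.txt" (base R's Sys.glob has no recursive "**").
files <- list.files(corpus_dir,
                    pattern = "\\.reply\\.title_body\\.txt$",
                    recursive = TRUE,
                    full.names = TRUE)

# Read each matching file into a single string, keyed by file name.
documents <- vapply(files,
                    function(f) paste(readLines(f), collapse = "\n"),
                    character(1))
names(documents) <- basename(files)
```

With the package installed, a call such as loadFiles(corpus_dir) would presumably cover the same files, though what it returns is defined by the package rather than by this sketch.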


sailuh/topicflowr documentation built on May 27, 2019, 8:46 a.m.