Man pages for avkoehl/textprocessingDSI
Clean an arbitrarily large corpus for topic modelling over many cores

clean_corpus               Clean Corpus
clean_file                 Clean File
doccount_corpus            Doccount of Corpus
dos2unix                   Dos to Unix line endings
filter_corpus              Filter Corpus
filter_file                Filter File
get_abundant               Get List of Abundant Terms
get_bottom_terms           Get List of Least Frequent Terms
get_sparse                 Get List of Sparse Terms
get_top_terms              Get List of Most Frequent Terms
lemma_corpus               Clean Corpus
lemma_file                 Lemma File
pipeline                   Text Processing Pipeline
rcpp_doccount              Rcpp Doccount
rcpp_filter                Rcpp Filter
rcpp_join                  Rcpp Join
rcpp_split                 Rcpp Split
rcpp_summary               Rcpp Summary
summary_corpus             Summary of Corpus
summary_file               Summary of File Word Counts
textprocessingDSI-package  Efficiently clean a corpus in parallel
avkoehl/textprocessingDSI documentation built on June 5, 2019, 7:41 p.m.