R/data.R

#' Leipzig corpora demo
#'
#' A random sample of 250 sentences drawn from all unannotated Indonesian Leipzig Corpora files in the author's collection.
"demo_corpus_leipzig"

#' Filepath of the Indonesian Leipzig Corpora
#'
#' Full file paths, on the author's machine, to the plain-text files of the Indonesian Leipzig Corpora.
"leipzig_corpus_path"

#' Demo data for Distinctive Collexeme/Collocate Analysis
#'
#' The output of \code{colloc_leipzig()}, used to illustrate \emph{Distinctive Collexeme/Collocate Analysis} on the package README page.
"dca_coll"

#' Indonesian stopwords
#'
#' A character vector of Indonesian stopwords, gathered from the following resources:
#' \itemize{
#' \item \url{https://raw.githubusercontent.com/rilut/python-goose/patch-1/goose/resources/text/stopwords-id.txt} (last accessed on 3 April 2016).
#' \item Appendix D of Tala, F. Z. (2003). \emph{A Study of Stemming Effects on Information Retrieval in Bahasa Indonesia}. M.S. thesis, University of Amsterdam. Retrieved from \url{https://pdfs.semanticscholar.org/8ed9/c7d54fd3f0b1ce3815b2eca82147b771ca8f.pdf} (last accessed on 23 September 2018).
#' }
"stopwords"
gederajeg/collogetr documentation built on April 16, 2020, 11:58 a.m.