R/data.R

#' Leipzig corpora demo
#'
#' A random subset of 250 sentences from all unannotated Indonesian Leipzig Corpora files.
"demo_corpus_leipzig"

#' Balinese newspaper texts
#'
#' Ten randomly selected corpus files of Balinese newspaper texts. These corpus files are raw plain text without any annotation and have not been split into sentences.
"demo_corpus_bali"

#' Indonesian short stories
#'
#' Ten randomly selected corpus files of Indonesian short stories retrieved from an online blog. These corpus files are raw plain text without any annotation and have not been split into sentences.
"demo_corpus_id"

#' Stopwords
#'
#' A list of Indonesian stopwords.
"stopwords"

#' Unit testing data 1
#'
#' A list containing two mini data frames as output of \code{freqlist_leipzig_all()}. This list is used as unit testing data.
"flist_mini"

#' Unit testing data 2
#'
#' A plain text file containing thirty random sentences from the Indonesian Leipzig Corpora. This file is stored in the testthat directory and used as unit testing data.
"mini_leipzig"

#' Unit testing data 3
#'
#' A plain text file containing thirty random sentences from the Indonesian Leipzig Corpora. This file is stored in the testthat directory and used as unit testing data.
"mini_leipzig_1"

#' Unit testing data 4
#'
#' A list containing outputs from \code{colloc_default()}. This list is used as unit testing data.
"obali_colloc_output_test"
gederajeg/corplingr documentation built on Dec. 20, 2021, 9:50 a.m.