R/data.R

#' Corpus data
#'
#' @description A random sample of 25,000 sentences drawn from one of the Indonesian Leipzig Corpora files, namely \code{"ind_news_2008_300K-sentences.txt"}. The original corpus file contains 300,000 sentences from Indonesian online newspapers.
#' @format A character vector of 25,000 elements, each containing one sentence.
#' @source \url{http://wortschatz.uni-leipzig.de/en/download}
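#' @examples
#' # A minimal sketch for inspecting the sample. It assumes the dataset is
#' # lazy-loaded with the package; if it is not, call
#' # data("my_leipzig_sample") first.
#' length(my_leipzig_sample)   # number of sentences in the sample
#' head(my_leipzig_sample, 3)  # first three sentences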
"my_leipzig_sample"


#' Corpus data non-Leipzig
#'
#' @description Example input data in which, unlike in the Leipzig Corpora, a line does not necessarily correspond to exactly one sentence.
#' @format A character vector of 18 elements, not all of which correspond to a single sentence.
#' @source The BOLA NEWS.com article entitled "Nostalgia Fergie dan Beckham", posted on 16 February 2010.
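#' @examples
#' # A minimal sketch for inspecting the text. It assumes the dataset is
#' # lazy-loaded with the package; if it is not, call
#' # data("bola_corpus_text") first.
#' length(bola_corpus_text)  # number of elements (lines), not sentences
#' nchar(bola_corpus_text)   # characters per element; long elements may span several sentences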
"bola_corpus_text"
gederajeg/wordpairs documentation built on May 23, 2019, 2:46 p.m.