knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(textdata)
This package provides infrastructure to make text datasets available within R, even when they are too large to store within an R package or are licensed in such a way that prevents them from being included in OSS-licensed packages.
Do you want to add a new dataset to the textdata package?

- Create a file named `prefix_*.R` in the `R/` folder, where `*` is the name of the dataset. Supported prefixes include
    - `dataset_`
    - `lexicon_`
- In that file, create three functions named `download_*()`, `process_*()` and `dataset_*()`.
    - The `download_*()` function should take one argument named `folder_path`. It has two tasks. First, it should check whether the file has already been downloaded; if so, it should return `invisible()`. Otherwise, it should download the file to that path.
    - The `process_*()` function should take two arguments, `folder_path` and `name_path`. `folder_path` denotes the path to the file returned by `download_*()`, and `name_path` is the path where the polished data should live. The main point of `process_*()` is to turn the downloaded file into a .rds file containing a tidy tibble.
    - The `dataset_*()` function should wrap `load_dataset()`.
- Add the `process_*()` function to the named list `process_functions` in the file process_functions.R.
- Add the `download_*()` function to the named list `download_functions` in the file download_functions.R.
- Update the `print_info` list in the info.R file.
- Add `dataset_*.R` to the @include tags in `download_functions.R`.
- Add the dataset to `README.Rmd`.
- Add the dataset to `_pkgdown.yml`.
- Add an entry to the `NEWS.md` file.
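
As a rough illustration of the download and process steps above, here is a minimal sketch for a hypothetical dataset called `example`. The dataset name, the file name `example.csv`, and the URL are placeholders invented for this sketch and are not part of textdata. It uses base R readers plus `tibble::as_tibble()` for simplicity. The `dataset_*()` wrapper is left out because the arguments of `load_dataset()` are not described here.

```r
# Sketch only: "example", "example.csv" and the URL are hypothetical.

download_example <- function(folder_path) {
  file_path <- file.path(folder_path, "example.csv")

  # Task 1: if the raw file is already present, return invisibly.
  if (file.exists(file_path)) {
    return(invisible())
  }

  # Task 2: otherwise download the raw file into folder_path.
  utils::download.file(
    url = "https://example.com/example.csv", # placeholder URL
    destfile = file_path
  )
}

process_example <- function(folder_path, name_path) {
  file_path <- file.path(folder_path, "example.csv")

  # Read the raw download and turn it into a tibble.
  data <- utils::read.csv(file_path, stringsAsFactors = FALSE)
  data <- tibble::as_tibble(data)

  # Save the polished data as an .rds file at name_path.
  saveRDS(data, name_path)
}

# How the new entries might look once added to the named lists.
# In the package these lists already exist; only the new entry is added.
download_functions <- list(example = download_example)
process_functions <- list(example = process_example)
```

Because `download_example()` returns early when the file exists, calling it repeatedly does not trigger a second download.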
What are the guidelines for adding datasets?

- Use `word` instead of `words` for column names.
- For datasets that come with both a training and a testing set, let the user pick which one to retrieve with a `split` argument, similar to how `dataset_ag_news()` does it.
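
One way a `split` argument can be handled is with `match.arg()`, which validates the value and falls back to the first choice when none is given. The helper below is a hypothetical sketch: `example_file_name()` and the `example_` file prefix are made up for illustration and are not textdata functions. In a real `dataset_*()` function, the selected name would presumably be passed on to `load_dataset()`.

```r
# Hypothetical helper: maps a `split` choice to the name of an .rds file.
example_file_name <- function(split = c("train", "test")) {
  # match.arg() picks "train" by default and errors on unknown values.
  split <- match.arg(split)
  paste0("example_", split, ".rds")
}

example_file_name()
#> [1] "example_train.rds"
example_file_name("test")
#> [1] "example_test.rds"
```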