View source: R/cas_write_corpus.R
cas_write_corpus | R Documentation |
Export the textual dataset for the current website
cas_write_corpus(
  corpus = NULL,
  to_lower = FALSE,
  drop_na = TRUE,
  drop_empty = TRUE,
  date = date,
  text = text,
  tif_compliant = FALSE,
  file_format = "parquet",
  partition = NULL,
  token = "full_text",
  corpus_folder = "corpus",
  path = NULL,
  db_connection = NULL,
  db_folder = NULL,
  ...
)
corpus |
Defaults to NULL. If NULL, retrieves corpus from the current
website with |
to_lower |
Defaults to FALSE. Whether to convert tokens to lowercase.
Passed to |
drop_na |
Defaults to TRUE. If TRUE, items that have NA in their |
drop_empty |
Defaults to TRUE. If TRUE, items that have empty elements
("") in their |
date |
Unquoted date column, defaults to |
text |
Unquoted text column, defaults to |
tif_compliant |
Defaults to FALSE. If TRUE, ensures that the first column is a character vector named "doc_id" and that the second column is a character vector named "text". See https://docs.ropensci.org/tif/ for details. |
file_format |
Defaults to "parquet". Currently, other options are not implemented. |
partition |
Defaults to NULL. If NULL, the parquet file is not
partitioned. "year" is a common alternative: if set to "year", the parquet
file is partitioned by year. If a |
token |
Defaults to "full_text", which does not tokenise the text
column. If different from |
path |
Defaults to NULL. If NULL, path is set to the project/website/export/dataset/file_format folder. |
db_connection |
Defaults to NULL. If NULL, uses local SQLite database. If given, must be a connection object or a list with relevant connection settings (see example). |
... |
Passed to |
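The effect of drop_na, drop_empty and tif_compliant can be illustrated on a toy data frame in base R. This is only a sketch of the documented behaviour, not the function's actual implementation: the column names of the toy corpus, the assumption that drop_na and drop_empty act on the text column, and the helper is_tif_corpus() are all hypothetical and introduced here for illustration.

```r
# Toy corpus; column names are illustrative assumptions, not actual package output.
corpus_df <- data.frame(
  id = c(1L, 2L, 3L, 4L),
  date = as.Date(c("2023-01-05", "2023-06-10", "2024-02-01", "2024-03-15")),
  text = c("First article.", NA, "", "Fourth article."),
  stringsAsFactors = FALSE
)

# drop_na = TRUE: discard items with NA (assumed here to mean NA in the text column)
kept <- corpus_df[!is.na(corpus_df$text), , drop = FALSE]

# drop_empty = TRUE: discard items whose text is the empty string ""
kept <- kept[kept$text != "", , drop = FALSE]

# tif_compliant = TRUE: the first column is a character vector named "doc_id",
# the second column is a character vector named "text"
tif_df <- data.frame(
  doc_id = as.character(kept$id),
  text = kept$text,
  date = kept$date,
  stringsAsFactors = FALSE
)

# Hypothetical helper: checks only the two conditions stated in the
# tif_compliant argument description above.
is_tif_corpus <- function(x) {
  is.data.frame(x) &&
    ncol(x) >= 2 &&
    identical(names(x)[1:2], c("doc_id", "text")) &&
    is.character(x[[1]]) &&
    is.character(x[[2]])
}

is_tif_corpus(tif_df)
is_tif_corpus(corpus_df)
```

Reordering the columns explicitly, rather than only renaming them, matters because the check is positional: "doc_id" and "text" must be the first and second columns, with any other columns (such as date) following.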