cas_check_corpus: Checks if a given corpus exists and, optionally, updates it
In giocomai/castarter: Content Analysis Starter Toolkit

cas_check_corpus

R Documentation

Description

Checks if a given corpus exists and, optionally, updates it.

Usage

cas_check_corpus(
  ...,
  update = FALSE,
  keep_only_latest = FALSE,
  path = NULL,
  file_format = "parquet",
  partition = NULL,
  token = "full_text",
  corpus_folder = "corpus"
)

Arguments

`...`	Passed to `cas_get_db_file()`.
`update`	Logical, defaults to FALSE. If set to TRUE, it checks whether the local database contains items with a higher content id than those in the previously exported corpus (if one exists). If so, it writes a new, updated corpus.
`keep_only_latest`	Logical, defaults to FALSE. If set to TRUE, older corpora of the same type are deleted.
`path`	Defaults to `NULL`. If `NULL`, the path is set to the `project/website/export/dataset/file_format` folder.
`file_format`	Defaults to "parquet". No other formats are currently implemented.
`partition`	Defaults to `NULL`, in which case the parquet file is not partitioned. A common alternative is "year", which partitions the parquet file by year. If no `year` column exists, one is created from the `date` column, which is assumed to exist and to be (or be coercible to) a vector of class `Date`.
`token`	Defaults to "full_text", which leaves the text column untokenised. Any other value is passed as the `token` argument to `tidytext::unnest_tokens()` (see its help for details). Accepted values include "words", "sentences", and "paragraphs".
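The check performed when `update = TRUE` amounts to comparing the highest content id in the local database with the highest id already exported. The following is a minimal base R sketch of that comparison, assuming both sets of ids are available as integer vectors. `corpus_needs_update()` is a hypothetical helper, not part of castarter, and treating a missing corpus as needing a first export is an assumption.

```r
# Hypothetical helper: not part of castarter's API.
# Returns TRUE when the local database holds content with a higher id than
# any id in the previously exported corpus. A missing corpus is assumed to
# require a first export.
corpus_needs_update <- function(db_ids, corpus_ids = NULL) {
  if (length(db_ids) == 0) {
    return(FALSE)  # nothing stored locally, so nothing to export
  }
  if (is.null(corpus_ids) || length(corpus_ids) == 0) {
    return(TRUE)   # no previous corpus (assumption: export one)
  }
  max(db_ids) > max(corpus_ids)
}

corpus_needs_update(db_ids = 1:120, corpus_ids = 1:100)  # TRUE
corpus_needs_update(db_ids = 1:100, corpus_ids = 1:100)  # FALSE
```

Comparing only the maximum id is cheap. It also assumes ids are assigned in increasing order as new content is stored.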
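The default `path` follows the `project/website/export/dataset/file_format` template. A short sketch of how such a path can be assembled with base R's `file.path()` is shown below. `default_corpus_path()` and its argument names are hypothetical. castarter's own implementation may resolve the path differently, for example relative to a base folder.

```r
# Hypothetical helper illustrating the documented folder layout;
# castarter may prepend a base folder or build the path differently.
default_corpus_path <- function(project, website, dataset, file_format = "parquet") {
  file.path(project, website, "export", dataset, file_format)
}

default_corpus_path("example_project", "example_website", "corpus")
# "example_project/example_website/export/corpus/parquet"
```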
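When `partition = "year"`, a missing `year` column is derived from `date`. The sketch below shows that fallback in base R, assuming `date` is of class `Date` or can be coerced with `as.Date()`. `add_year_column()` is a hypothetical helper, not castarter's internal code.

```r
# Hypothetical helper: adds a `year` column derived from `date` if absent.
# Assumes `date` is a Date or can be coerced with as.Date().
add_year_column <- function(df) {
  if (!"year" %in% names(df)) {
    if (!"date" %in% names(df)) {
      stop("Partitioning by year requires either a `year` or a `date` column.")
    }
    df$year <- as.integer(format(as.Date(df$date), "%Y"))
  }
  df
}

corpus <- data.frame(
  id = 1:3,
  date = c("2021-05-01", "2022-01-15", "2022-11-30"),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
add_year_column(corpus)$year
# 2021 2022 2022
```

Once the `year` column exists, a partitioned parquet dataset could be written with, for instance, `arrow::write_dataset(df, path, format = "parquet", partitioning = "year")`. Whether castarter itself writes partitions this way is an assumption.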
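Any `token` value other than "full_text" is passed to `tidytext::unnest_tokens()`. The example below illustrates the effect of that function on a tiny data frame. The column names `id` and `text` are assumptions for illustration, and castarter's own call may set other arguments.

```r
library(tidytext)

# A one-row stand-in for an exported corpus (illustrative data only).
corpus <- data.frame(
  id = 1L,
  text = "The corpus grows. It is updated!",
  stringsAsFactors = FALSE
)

# token = "words": one row per word; by default words are lower-cased
# and punctuation is stripped.
tokens <- unnest_tokens(corpus, output = word, input = text, token = "words")
tokens$word
# "the" "corpus" "grows" "it" "is" "updated"

# token = "sentences" instead returns one row per sentence.
sentences <- unnest_tokens(corpus, output = sentence, input = text, token = "sentences")
```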