cas_write_corpus: Export the textual dataset for the current website
In giocomai/castarter: Content Analysis Starter Toolkit

cas_write_corpus

R Documentation

Export the textual dataset for the current website

Description

Export the textual dataset (the corpus) for the current website to a file, by default in parquet format, optionally tokenised and partitioned.

Usage

cas_write_corpus(
  corpus = NULL,
  to_lower = FALSE,
  drop_na = TRUE,
  drop_empty = TRUE,
  date = date,
  text = text,
  tif_compliant = FALSE,
  file_format = "parquet",
  partition = NULL,
  token = "full_text",
  corpus_folder = "corpus",
  path = NULL,
  db_connection = NULL,
  db_folder = NULL,
  ...
)

Arguments

`corpus`	Defaults to `NULL`. If `NULL`, the corpus is retrieved from the current website with `cas_read_db_contents_data()`. If given, it is expected to be a data frame with the same structure as the output of `cas_read_db_contents_data()`.
`to_lower`	Defaults to `FALSE`. Whether to convert tokens to lower case. Passed to `tidytext::unnest_tokens()` if `token` is not "full_text".
`drop_na`	Defaults to `TRUE`. If `TRUE`, items that have `NA` in their `text` or `date` column are dropped. Such items often have other issues and may cause inconsistencies in further analyses.
`drop_empty`	Defaults to `TRUE`. If `TRUE`, items that have empty strings ("") in their `text` or `date` column are dropped. Such items often have other issues and may cause inconsistencies in further analyses.
`date`	Unquoted date column, defaults to `date`.
`text`	Unquoted text column, defaults to `text`. If `tif_compliant` is set to `TRUE`, it will be renamed to "text" even if originally it had a different name.
`tif_compliant`	Defaults to `FALSE`. If `TRUE`, it ensures that the first column is a character vector named "doc_id" and that the second column is a character vector named "text". See https://docs.ropensci.org/tif/ for details.
`file_format`	Defaults to "parquet". Currently, other options are not implemented.
`partition`	Defaults to `NULL`. If `NULL`, the parquet output is not partitioned. A common alternative is "year", which partitions the parquet output by year. If a `year` column does not exist, it is created from the `date` column, which must be (or be coercible to) a vector of class `Date`.
`token`	Defaults to "full_text", which leaves the text column untokenised. Any other value is passed to `tidytext::unnest_tokens()` (see its help for details); accepted values include "words", "sentences", and "paragraphs".
`path`	Defaults to `NULL`. If `NULL`, path is set to the `project/website/export/dataset/file_format` folder.
`db_connection`	Defaults to `NULL`. If `NULL`, the local SQLite database is used. If given, it must be a connection object or a list with the relevant connection settings.
`...`	Passed to `cas_get_db_file()`.
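The row-dropping and TIF options described above can be approximated in base R. This is a minimal sketch, not castarter's implementation: `prepare_corpus()` is a hypothetical helper, and it assumes that document identifiers come from an `id` column (also an assumption).

```r
# Hypothetical helper approximating the documented effect of `drop_na`,
# `drop_empty` and `tif_compliant`; not castarter's actual code.
prepare_corpus <- function(corpus, drop_na = TRUE, drop_empty = TRUE,
                           tif_compliant = FALSE) {
  if (drop_na) {
    # Drop items with NA in the text or date column
    keep <- !is.na(corpus$text) & !is.na(corpus$date)
    corpus <- corpus[keep, , drop = FALSE]
  }
  if (drop_empty) {
    # Drop items whose text or date is an empty string ("");
    # %in% returns FALSE for NA, so NA rows are left to `drop_na`
    keep <- !(corpus$text %in% "") & !(as.character(corpus$date) %in% "")
    corpus <- corpus[keep, , drop = FALSE]
  }
  if (tif_compliant) {
    # TIF layout: character "doc_id" first, character "text" second.
    # Assumption: identifiers come from a hypothetical `id` column.
    rest <- corpus[, setdiff(names(corpus), c("id", "text")), drop = FALSE]
    corpus <- data.frame(doc_id = as.character(corpus$id),
                         text = as.character(corpus$text),
                         rest, stringsAsFactors = FALSE)
  }
  rownames(corpus) <- NULL
  corpus
}

# Toy corpus: row 2 has no date, row 3 has empty text
corpus <- data.frame(
  id = 1:4,
  date = as.Date(c("2023-01-05", NA, "2024-03-10", "2024-07-01")),
  text = c("First post", "No date", "", "Last post"),
  stringsAsFactors = FALSE
)
out <- prepare_corpus(corpus, tif_compliant = TRUE)
print(out)
```

With the defaults, only the first and last items survive, and the result starts with the `doc_id` and `text` columns.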

Value

Invisibly returns the path to the corpus.
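Outside castarter, the effect of `partition = "year"` and an invisibly returned path can be sketched with the arrow package. This is an assumption about how such an export could work, not a description of castarter's internals; `write_year_partitioned()` is a hypothetical helper, and the write is skipped if arrow is not installed.

```r
# Hypothetical helper: derive `year` from `date` if needed, write a
# year-partitioned parquet dataset with arrow, return the path invisibly.
write_year_partitioned <- function(corpus, path) {
  if (!"year" %in% names(corpus)) {
    # Coerce `date` to Date and extract the calendar year
    corpus$year <- as.integer(format(as.Date(corpus$date), "%Y"))
  }
  arrow::write_dataset(corpus, path, format = "parquet", partitioning = "year")
  invisible(path)
}

corpus <- data.frame(
  date = as.Date(c("2023-01-05", "2024-03-10", "2024-07-01")),
  text = c("First post", "Second post", "Third post"),
  stringsAsFactors = FALSE
)
out_path <- file.path(tempdir(), "corpus_sketch")
if (requireNamespace("arrow", quietly = TRUE)) {
  p <- write_year_partitioned(corpus, out_path)
  # arrow writes Hive-style sub-folders, e.g. "year=2023" and "year=2024"
  print(list.files(p))
}
```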

Examples

## Not run: 
cas_write_corpus(cas_read_db_contents_data(), partition = "year")

## End(Not run)
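According to the `token` argument, any value other than "full_text" sends the text column through `tidytext::unnest_tokens()`. A minimal sketch of that step on toy data (assuming tidytext is installed; not castarter's own code) shows what a word-level export would contain: one row per token, with the other columns repeated.

```r
# Toy corpus with a TIF-style layout
corpus <- data.frame(
  doc_id = c("a", "b"),
  text = c("Hello world. Bye", "Another Text"),
  stringsAsFactors = FALSE
)
if (requireNamespace("tidytext", quietly = TRUE)) {
  # Split `text` into words; output and input share the name `text`,
  # so the token column replaces the original text column.
  # to_lower = FALSE keeps the original capitalisation.
  tokens <- tidytext::unnest_tokens(corpus, output = text, input = text,
                                    token = "words", to_lower = FALSE)
  print(tokens)
}
```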

giocomai/castarter documentation built on June 12, 2025, 8:49 p.m.
