prepare_data | R Documentation |
Convert data frame or character vector to a ‘corporaexplorerobject’ for subsequent exploration.
prepare_data(dataset, ...)

## S3 method for class 'data.frame'
prepare_data(
  dataset,
  date_based_corpus = TRUE,
  grouping_variable = NULL,
  within_group_identifier = "Seq",
  columns_doc_info = c("Date", "Title", "URL"),
  corpus_name = NULL,
  use_matrix = TRUE,
  matrix_without_punctuation = TRUE,
  tile_length_range = c(1, 10),
  columns_for_ui_checkboxes = NULL,
  ...
)

## S3 method for class 'character'
prepare_data(
  dataset,
  corpus_name = NULL,
  use_matrix = TRUE,
  matrix_without_punctuation = TRUE,
  ...
)
dataset |
Object to convert to corporaexplorerobject:
|
... |
Other arguments to be passed to |
date_based_corpus |
Logical. Set to |
grouping_variable |
Character string.
If |
within_group_identifier |
Character string indicating column name in |
columns_doc_info |
Character vector. The columns from |
corpus_name |
Character string with name of corpus. |
use_matrix |
Logical. Should the function create a document term matrix
for fast searching? If |
matrix_without_punctuation |
Should punctuation and digits be stripped
from the text before constructing the document term matrix?
If |
tile_length_range |
Numeric vector of length two.
Fine-tune the tile lengths in document wall
and day corpus view. Tile length is calculated by
|
columns_for_ui_checkboxes |
Character. Character or factor column(s) in dataset.
Include sets of checkboxes in the app sidebar for
convenient filtering of corpus.
Typically useful for columns with a small set of unique
(and short) values.
Checkboxes will be arranged by |
For data.frame: Each row in dataset
is treated as a base differentiating unit in the corpus,
typically a chapter in a book or a single document in a document collection.
The following column names are reserved and cannot be used in dataset:
"ID",
"Text_original_case",
"Tile_length",
"Year",
"Seq",
"Weekday_n",
"Day_without_docs",
"Invisible_fake_date".
A character vector will be converted to a simple corporaexplorerobject with no metadata.
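For data frame input, the reserved column names listed above can be checked before calling prepare_data. The following is a minimal base R sketch, not part of corporaexplorer: the helper find_reserved_columns, the "_original" suffix, and the example data frame are assumptions made for illustration.

```r
# Reserved column names, copied from the list in this documentation.
reserved_names <- c(
  "ID", "Text_original_case", "Tile_length", "Year", "Seq",
  "Weekday_n", "Day_without_docs", "Invisible_fake_date"
)

# Hypothetical helper (not a corporaexplorer function): returns the
# column names in `dataset` that collide with a reserved name.
find_reserved_columns <- function(dataset) {
  intersect(colnames(dataset), reserved_names)
}

# Illustrative data frame with one clashing column ("Year").
df <- data.frame(
  Date = as.Date("2020-01-01"),
  Text = "Some text.",
  Year = 2020,
  stringsAsFactors = FALSE
)

clashes <- find_reserved_columns(df)

# Rename any clashing columns; the suffix is an arbitrary choice.
idx <- names(df) %in% reserved_names
names(df)[idx] <- paste0(names(df)[idx], "_original")
```

After the rename, df no longer contains any of the reserved names and can be passed on as the dataset argument.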
A corporaexplorer object to be passed as argument to explore and run_document_extractor.
## From data.frame

# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
  "This is a document about ", month.name[1:10], ". ",
  "This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)

# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")

if(interactive()){
  # Running exploration app:
  explore(corpus)

  # Running app to extract documents:
  run_document_extractor(corpus)
}

## From character vector
alphabet_corpus <- prepare_data(LETTERS)

if(interactive()){
  # Running exploration app:
  explore(alphabet_corpus)
}