explore: Launch Shiny app for exploration of text collection
In corporaexplorer: A 'Shiny' App for Exploration of Text Collections

explore

R Documentation

Launch Shiny app for exploration of text collection

Description

Launch Shiny app for exploration of text collection. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).

explore() explores a 'corporaexplorerobject' created with the prepare_data() function. App settings can optionally be specified through the arguments to explore().

explore0() is a convenience function for directly exploring a data frame or character vector without first creating a corporaexplorerobject with prepare_data(); instead, one is created on the fly as the app launches. It is functionally equivalent to explore(prepare_data(dataset, use_matrix = FALSE)).
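
The equivalence described above can be sketched as follows. This is a minimal illustration, not part of the package's own examples: the character vector `docs` is hypothetical sample data, and the calls are guarded so that nothing is launched in a non-interactive session or when corporaexplorer is not installed.

```r
# Hypothetical sample data: one document per element.
docs <- c(
  "First document about apples.",
  "Second document about pears."
)

if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  # One step: the corporaexplorerobject is created on the fly.
  corporaexplorer::explore0(docs)

  # Two steps: the form the documentation describes as functionally equivalent.
  corpus <- corporaexplorer::prepare_data(docs, use_matrix = FALSE)
  corporaexplorer::explore(corpus)
}
```

The two-step form is useful when the same prepared object will be explored more than once, since the preparation is then done only once.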

Usage

explore(
  corpus_object,
  search_options = list(),
  ui_options = list(),
  search_input = list(),
  plot_options = list(),
  ...
)

explore0(
  dataset,
  arguments_prepare_data = list(use_matrix = FALSE),
  arguments_explore = list()
)

Arguments

`corpus_object`	A corporaexplorerobject created by `prepare_data`.
`search_options`	List. Specify how search operations in the app are carried out. Available options: `use_matrix` Logical. If the corporaexplorerobject contains a document term matrix, should it be used for searches? (See `prepare_data`.) Defaults to `TRUE`. `regex_engine` Character. Specify the regular expression engine to be used (defaults to `"default"`). Available options: `"default"`: use the `re2` package (https://github.com/girishji/re2) for simple searches and the `stringr` package (https://github.com/tidyverse/stringr) for complex regexes (i.e. when special regex characters are used); `"stringr"`: use `stringr` for all searches; `"re2"`: use `re2` for all searches. `optional_info` Logical. If `TRUE`, information about the search method (the regex engine, and whether the search was conducted in the document term matrix or in the full text documents) is reported. `allow_unreasonable_patterns` Logical. If `FALSE`, the default, the app will not allow patterns that would result in an enormous number of hits or lead to a very slow search. (Examples of such patterns include `.` and `\b`.)
`ui_options`	List. Specify custom app settings (see example below). Currently available: `font_size`. Character string specifying the font size in document view, e.g. `"10px"`.
`search_input`	List. Gives the opportunity to pre-populate the following sidebar fields (see example below): `search_terms`: The 'Term(s) to chart and highlight' field. Character vector with maximum length 5. `highlight_terms`: The 'Additional terms for text highlighting' field. Character vector. `filter_terms`: The 'Filter corpus?' field. Character vector. `case_sensitivity`: Should the 'Case sensitive search' box be checked? Logical.
`plot_options`	List. Specify custom plot settings (see example below). Currently available: `max_docs_in_wall_view`. Integer specifying the maximum number of documents to be rendered in the 'document wall' view. Default value is 12000. `plot_size_factor`. Numeric. Tweaks the corpus map plot's height. Value > 1 increases height, value < 1 decreases height. Ignored if value <= 0. `documents_per_row_factor`. Numeric. Tweaks the number of documents included in each row in 'document wall' view. Value > 1 increases the number of documents, value < 1 decreases the number of documents. Ignored if value <= 0. `document_tiles`. Integer specifying the number of tiles used in the tile chart representing occurrences of terms in a document. Ignored if value < 1 or if value > 50. `colours`. Character vector of length 1 to 6. Specify the order of the colours used to represent search (and highlight) terms in plots and documents. The default order and available colours are defined by the character vector `c("red", "blue", "green", "purple", "orange", "gray")`. Passing e.g. `plot_options = list(colours = c("gray", "green"))` will change that order to `c("gray", "green", "red", "blue", "purple", "orange")`. Arguments with duplicated colours or with colours not present in the default character vector will be ignored. `tile_length`. Either `"scaled"` or `"uniform"`. With `"scaled"`, the default, the length of the tiles in document wall view and day corpus view will vary according to the length of the document (see the `tile_length_range` argument in `prepare_data()`). If `"uniform"`, all tiles will be of equal length.
`...`	Other arguments passed to `runApp` in the Shiny package.
`dataset`	Data frame or character vector as specified in `prepare_data()`.
`arguments_prepare_data`	List. Arguments to be passed to `prepare_data()` in order to override this function's default argument values.
`arguments_explore`	List. Arguments to be passed to `explore()` in order to override this function's default argument values.
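
The reordering rule described for the `colours` option of `plot_options` can be illustrated with a few lines of base R. This is only a sketch of the documented behaviour, not the package's actual implementation; the helper name `reorder_colours` is hypothetical.

```r
# Default order and available colours, as listed in the documentation.
default_colours <- c("red", "blue", "green", "purple", "orange", "gray")

# Hypothetical helper mimicking the documented rule: requested colours come
# first, the remaining defaults follow in their original order. Invalid
# requests (duplicates, unknown colours, wrong length) are ignored.
reorder_colours <- function(requested, defaults = default_colours) {
  invalid <- length(requested) < 1 ||
    length(requested) > length(defaults) ||
    anyDuplicated(requested) > 0 ||
    !all(requested %in% defaults)
  if (invalid) {
    return(defaults)
  }
  c(requested, setdiff(defaults, requested))
}

reorder_colours(c("gray", "green"))
# "gray" "green" "red" "blue" "purple" "orange"

reorder_colours(c("gray", "gray"))  # duplicated colour: defaults are kept
reorder_colours("pink")             # unknown colour: defaults are kept
```

With the package itself, the corresponding call would be `explore(corpus, plot_options = list(colours = c("gray", "green")))`.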

Details

For explore0(): by default, no document term matrix will be generated, meaning that the data will be prepared for exploration faster than by using the default settings in prepare_data(), but also that searches in the app are likely to be slower.
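
If faster searches matter more than start-up time, the default can be overridden through `arguments_prepare_data`. The snippet below is a hedged sketch: `docs` and `prep_args` are hypothetical names, and the call is guarded so that it only runs interactively and only when corporaexplorer is installed.

```r
# Hypothetical sample data: 26 single-letter documents, repeated ten times.
docs <- rep(LETTERS, 10)

# Ask prepare_data() to build a document term matrix after all.
prep_args <- list(use_matrix = TRUE)

if (interactive() && requireNamespace("corporaexplorer", quietly = TRUE)) {
  corporaexplorer::explore0(docs, arguments_prepare_data = prep_args)
}
```

Preparation then takes longer, as with the default settings in prepare_data(), in exchange for searches that can use the document term matrix.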

Value

Launches a Shiny app.

Examples

library(corporaexplorer)

# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
  "This is a document about ", month.name[1:10], ". ",
  "This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)

# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")

if (interactive()) {

  # Running exploration app:
  explore(corpus)
  explore(corpus,
          search_options = list(optional_info = TRUE),
          ui_options = list(font_size = "10px"),
          search_input = list(search_terms = c("Tottenham", "Spurs")),
          plot_options = list(max_docs_in_wall_view = 12001,
                              colours = c("gray", "green")))

  # Running app to extract documents:
  run_document_extractor(corpus)
}

if (interactive()) {

  # Exploring a character vector directly:
  explore0(rep(sample(LETTERS), 10))

  explore0(rep(sample(LETTERS), 10),
    arguments_explore = list(search_input = list(search_terms = "Z"))
  )

}

corporaexplorer documentation built on Sept. 11, 2024, 7:21 p.m.