search_new: Create a new search
In oliverehmer/act: Aligned Corpus Toolkit

Description

Creates a new search object and runs the search on a corpus object. Only 'x' and 'pattern' are required; all other arguments can be left at their default values.

Usage

search_new(
  x,
  pattern,
  searchMode = c("content", "fulltext", "fulltext.byTime", "fulltext.byTier"),
  searchNormalized = TRUE,
  name = "mysearch",
  resultid.prefix = "result",
  resultid.start = 1,
  filterTranscriptNames = NULL,
  filterTranscriptIncludeRegEx = NULL,
  filterTranscriptExcludeRegEx = NULL,
  filterTierNames = NULL,
  filterTierIncludeRegEx = NULL,
  filterTierExcludeRegEx = NULL,
  filterSectionStartsec = NULL,
  filterSectionEndsec = NULL,
  concordanceMake = TRUE,
  concordanceWidth = NULL,
  cutSpanBeforesec = 0,
  cutSpanAftersec = 0,
  runSearch = TRUE
)

Arguments

`x`	Corpus object; the corpus that will be searched.
`pattern`	Character string; search pattern as regular expression.
`searchMode`	Character string; takes the following values: `content`, `fulltext` (=default, includes both full text modes), `fulltext.byTime`, `fulltext.byTier`.
`searchNormalized`	Logical; if `TRUE` function will search in the normalized content, if `FALSE` function will search in the original content.
`name`	Character string; name of the search. Will be used, for example, as name of the sub folder when creating media cuts.
`resultid.prefix`	Character string; search results are numbered consecutively, and this string is placed before each consecutive number.
`resultid.start`	Integer; search results are numbered consecutively, and this is the first number used in the identifiers.
`filterTranscriptNames`	Vector of character strings; names of transcripts to be included.
`filterTranscriptIncludeRegEx`	Character string; as regular expression, limit search to certain transcripts matching the expression.
`filterTranscriptExcludeRegEx`	Character string; as regular expression, exclude certain transcripts matching the expression.
`filterTierNames`	Vector of character strings; names of tiers to be included.
`filterTierIncludeRegEx`	Character string; as regular expression, limit search to certain tiers matching the expression.
`filterTierExcludeRegEx`	Character string; as regular expression, exclude certain tiers matching the expression.
`filterSectionStartsec`	Double; start time of region for search.
`filterSectionEndsec`	Double; end time of region for search.
`concordanceMake`	Logical; if `TRUE` concordance will be added to search results.
`concordanceWidth`	Integer; number of characters to the left and right of the search hit in the concordance, the default is `120`.
`cutSpanBeforesec`	Double; start the media and transcript cut some seconds before the hit to include some context, the default is `0`.
`cutSpanAftersec`	Double; end the media and transcript cut some seconds after the hit to include some context, the default is `0`.
`runSearch`	Logical; if `TRUE` search will be run in corpus object, if `FALSE` only the search object will be created.
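
The include and exclude filters described above are regular expressions applied to transcript or tier names. The following is a minimal base R sketch of how such a pair of filters can narrow a set of names. It does not call act, the tier names are hypothetical, and the way act combines the filters internally may differ.

```r
# Hypothetical tier names; not taken from examplecorpus.
tier_names <- c("ISanti", "Entrevistador", "ISanti_gestos")

# Hypothetical filter expressions, analogous to filterTierIncludeRegEx
# and filterTierExcludeRegEx.
include_regex <- "^I"
exclude_regex <- "gestos$"

# Keep names that match the include expression and do not match the exclude expression.
keep <- grepl(include_regex, tier_names) & !grepl(exclude_regex, tier_names)
selected <- tier_names[keep]
print(selected)
```

Testing an expression against a plain vector of names like this can help before passing it to `search_new`.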

Value

Search object.

Examples

library(act)

# Search for the first person singular pronoun in Spanish.
mysearch <- act::search_new(examplecorpus, pattern="yo")
mysearch
# Search in normalized content vs. original content
mysearch.norm  <- act::search_new(examplecorpus, pattern="yo", searchNormalized=TRUE)
mysearch.org   <- act::search_new(examplecorpus, pattern="yo", searchNormalized=FALSE)
mysearch.norm@results.nr
mysearch.org@results.nr

# The counts differ because normalization converts capital letters to
# lowercase. One annotation in the example corpus contains a "yo" with a
# capital letter:
mysearch <- act::search_new(examplecorpus, pattern="yO", searchNormalized=FALSE)
mysearch@results$hit
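
The effect of lowercasing on hit counts can be reproduced without the package. The sketch below uses base R only and hypothetical annotation texts. It assumes nothing about act's normalization beyond the lowercasing mentioned above; the real normalization may apply further replacements.

```r
# Hypothetical annotation texts; not taken from examplecorpus.
original <- c("yo creo", "yO voy", "hoy no")

# Stand-in for normalization: only the lowercasing step is modelled here.
normalized <- tolower(original)

# Count the annotations that contain the literal pattern "yo".
n_original   <- sum(grepl("yo", original, fixed = TRUE))
n_normalized <- sum(grepl("yo", normalized, fixed = TRUE))

print(n_original)
print(n_normalized)
```

The second annotation only matches after lowercasing, so the normalized count is higher by one.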

# Full text search vs. content search.
# Full text search will find matches across annotations.
# Let's define a regular expression with a certain span.
# Search for the word "no" ('no') followed by "pero" ('but')
# at a distance of 1 to 20 characters.
myRegEx <- "\\bno\\b.{1,20}pero"
mysearch <- act::search_new(examplecorpus, pattern=myRegEx, searchMode="fulltext")
mysearch
mysearch@results$hit
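
To see why a pattern with a span can match across annotations only in a full text search, the base R sketch below applies the same regular expression to two hypothetical annotations, first one by one and then joined into a single string. Joining with a space is an assumption made for illustration; how act assembles its full text is not shown here.

```r
# Hypothetical annotations; not taken from examplecorpus.
annotations <- c("no quiero", "pero voy")
myRegEx <- "\\bno\\b.{1,20}pero"

# Searching each annotation on its own: neither contains both words.
hits_by_annotation <- grepl(myRegEx, annotations, perl = TRUE)

# Searching the joined text: the pattern can span the annotation boundary.
fulltext <- paste(annotations, collapse = " ")
hit_in_fulltext <- grepl(myRegEx, fulltext, perl = TRUE)

print(hits_by_annotation)
print(hit_in_fulltext)
```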

oliverehmer/act documentation built on March 11, 2023, 1:30 p.m.