search-class: Search object
In act: Aligned Corpus Toolkit

Description

This object defines the properties of a search in act. If the search has already been run, it also contains the results of that search in a specific corpus. (Note that you can also create a search without running it immediately.) A search object can be run on different corpora.

Some of the slots are defined by the user. Other slots are [READ ONLY], which means that they can be accessed by the user but should not be changed. They contain values that are filled when you execute functions on the object.

Slots

name: Character string; name of the search. It is used, for example, as the name of the subfolder when creating media cuts.
pattern: Character string; search pattern as a regular expression.
search.mode: Character string; defines whether the original contents of the annotations or the full texts should be searched. Possible values: content, fulltext (the default; includes both full text modes), fulltext.byTime, fulltext.byTier.
search.normalized: Logical; if TRUE, the normalized annotations will be used for searching.
resultid.prefix: Character string; search results are numbered consecutively, and this string is placed before the consecutive numbers.
resultid.start: Integer; search results are numbered consecutively, and this is the start number of the identifiers.
filter.transcript.names: Vector of character strings; names of transcripts to include in the search. If the value is character() or "", the filter will be ignored.
filter.transcript.includeRegEx: Character string; Regular expression that defines which transcripts should be INcluded in the search (matching the name of the transcript).
filter.transcript.excludeRegEx: Character string; Regular expression that defines which transcripts should be EXcluded in the search (matching the name of the transcript).
filter.tier.names: Vector of character strings; names of tiers to include in the search. If the value is character() or "", the filter will be ignored.
filter.tier.includeRegEx: Character string; Regular expression that defines which tiers should be INcluded in the search (matching the name of the tier).
filter.tier.excludeRegEx: Character string; Regular expression that defines which tiers should be EXcluded in the search (matching the name of the tier).
filter.section.startsec: Double; Time value in seconds, limiting the search to a certain time span in each transcript, defining the start of the search window.
filter.section.endsec: Double; Time value in seconds, limiting the search to a certain time span in each transcript, defining the end of the search window.
concordance.make: Logical; whether a concordance should be created when the search is run.
concordance.width: Integer; number of characters to include in the concordance.
cuts.span.beforesec: Double; number of seconds by which the cuts (media and print transcripts) should start before the start of the search hit.
cuts.span.aftersec: Double; number of seconds by which the cuts (media and print transcripts) should end after the end of the search hit.
cuts.column.srt: Character string; name of the destination column in the search results data frame where the srt subtitles will be inserted; the column will be created if not present in the data frame; set to "" for no insertion.
cuts.column.printtranscript: Character string; name of the destination column in the search results data frame where the print transcripts will be inserted; the column will be created if not present in the data frame; set to "" for no insertion.
cuts.printtranscripts: Character string; [READ ONLY] All print transcripts for the search results (if generated previously).
cuts.cutlist.mac: Character string; [READ ONLY] 'FFmpeg' cut list for use on a Mac, to cut the media files for the search results.
cuts.cutlist.win: Character string; [READ ONLY] 'FFmpeg' cut list for use on Windows, to cut the media files for the search results.
results: Data.frame; Results of the search.
results.nr: Integer; [READ ONLY] Number of search results.
results.tiers.nr: Integer; [READ ONLY] Number of tiers over which the search results are distributed.
results.transcripts.nr: Integer; [READ ONLY] Number of transcripts over which the search results are distributed.
x.name: Character string; [READ ONLY] name of the corpus object on which the search has been run.
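
Slots of an S4 object are read and written with the `@` operator, as in `mysearch@results.nr` in the Examples. The following is a minimal sketch of that access pattern. It uses a hypothetical stand-in class, `toySearch`, with only four of the slots listed above, so that it runs without a corpus; it is not the class that act defines.

```r
library(methods)

# Hypothetical stand-in class (NOT the real act search class),
# mirroring four of the slots described above.
setClass("toySearch", representation(
  name       = "character",
  pattern    = "character",
  results    = "data.frame",
  results.nr = "integer"
))

s <- new("toySearch",
  name       = "mysearch",
  pattern    = "yo",
  results    = data.frame(hit = c("yo", "yO"), stringsAsFactors = FALSE),
  results.nr = 2L
)

# A user-defined slot may be changed directly.
s@name <- "pronouns"

# A [READ ONLY] slot should only be read, never assigned.
n <- s@results.nr

# List all slots of the object.
slots <- slotNames(s)
```

R itself does not enforce the [READ ONLY] marking; it is a convention, so assigning to such a slot would succeed but could leave the object in an inconsistent state.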

Examples

library(act)

# Search for the first-person singular pronoun in Spanish.
mysearch <- act::search_new(examplecorpus, pattern= "yo")
mysearch
# Search in normalized content vs. original content
mysearch.norm  <- act::search_new(examplecorpus, pattern="yo", searchNormalized=TRUE)
mysearch.org   <- act::search_new(examplecorpus, pattern="yo", searchNormalized=FALSE)
mysearch.norm@results.nr
mysearch.org@results.nr

# The difference is because during normalization capital letters will be converted
# to small letters. One annotation in the example corpus contains a "yo" with a
# capital letter:
mysearch <- act::search_new(examplecorpus, pattern="yO", searchNormalized=FALSE)
mysearch@results$hit

# Search in full text vs. original content.
# Full text search will find matches across annotations.
# Let's define a regular expression with a certain span.
# Search for the word "no" ('no') followed by "pero" ('but')
# at a distance of 1 to 20 characters.
myRegEx <- "\\bno\\b.{1,20}pero"
mysearch <- act::search_new(examplecorpus, pattern=myRegEx, searchMode="fulltext")
mysearch
mysearch@results$hit
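
To see what the span in this regular expression accepts and rejects, the pattern can be tried on plain strings with base R. This is only a sketch of the regular expression itself: the sample strings are made up for illustration, and it does not reproduce how act assembles full texts across annotations.

```r
# Same pattern as above: the whole word "no", then 1 to 20 arbitrary
# characters, then "pero".
myRegEx <- "\\bno\\b.{1,20}pero"

# Made-up sample strings (not taken from the example corpus).
samples <- c(
  "no lo creo, pero bueno",                   # gap of 10 characters
  "nota: pero",                               # "no" is not a whole word here
  "no tengo ninguna idea de eso, pero bueno"  # gap longer than 20 characters
)

hits <- grepl(myRegEx, samples, perl = TRUE)
# hits is TRUE, FALSE, FALSE

# Extract the matched text from the strings that do match.
matched <- regmatches(samples, regexpr(myRegEx, samples, perl = TRUE))
# matched is "no lo creo, pero"
```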

act documentation built on June 7, 2023, 6:16 p.m.