subset_query: Subset tCorpus token data using a query
In corpustools: Managing, Querying and Analyzing Tokenized Text

subset_query

R Documentation

Subset tCorpus token data using a query

Description

A convenience function that searches for contexts (documents or sentences) matching a query, and uses the results to subset the tCorpus token data.

Usage

subset_query(
  tc,
  query,
  feature = "token",
  context_level = c("document", "sentence"),
  not = F,
  as_ascii = F,
  window = NA
)

Arguments

`tc`	A `tCorpus`
`query`	A character string that is a query. See `search_contexts` for the query syntax.
`feature`	The name of the feature column on which the query is used.
`context_level`	Select whether the query and subset are performed at the document or sentence level.
`not`	If TRUE, perform a NOT search: return the documents/sentences in which the query is not found.
`as_ascii`	If TRUE, perform the search in ASCII.
`window`	If specified, a word distance is used as the context instead (overrides `context_level`).

Details

See the documentation for search_contexts for an explanation of the query language.

Examples

library(corpustools)

text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc = create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)

## keep only the documents that match the query 'A'; the result is assigned to tc2
tc2 = subset_query(tc, 'A')
tc2$meta
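
The `not` and `context_level` arguments described above can be illustrated with a short sketch on the same toy corpus. The helper `make_tc` is hypothetical and not part of corpustools; it rebuilds the corpus for each call so that the sketch does not depend on whether the input tCorpus is modified.

```r
library(corpustools)

## hypothetical helper (not part of corpustools): rebuild the toy corpus
make_tc = function() {
  text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
  create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)
}

## NOT search: keep the documents in which 'A' is not found
tc_not = subset_query(make_tc(), 'A', not = TRUE)
tc_not$meta

## sentence-level search: keep only the sentences that contain the token 'G'
tc_sent = subset_query(make_tc(), 'G', context_level = 'sentence')
tc_sent$tokens
```

Without a wildcard, the query 'G' is matched against whole tokens, so the token 'GGG' in document 'd' is not expected to count as a match.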

corpustools documentation built on May 31, 2023, 8:45 p.m.