cpos-method: Get corpus positions for a query or queries.

cposR Documentation

Get corpus positions for a query or queries.

Description

Get matches for a query in a CQP corpus (subcorpus, partition etc.), optionally using the CQP syntax of the Corpus Workbench (CWB).

Usage

cpos(.Object, ...)

## S4 method for signature 'corpus'
cpos(
  .Object,
  query,
  p_attribute = getOption("polmineR.p_attribute"),
  cqp = is.cqp,
  regex = FALSE,
  check = TRUE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'character'
cpos(
  .Object,
  query,
  p_attribute = getOption("polmineR.p_attribute"),
  cqp = is.cqp,
  check = TRUE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'slice'
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

## S4 method for signature 'partition'
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

## S4 method for signature 'subcorpus'
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

## S4 method for signature 'matrix'
cpos(.Object)

## S4 method for signature 'hits'
cpos(.Object)

## S4 method for signature ''NULL''
cpos(.Object)

Arguments

.Object

A length-one character vector indicating a CWB corpus, a partition object, or a matrix with corpus positions.

...

Used for reasons of backwards compatibility to process arguments that have been renamed (e.g. pAttribute).

query

A character vector providing one or multiple queries (token to look up, regular expression or CQP query). Token ids (i.e. integer values) are also accepted. If query is neither a regular expression nor a CQP query, a sanity check removes accidental leading/trailing whitespace, issuing a respective warning.

p_attribute

The p-attribute to search. Needs to be stated only if query is not a CQP query. Defaults to NULL.

cqp

Either logical (TRUE if query is a CQP query), or a function to check whether query is a CQP query or not (defaults to is.cqp auxiliary function).

regex

Interpret query as a regular expression.

check

A logical value, whether to check validity of CQP query using check_cqp_query.

verbose

A logical value, whether to show messages.

Details

If the cpos()-method is applied on a character or partition object, the result is a two-column matrix with the regions (start end end corpus positions of the matches) for a query. CQP syntax can be used. The encoding of the query is adjusted to conform to the encoding of the CWB corpus. If there are not matches, NULL is returned.

If the cpos()-method is called on a matrix object, the cpos matrix is unfolded, the return value is an integer vector with the individual corpus positions.

If .Object is a hits object, an integer vector is returned with the individual corpus positions.

. If .Object is a matrix, it is assumed to be a region matrix, i.e. a two-column matrix with left and right corpus positions in the first and second row, respectively. For many operations, such as decoding the token stream, it is necessary to inflate the denoted regions into a vector of all corpus positions referred to by the regions defined in the matrix. The cpos-method for matrix objects will performs this task robustly.

If .Object is NULL, the method will return an empty integer vector. Used internally to handle NULL objects that may be returned from the cpos-method if no matches are obtained for a query.

Value

Unless .Object is a matrix, the return value is a matrix with two columns. The first column reports the left/starting corpus positions (cpos) of the hits obtained. The second column reports the right/ending corpus positions of the respective hit. The number of rows is the number of hits. If there are no hits, a NULL object is returned.

Examples

use(pkg = "RcppCWB", corpus = "REUTERS")

# looking up single tokens
cpos("REUTERS", query = "oil")
corpus("REUTERS") %>% cpos(query = "oil")
corpus("REUTERS") %>% subset(grepl("saudi-arabia", places)) %>% cpos(query = "oil")
partition("REUTERS", places = "saudi-arabia", regex = TRUE) %>% cpos(query = "oil")

# using CQP query syntax
cpos("REUTERS", query = '"Saudi" "Arabia"')
corpus("REUTERS") %>% cpos(query = '"Saudi" "Arabia"')
corpus("REUTERS") %>%
  subset(grepl("saudi-arabia", places)) %>%
  cpos(query = '"Saudi" "Arabia"', cqp = TRUE)
partition("REUTERS", places = "saudi-arabia", regex = TRUE) %>%
  cpos(query = '"Saudi" "Arabia"', cqp = TRUE)

polmineR documentation built on Aug. 26, 2022, 5:15 p.m.