searchQuery: Find tokens using a Lucene-like search query

Description

Search tokens in a tokenlist using a query that consists of an indicator, and optionally a condition. For a detailed explanation of the query language please consult the query_tutorial markdown file. For a quick summary see the details below.

Usage

searchQuery(tokens, indicator, condition = "", code = "",
  default.window = NA, condition_once = FALSE, indicator_filter = rep(T,
  nrow(tokens)), presorted = F, doc.col = getOption("doc.col", "doc_id"),
  position.col = getOption("position.col", "position"),
  word.col = getOption("word.col", "word"))

Arguments

tokens

a tokenlist object. See ?asTokenlist() for details.

indicator

The indicator part of the query. See the explanation in the query_tutorial markdown file or in the Details below.

condition

The condition part of the query. See the explanation in the query_tutorial markdown file or in the Details below.

code

The code given to the tokens that match the query (useful when looking for multiple queries)

default.window

Determines the default word distance between the condition terms and the indicator. It applies when no specific word distance is set with the ~ symbol.

condition_once

Logical. If TRUE, an indicator that satisfies its conditions once in an article causes all indicators within that article to be coded.

indicator_filter

A logical vector that indicates which tokens can match an indicator. Can for instance be used to only select tokens that are proper names (using POS tagging) when looking for people.

presorted

Logical. The data has to be sorted on order(doc_id, position). If this is already the case, presorted can be set to TRUE to save time (which is useful when testing many individual queries on large tokenlists).

doc.col

The name of the document_id column. Defaults to "doc_id", unless a global default is specified using setTokenlistColnames()

position.col

The name of the column giving the position in a document. Defaults to "position", unless a global default is specified using setTokenlistColnames()

word.col

The name of the column containing the token text. Defaults to "word", unless a global default is specified using setTokenlistColnames()
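The presorted and indicator_filter arguments both depend on how the token data is prepared before the call. The following is a minimal sketch of that preparation in base R, assuming a plain data.frame with the documented default column names; the toy data and the "pos" column are hypothetical, and a real tokenlist may need to be created with asTokenlist() first.

```r
# Toy token data using the documented default column names
# ("doc_id", "position", "word"). The "pos" column (POS tags) and all
# values are hypothetical, for illustration only.
tokens <- data.frame(
  doc_id   = c(2, 1, 1, 2, 1),
  position = c(2, 3, 1, 1, 2),
  word     = c("spoke", "today", "Obama", "Merkel", "spoke"),
  pos      = c("VERB", "NOUN", "PROPN", "PROPN", "VERB"),
  stringsAsFactors = FALSE
)

# Sort once on (doc_id, position), the order the documentation requires,
# so that presorted = TRUE could be used for subsequent queries.
tokens <- tokens[order(tokens$doc_id, tokens$position), ]
rownames(tokens) <- NULL

# Build the filter AFTER sorting so it lines up with the token rows:
# one logical value per token, TRUE only for proper names.
is_proper_name <- tokens$pos == "PROPN"
```

The sorted tokens and the is_proper_name vector could then be supplied as the tokens and indicator_filter arguments, together with presorted = TRUE. Computing the filter after sorting matters, because the vector must have one element per row in the same order as the tokens.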

Details

Brief summary of the query language

The indicator:

The condition:

Parameters:

Value

A data.frame containing the words that match the query and their locations in the tokenlist.


kasperwelbers/semnet documentation built on May 20, 2019, 7:38 a.m.