searchQueries: Find tokens using Lucene-like search queries

Description

Find tokens using Lucene-like search queries

Usage

searchQueries(tokens, queries, batchsize = 5, default.window = NA,
  condition_once = FALSE, indicator_filter = rep(T, nrow(tokens)),
  presorted = F, doc.col = getOption("doc.col", "doc_id"),
  position.col = getOption("position.col", "position"),
  word.col = getOption("word.col", "word"), verbose = T)

Arguments

tokens

a tokenlist object. See ?asTokenlist() for details.

queries

a data frame containing the queries. See ?searchQuery() for an explanation of the query language and to test individual queries.

batchsize

The number of queries searched together in one batch. Searching multiple queries at once is faster, but too many queries (with too many tokens) in a single batch can exhaust memory or crash R. Lower batchsize if this happens.
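To illustrate what a batch size of 5 means for a larger set of queries, here is a minimal base R sketch that splits a query table into batches. The `queries` data frame and its column names are hypothetical, and this is not necessarily how searchQueries batches internally.

```r
# Hypothetical query table; the column names are assumptions for illustration only.
queries <- data.frame(
  code = paste0("q", 1:12),
  indicator = paste0("term", 1:12),
  stringsAsFactors = FALSE
)

batchsize <- 5

# Assign each row to a batch holding at most `batchsize` queries.
batch_id <- ceiling(seq_len(nrow(queries)) / batchsize)
batches <- split(queries, batch_id)

# Number of queries in each batch
batch_sizes <- vapply(batches, nrow, integer(1))
print(batch_sizes)
```

A smaller batchsize gives more, smaller batches, which trades some speed for lower peak memory use.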

default.window

The default word distance (window) between the condition terms and the indicator. It applies only when a query does not set a specific word distance with the ~ symbol.
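The idea of a word distance can be sketched in base R as follows. This only illustrates the concept: the helper `within_window` is hypothetical, and treating the window as an inclusive distance in either direction is an assumption, not something this page confirms about searchQueries.

```r
# Toy document: one token per position.
words <- c("the", "bank", "raised", "its", "interest", "rate", "today")
position <- seq_along(words)

indicator_pos <- position[words == "bank"]
condition_pos <- position[words == "interest"]

# Hypothetical helper: for each indicator position, is there a condition
# term at most `window` positions away (assumed inclusive, both directions)?
within_window <- function(ind, cond, window) {
  vapply(ind, function(i) any(abs(cond - i) <= window), logical(1))
}

narrow <- within_window(indicator_pos, condition_pos, window = 2)
wide <- within_window(indicator_pos, condition_pos, window = 5)
print(c(narrow = narrow, wide = wide))
```

Here the condition term is three positions away from the indicator, so it falls outside a window of 2 but inside a window of 5.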

condition_once

logical. If TRUE, then once an indicator satisfies its conditions anywhere in an article (document), all indicators within that article are coded.
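The difference between the two settings can be sketched on a toy table in base R. The columns `is_indicator` and `condition_met` are hypothetical stand-ins for the intermediate results of a query; this is an illustration of the described behaviour, not the package's implementation.

```r
# Toy data: which tokens match the indicator, and where its conditions hold.
tokens <- data.frame(
  doc_id = c(1, 1, 1, 2, 2),
  is_indicator = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  condition_met = c(TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Per-token coding: an indicator is coded only where its conditions hold.
hit <- tokens$is_indicator & tokens$condition_met

# Document-level coding: if any indicator in a document satisfies its
# conditions, every indicator in that document is coded.
doc_has_hit <- as.logical(ave(hit, tokens$doc_id, FUN = any))
hit_once <- tokens$is_indicator & doc_has_hit

print(data.frame(doc_id = tokens$doc_id, hit = hit, hit_once = hit_once))
```

In document 1 the second indicator is coded only under the document-level rule; document 2 has no satisfied condition, so nothing is coded there either way.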

indicator_filter

A logical vector that indicates which tokens can match an indicator. Can for instance be used to only select tokens that are proper names (using POS tagging) when looking for people.
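As a minimal sketch of the proper-name case, a filter could be built from a part-of-speech column. The `pos` column and the Penn Treebank tags used here are assumptions; the actual column name and tag set depend on the tagger that produced the tokens.

```r
# Toy token table with a hypothetical POS tag column.
tokens <- data.frame(
  doc_id = c(1, 1, 1, 2, 2),
  position = c(1, 2, 3, 1, 2),
  word = c("Obama", "visited", "Berlin", "the", "president"),
  pos = c("NNP", "VBD", "NNP", "DT", "NN"),
  stringsAsFactors = FALSE
)

# TRUE for tokens tagged as proper nouns, FALSE otherwise.
# The vector has one element per row of `tokens`.
is_proper_name <- tokens$pos %in% c("NNP", "NNPS")
print(tokens$word[is_proper_name])
```

The resulting vector has the same length as the number of rows in tokens, matching the shape of the default `rep(T, nrow(tokens))`, and would be supplied as the indicator_filter argument.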

presorted

The data has to be sorted by document and position, i.e. in the order given by order(doc_id, position). If this is already the case, presorted can be set to TRUE to save time (which is useful when testing many individual queries on large tokenlists).
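A minimal base R sketch of sorting a token table once up front, and checking that it is sorted, might look like this. The toy data uses the default column names doc_id and position; the check itself is only an illustration.

```r
# Toy token table in arbitrary order.
tokens <- data.frame(
  doc_id = c(2, 1, 2, 1),
  position = c(2, 2, 1, 1),
  word = c("d", "b", "c", "a"),
  stringsAsFactors = FALSE
)

# Sort by document first, then by position within the document.
tokens <- tokens[order(tokens$doc_id, tokens$position), ]

# If the rows are already in order, order() returns 1, 2, ..., n.
is_sorted <- !is.unsorted(order(tokens$doc_id, tokens$position))
print(is_sorted)
```

Once the table is sorted like this, repeated searches on it can use presorted = TRUE.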

doc.col

The name of the document_id column. Defaults to "doc_id", unless a global default is specified using setTokenlistColnames()

position.col

The name of the column giving the position in a document. Defaults to "position", unless a global default is specified using setTokenlistColnames()

word.col

The name of the column containing the token text. Defaults to "word", unless a global default is specified using setTokenlistColnames()

verbose

logical. If TRUE, show progress.

Value

the annotated tokens data frame


kasperwelbers/semnet documentation built on May 20, 2019, 7:38 a.m.