Search tokens in a tokenlist using a query that consists of an indicator and, optionally, a condition. For a detailed explanation of the query language, please consult the query_tutorial markdown file. For a quick summary, see the details below.
tokens | A tokenlist object. See ?asTokenlist() for details.
indicator | The indicator part of the query. See the explanation in the query_tutorial markdown file or in the details below.
condition | The condition part of the query. See the explanation in the query_tutorial markdown file or in the details below.
code | The code given to the tokens that match the query (useful when looking for multiple queries).
default.window | The default word distance between the condition terms and the indicator, used when no specific word distance is set with the ~ symbol.
condition_once | Logical. If TRUE, then if an indicator satisfies its conditions once in an article, all indicators within that article are coded.
indicator_filter | A logical vector that indicates which tokens can match an indicator. Can, for instance, be used to select only tokens that are proper names (using POS tagging) when looking for people.
presorted | The data has to be sorted on order(doc_id, position). If this is already the case, presorted can be set to TRUE to save time (useful when testing many individual queries on large tokenlists).
doc.col | The name of the document_id column. Defaults to "doc_id", unless a global default is specified using setTokenlistColnames().
position.col | The name of the column giving the position in a document. Defaults to "position", unless a global default is specified using setTokenlistColnames().
word.col | The name of the column containing the token text. Defaults to "word", unless a global default is specified using setTokenlistColnames().
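Preparing the inputs for these arguments might look like the sketch below. It uses base R only. The toy data, the `pos` column with part-of-speech tags, and the tag value "NNP" are assumptions made for illustration; only the default column names doc_id, position and word come from the argument descriptions.

```r
# Toy tokenlist using the default column names (doc_id, position, word).
# The `pos` column and its tag values are hypothetical, for illustration only.
tokens <- data.frame(
  doc_id   = c(2, 1, 1, 2, 1),
  position = c(2, 3, 1, 1, 2),
  word     = c("spoke", "today", "Obama", "Merkel", "spoke"),
  pos      = c("VBD", "NN", "NNP", "NNP", "VBD"),
  stringsAsFactors = FALSE
)

# Sort on order(doc_id, position), the order that presorted = TRUE presumes.
tokens <- tokens[order(tokens$doc_id, tokens$position), ]

# Logical vector with one element per token, in the same (sorted) row order:
# only tokens tagged as proper names may match the indicator.
indicator_filter <- tokens$pos == "NNP"
```

The filter is built after sorting so that its elements stay aligned with the rows of the tokenlist.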
Brief summary of the query language
The indicator:
is the actual text that has to be found in the token
can contain multiple words combined with OR statements (empty spaces are also treated as OR statements)
CANNOT contain AND or NOT statements (this is what the condition is for)
accepts the ? wildcard, which means that any single character can be used in this place
accepts the * wildcard, which means that any number of characters can be used in this place
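The indicator rules above can be approximated in base R as a rough sketch. `match_indicator` is a hypothetical helper, not a function of this package, and it assumes that a term has to match the whole token and that matching is case-sensitive; the actual implementation may differ on both points.

```r
# Hypothetical helper (not part of the package): emulates indicator matching.
# Assumption: each term must match the whole token, case-sensitively.
match_indicator <- function(words, indicator) {
  # Empty spaces and explicit OR both separate alternative terms.
  terms <- strsplit(trimws(indicator), "\\s+")[[1]]
  terms <- terms[nzchar(terms) & terms != "OR"]
  # glob2rx() translates the ? and * wildcards into a regular expression.
  hits <- lapply(terms, function(term) grepl(utils::glob2rx(term), words))
  # A token matches if at least one term matches it.
  Reduce(`|`, hits, rep(FALSE, length(words)))
}

words <- c("nuclear", "atomic", "bomb", "bombs", "energy")
match_indicator(words, "nucl* OR atom?c bomb")
# TRUE TRUE TRUE FALSE FALSE
```

Here "bombs" is rejected because "bomb" carries no wildcard, while "nucl*" accepts any continuation after "nucl".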
The condition:
has to be TRUE for the indicator to be accepted. Thus, if a condition is given, the query can be interpreted as: indicator AND condition
can contain complex boolean statements, using AND, OR and NOT statements, and using parentheses
accepts the ? and * wildcards
can be restricted to a maximum word distance from the indicator. The terms in the condition are then looked up within this word distance. The default word distance is set with the default.window parameter. Individual terms can be given a custom word distance using the ~ symbol, where "word~50" means that "word" is looked up within 50 words of the indicator. If a default.window is used, the word distance can be ignored for specific terms by using word~d (where d stands for document).
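The word-distance rules for a single condition term can be illustrated with a small base R sketch. `parse_term` and `indicator_near` are hypothetical helpers, not part of this package. The sketch assumes that the distance is inclusive, that an unset window means the whole document, and that terms match whole tokens; none of this is confirmed by the documentation.

```r
tokens <- data.frame(
  doc_id   = c(1, 1, 1, 1, 1, 2, 2, 2),
  position = c(1, 2, 3, 4, 5, 1, 2, 3),
  word     = c("nuclear", "power", "and", "chemical", "weapons",
               "nuclear", "and", "weapons"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "term~50" or "term~d" into term and window.
parse_term <- function(term, default.window = Inf) {
  parts <- strsplit(term, "~", fixed = TRUE)[[1]]
  window <- default.window
  if (length(parts) == 2) {
    # "~d" ignores the word distance: anywhere in the document counts.
    window <- if (parts[2] == "d") Inf else as.numeric(parts[2])
  }
  list(term = parts[1], window = window)
}

# Hypothetical helper: keep indicator tokens that have the condition term
# within the window, in the same document.
indicator_near <- function(tokens, indicator, condition_term,
                           default.window = Inf) {
  cond <- parse_term(condition_term, default.window)
  ind_rows  <- tokens[grepl(utils::glob2rx(indicator), tokens$word), ]
  cond_rows <- tokens[grepl(utils::glob2rx(cond$term), tokens$word), ]
  keep <- vapply(seq_len(nrow(ind_rows)), function(i) {
    any(cond_rows$doc_id == ind_rows$doc_id[i] &
        abs(cond_rows$position - ind_rows$position[i]) <= cond$window)
  }, logical(1))
  ind_rows[keep, ]
}

near     <- indicator_near(tokens, "nuclear", "weapon*~3")
anywhere <- indicator_near(tokens, "nuclear", "weapon*~d", default.window = 3)
```

In this toy data, `near` keeps only the indicator in document 2 (distance 2), while `anywhere` keeps both indicators because ~d overrides the default window of 3.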
Parameters:
default.window -> the default word distance between the condition terms and the indicator, used when no specific word distance is set with the ~ symbol
condition_once -> if TRUE, then if the condition is satisfied at least once in an article, all occurrences of the indicator in that article are accepted.
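The effect of condition_once can be pictured with the following base R sketch. The `hits` data and the `satisfied` column are made up for illustration: each row stands for one indicator occurrence, and `satisfied` marks whether its condition was met at that position.

```r
# Hypothetical intermediate result: one row per indicator occurrence.
hits <- data.frame(
  doc_id    = c(1, 1, 1, 2, 2),
  position  = c(3, 40, 90, 5, 60),
  satisfied = c(TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Without condition_once: only occurrences that satisfy the condition count.
accepted_default <- hits$satisfied

# With condition_once = TRUE: one satisfied occurrence in an article
# accepts every occurrence of the indicator in that article.
accepted_once <- as.logical(ave(hits$satisfied, hits$doc_id, FUN = any))
```

In this toy example all three occurrences in document 1 are accepted under condition_once, while document 2 still has none.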
A data.frame containing the words that match the query and their locations in the tokenlist.