tCorpus$code_features: Code features in a tCorpus based on a search string

Description

Add a column to the token data that contains a code (the query label) for tokens that match the query (see tCorpus$search_features).

Usage

## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).

code_features(query, code=NULL, feature='token', column='code', ...)

Arguments

query

A character string that is a query. See search_features for documentation of the query language.

code

The code given to the tokens that match the query (useful when looking for multiple queries). The code label can also be given in the query itself with a # prefix, e.g. "actors# anna bob" (see details).

feature

The name of the feature column within which to search.

column

The name of the column that is added to the token data.

add_column

A list of name-value pairs, used to add additional columns. The name will become the column name, and the value should be a vector of the same length as the query vector.

context_level

Select whether the queries should occur within whole "documents" or specific "sentences".

keep_longest

If TRUE, then in case of overlapping query strings in unique_hits mode, the query with the most separate terms is kept. For example, in the text "mr. Bob Smith", the query [smith OR "bob smith"] would match "Bob" and "Smith". If keep_longest is FALSE, the match that is used is determined by the order in the query itself. The same query would then match only "Smith".
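
The "mr. Bob Smith" case can be traced with a small base R toy model. This is only a sketch of the rule as described here, not the corpustools implementation: the helpers find_hit and resolve are hypothetical names, and the model assumes that each token can be claimed by at most one alternative.

```r
# Toy model of overlap resolution; NOT the corpustools implementation.
tokens <- c("mr.", "bob", "smith")

# The two alternatives of the query [smith OR "bob smith"], in query order.
alternatives <- list(c("smith"), c("bob", "smith"))

# Return the positions of the first place where `terms` occurs as a
# consecutive token sequence, or integer(0) if it does not occur.
find_hit <- function(tokens, terms) {
  n <- length(terms)
  if (n > length(tokens)) return(integer(0))
  for (start in seq_len(length(tokens) - n + 1)) {
    idx <- start:(start + n - 1)
    if (all(tolower(tokens[idx]) == terms)) return(idx)
  }
  integer(0)
}

# Decide which tokens end up matched when alternatives overlap.
resolve <- function(tokens, alternatives, keep_longest = TRUE) {
  if (keep_longest) {
    # Try alternatives with the most terms first; ties keep query order.
    n_terms <- vapply(alternatives, length, integer(1))
    alternatives <- alternatives[order(-n_terms)]
  }
  claimed <- rep(FALSE, length(tokens))
  for (terms in alternatives) {
    idx <- find_hit(tokens, terms)
    # A hit counts only if none of its tokens has already been claimed.
    if (length(idx) > 0 && !any(claimed[idx])) claimed[idx] <- TRUE
  }
  tokens[claimed]
}

resolve(tokens, alternatives, keep_longest = TRUE)   # "bob" "smith"
resolve(tokens, alternatives, keep_longest = FALSE)  # "smith"
```

With keep_longest = FALSE the single-term alternative comes first in the query and claims "smith", so the two-term alternative can no longer be placed.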

as_ascii

If TRUE, perform the search in ASCII.

verbose

If TRUE, progress messages will be printed.

...

An alternative way to specify name-value pairs for adding additional columns.

Examples

tc = create_tcorpus('Anna and Bob are secretive')

tc$code_features(c("actors# anna bob", "associations# secretive"))
tc$get()
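
As a variation on the example above, the label can presumably be passed through the code argument instead of the "label# " prefix. This is a minimal sketch. It assumes the corpustools version documented here (where tc$get() returns the token data), that OR is accepted by the query language described under search_features, and that code behaves as described under Arguments.

```r
library(corpustools)

tc = create_tcorpus('Anna and Bob are secretive')

# The label is supplied through `code` rather than inside the query string.
tc$code_features('anna OR bob', code = 'actors')

# Inspect the token data; matching tokens should carry the label in the
# column named by `column` (default 'code').
tc$get()
```

Passing code separately keeps the query string free of labels, which may be convenient when the labels come from another variable.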

kasperwelbers/corpustools documentation built on Sept. 1, 2018, 1:03 p.m.