tCorpus$kwic: Get keyword-in-context (KWIC) strings


Description

Create a data.frame with keyword-in-context strings for given feature indices (i), search results (hits) or search strings (query).
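As a rough illustration of what a single KWIC string contains, the base R sketch below builds one for a keyword at a given token position. kwic_string is a hypothetical helper written for this page, not the corpustools implementation; it assumes the tokens are a plain character vector and that the tokens of each sentence are contiguous. Its ntokens, sentence and kw_tag parameters only mimic the ntokens, context_level and kw_tag arguments documented below.

```r
# Hypothetical helper, NOT the corpustools implementation: build one KWIC
# string for the keyword at position i of a plain character vector.
kwic_string <- function(tokens, i, ntokens = 10, sentence = NULL,
                        kw_tag = c("<", ">")) {
  # Document-level context: the window may span the whole token vector.
  lo_bound <- 1
  hi_bound <- length(tokens)
  # Sentence-level context: clip the window to the keyword's sentence
  # (assumes the tokens of each sentence are contiguous).
  if (!is.null(sentence)) {
    in_sent  <- which(sentence == sentence[i])
    lo_bound <- min(in_sent)
    hi_bound <- max(in_sent)
  }
  # At most ntokens tokens on each side of the keyword.
  lo <- max(lo_bound, i - ntokens)
  hi <- min(hi_bound, i + ntokens)
  left  <- paste(tokens[seq_len(i - lo) + lo - 1], collapse = " ")
  right <- paste(tokens[seq_len(hi - i) + i], collapse = " ")
  # Wrap the keyword in the opening and closing tag.
  keyword <- paste0(kw_tag[1], tokens[i], kw_tag[2])
  parts <- c(left, keyword, right)
  # Drop empty context (keyword at the start or end) before pasting.
  paste(parts[nzchar(parts)], collapse = " ")
}

tokens   <- c("John", "left", ".", "Mary", "loves", "him", ".")
sentence <- c(1, 1, 1, 2, 2, 2, 2)

kwic_string(tokens, i = 5, ntokens = 3)
# "left . Mary <loves> him ."
kwic_string(tokens, i = 5, ntokens = 3, sentence = sentence)
# "Mary <loves> him ."
kwic_string(tokens, i = 5, ntokens = 1, kw_tag = c("<b>", "</b>"))
# "Mary <b>loves</b> him"
```

With sentence-level context the window stops at the sentence boundary even though ntokens would allow it to reach further back.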

Usage

## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).

kwic(hits = NULL, i = NULL, query = NULL, code = '',
     ntokens = 10, nsample = NA, output_feature = 'token',
     context_levels = c('document','sentence'),
     prettypaste = T, kw_tag = c('<','>'), ...)

Arguments

hits

Results of a feature search; see tCorpus$search_features.

i

Instead of the hits argument, the indices of features can be given directly.

query

Instead of using the hits or i arguments, a search string can be given directly. Note that this is simply a convenient shorthand for first creating a hits object with tCorpus$search_features. If a query is given, the ... argument is used to pass other arguments to tCorpus$search_features.

code

If i or query is used, the code argument can be used to add a code label. It should be either a vector of the same length as i or query, giving the code for each element, or a vector of length 1 to use a single label for all of them.

ntokens

An integer specifying the size of the context, i.e. the number of tokens to the left and right of the keyword.

n

A number specifying the total number of hits.

nsample

Like n, but takes a random sample of hits. If multiple codes are used, the sample is drawn for each code separately.

output_feature

The feature column used to make the KWIC.

context_level

Select the maximum context (document or sentence).

kw_tag

A character vector of length 2 giving the symbols placed before (first value) and after (second value) the keyword in the KWIC string. This can, for instance, be used to prepare KWIC strings with format tags for highlighting.

...

If query is used, further arguments are passed to tCorpus$search_features; see that method for the available query parameters.
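The rules for code and nsample can be sketched in base R as follows. expand_code and sample_per_code are hypothetical helpers, not corpustools functions, and the hits data.frame with a code column is an assumed stand-in for a real search result.

```r
# Hypothetical helper: give every query (or i) a code label.
# A length-1 code is reused for all n items; otherwise lengths must match.
expand_code <- function(code, n) {
  if (length(code) == 1) return(rep(code, n))
  if (length(code) == n) return(code)
  stop("code must have length 1 or the same length as i/query")
}

# Hypothetical helper: draw at most nsample hits for each code separately.
# `hits` is assumed to be a data.frame with one row per hit and a `code` column.
sample_per_code <- function(hits, nsample) {
  groups  <- split(hits, hits$code)
  sampled <- lapply(groups, function(g) {
    g[sample(nrow(g), min(nsample, nrow(g))), , drop = FALSE]
  })
  do.call(rbind, sampled)
}

set.seed(1)
queries <- c("love*", "hate*")
codes   <- expand_code(c("love", "hate"), length(queries))
# Made-up hits: 5 for "love", 2 for "hate".
hits <- data.frame(code = rep(codes, times = c(5, 2)), token_i = 1:7)
s <- sample_per_code(hits, nsample = 3)
table(s$code)
# 3 "love" rows (sampled from 5) and both "hate" rows
```

Sampling per code keeps a rare code from being crowded out of the sample by a frequent one.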

Examples

tc = tokens_to_tcorpus(corenlp_tokens, sentence_col = 'sentence', token_id_col = 'id')

## look directly for a term (or complex query)
tc$kwic(query = 'love*')

## or, first perform a feature search, and then get the KWIC for the results
hits = tc$search_features('(john OR mark) AND mary AND love*', context_level = 'sentence')
tc$kwic(hits, context_level = 'sentence')

kasperwelbers/corpustools documentation built on Dec. 5, 2018, 9:11 a.m.