top_features: Show top features
In corpustools: Managing, Querying and Analyzing Tokenized Text

Description

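Show the top n features in a tCorpus, ranked by frequency or by Chi2 score, optionally per group of tokens or documents.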

Usage

top_features(
  tc,
  feature,
  n = 10,
  group_by = NULL,
  group_by_meta = NULL,
  rank_by = c("freq", "chi2"),
  dropNA = T,
  return_long = F
)

Arguments

`tc`	A tCorpus.
`feature`	The name of the feature column in the token data.
`n`	The number of top features to return.
`group_by`	A column in the token data to group the top features by. For example, if the token data contains part-of-speech tags (pos), then grouping by pos will show the top n features per part-of-speech tag.
`group_by_meta`	A column in the meta data to group the top features by.
`rank_by`	The method for ranking the features. Currently supports frequency ("freq", the default) and the Chi2 value ("chi2") for the relative frequency of a feature in a group compared to the overall corpus. If return_long is TRUE, the Chi2 score is also returned. Note that Chi2 scores can be negative: a negative score indicates that the relative frequency of a feature in a group is lower than its relative frequency in the corpus (i.e. the feature is under-represented in that group).
`dropNA`	If TRUE, drop features that are NA.
`return_long`	If TRUE, return the results in a long format that contains more information.
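To illustrate what a negative Chi2 score means, the following base R sketch computes a signed Chi2 value from a 2x2 contingency table. The function `signed_chi2` and its argument names are hypothetical and used only for illustration; this is not necessarily the exact calculation that corpustools performs (for instance, it applies no continuity correction).

```r
# Hypothetical helper, for illustration only (not part of corpustools).
# Computes a plain Pearson Chi2 for a 2x2 table and flips the sign when
# the feature is under-represented in the group.
signed_chi2 = function(freq_in_group, group_size, freq_in_corpus, corpus_size) {
  a = freq_in_group                      # feature, inside the group
  b = freq_in_corpus - freq_in_group     # feature, outside the group
  c = group_size - freq_in_group         # other tokens, inside the group
  d = corpus_size - group_size - b       # other tokens, outside the group
  n = corpus_size

  chi2 = n * (a * d - b * c)^2 / ((a + b) * (c + d) * (a + c) * (b + d))

  # Negative sign: relative frequency in the group is lower than in the corpus
  under = (a / group_size) < (freq_in_corpus / corpus_size)
  ifelse(under, -chi2, chi2)
}

# A feature that makes up 30 of 100 tokens in the group, but 40 of 1000 overall
over_score = signed_chi2(30, 100, 40, 1000)

# A feature that makes up 1 of 100 tokens in the group, but 40 of 1000 overall
under_score = signed_chi2(1, 100, 40, 1000)
```

In this sketch `over_score` is positive (over-represented in the group) and `under_score` is negative (under-represented), which mirrors how the sign of the Chi2 score is described for `rank_by`.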

Value

a data.frame

Examples

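library(corpustools)

# Build a tCorpus from the corenlp_tokens demo data
tc = tokens_to_tcorpus(corenlp_tokens, token_id_col = 'id')

# Top 10 lemmas by frequency
top_features(tc, 'lemma')
# Top lemmas per named-entity tag within each document
top_features(tc, 'lemma', group_by = 'NER', group_by_meta = 'doc_id')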
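The remaining arguments can be combined in the same way. The sketch below ranks by Chi2 instead of frequency and requests the long format; it uses only the arguments listed under Usage and the `corenlp_tokens` demo data, and the exact columns of the long output are not shown here because they are not specified above.

```r
library(corpustools)

tc = tokens_to_tcorpus(corenlp_tokens, token_id_col = 'id')

# Top 5 lemmas per named-entity tag, ranked by Chi2 rather than frequency
top_chi2 = top_features(tc, 'lemma', n = 5, group_by = 'NER', rank_by = 'chi2')

# Same ranking in long format, which also includes the (possibly negative) Chi2 score
top_chi2_long = top_features(tc, 'lemma', n = 5, group_by = 'NER',
                             rank_by = 'chi2', return_long = TRUE)
```

Ranking by Chi2 tends to surface features that are distinctive for a group, whereas ranking by frequency tends to surface features that are common everywhere.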

corpustools documentation built on May 31, 2023, 8:45 p.m.