token_most_common: Returns a dataframe of common ngrams

View source: R/token_helpers.R

token_most_common    R Documentation


Description

Returns a data frame of the most common n-grams found in a vector of strings.

Usage

token_most_common(
  .v,
  n_range = DEFAULT_RANGE_NGRAM_RANGE,
  token = "ngrams",
  n_ngrams_returns = DEFAULT_NUM_NGRAMS_RETURN,
  tokenizer = tokenizer_basic,
  ...
)

Arguments

.v

a vector of strings

n_range

range of n-gram sizes to return. Default: 1:4

token

passed to tokenizer_basic, which passes it to tidytext::unnest_tokens. Default: 'ngrams'

n_ngrams_returns

how many of the most common n-grams to return for each n. Default: 12

tokenizer

a function that tokenizes a column of a data frame. Default: tokenizer_basic

...

passed to tokenizer_basic

Value

a dataframe showing the most common tokens
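
To make the roles of n_range and n_ngrams_returns concrete, here is a minimal sketch of the general idea: tokenize into n-grams of each size, count them, and keep the most frequent few per size. It is not the package's implementation. The helper name sketch_most_common, its keep argument, and the output columns ngram, freq and n_words are all hypothetical, and it calls tidytext::unnest_tokens directly rather than going through tokenizer_basic.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper, NOT part of TokenLink: for each n-gram size in
# n_range, count the n-grams in v and keep the `keep` most frequent ones.
sketch_most_common <- function(v, n_range = 1:4, keep = 12) {
  txt <- dplyr::tibble(text = v)
  per_size <- lapply(n_range, function(size) {
    txt |>
      # token = "ngrams" and n are forwarded to the underlying tokenizer
      tidytext::unnest_tokens(ngram, text, token = "ngrams", n = size) |>
      # strings shorter than `size` words can yield NA tokens; drop them
      dplyr::filter(!is.na(ngram)) |>
      dplyr::count(ngram, sort = TRUE, name = "freq") |>
      dplyr::slice_head(n = keep) |>
      dplyr::mutate(n_words = size)
  })
  dplyr::bind_rows(per_size)
}

res <- sketch_most_common(
  c("the quick brown fox", "the quick red fox"),
  n_range = 1:2,
  keep = 3
)
print(res)
```

The result stacks one block of rows per n-gram size, so the n_words column (an assumption of this sketch) is what lets a reader tell unigrams from bigrams in the combined table.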

Examples

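  # Most common n-grams among the car model names (row names of mtcars)
  mtcars |> tibble::rownames_to_column() |> dplyr::pull(rowname) |> token_most_common()
  # Most common n-grams in Moby Dick, split on periods (mobydick ships with tokenizers)
  library(tokenizers)
  mobydick |> stringr::str_split('\\.') |> magrittr::extract2(1) |> token_most_common()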


csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.