tokenize_col: turns a column of strings into a tokenized dataframe this...

tokenize_col    R Documentation

Turns a column of strings into a tokenized dataframe. The returned dataframe will have two or three columns.

Description

Turns a column of strings into a tokenized dataframe. The returned dataframe will have two or three columns.
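As a rough illustration of what a "tokenized dataframe" can look like, the base-R sketch below returns one row per token, keyed by the row name of the original dataframe. The helper `sketch_tokenize`, its lowercase-and-split-on-whitespace rule, and the output column name `token` are assumptions made for illustration; this is not TokenLink's implementation, and the real return columns may differ.

```r
# Hypothetical helper, not part of TokenLink: one row per token, base R only.
sketch_tokenize <- function(dat, col_nm, row_name_nm = "row_name") {
  # Lowercase and split on runs of whitespace (an assumed tokenization rule)
  tokens <- strsplit(tolower(trimws(as.character(dat[[col_nm]]))), "\\s+")
  out <- data.frame(
    rep(rownames(dat), lengths(tokens)),  # original row name, repeated per token
    unlist(tokens),                       # the tokens themselves
    stringsAsFactors = FALSE
  )
  names(out) <- c(row_name_nm, "token")
  out
}

dat_demo <- data.frame(coname = c("Acme Corp", "Globex Inc Ltd"),
                       stringsAsFactors = FALSE)
tok <- sketch_tokenize(dat_demo, col_nm = "coname")
print(tok)
```

Keeping the original row name on every token row is what lets the tokens be joined back to the source records later.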

Usage

tokenize_col(
  dat,
  ...,
  col_nm,
  row_name_nm = TOKENIZE_DEFAULT_ROW_NAME,
  token_type = glue::glue("{col_nm}"),
  tokenizer = tokenizer_basic
)
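
The `token_type` default in the signature above refers to another argument, `col_nm`. This works because R evaluates default arguments lazily, inside the function, once the other arguments are known. A minimal sketch, using the hypothetical function `show_defaults` and base-R `paste0()` as a stand-in for `glue::glue()`:

```r
# Hypothetical stand-in for the signature above; it only reports its arguments.
# paste0(col_nm) is used here in place of glue::glue("{col_nm}").
show_defaults <- function(col_nm, token_type = paste0(col_nm)) {
  # Default arguments are evaluated lazily inside the function,
  # so token_type can refer to col_nm.
  c(col_nm = col_nm, token_type = token_type)
}

res_default  <- show_defaults(col_nm = "coname")
res_explicit <- show_defaults(col_nm = "coname", token_type = "company_name")
print(res_default)
print(res_explicit)
```

When `token_type` is omitted it takes the value of `col_nm`. When it is supplied, the explicit value is used instead.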

Arguments

dat

dataframe

...

additional arguments passed to tokenizer

col_nm

name of the column that will be tokenized.

row_name_nm

name of the column in the returned dataframe that holds the row names of the original dataframe. Defaults to TOKENIZE_DEFAULT_ROW_NAME (described here as row_name).

token_type

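name of the column that holds the tokens in the returned dataframe. The Usage signature sets the default to glue::glue("{col_nm}"), that is, the value of col_nm; this description states that the default appends '_type' onto token_col_nm.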

tokenizer

function that tokenizes the column. The Usage signature defaults to tokenizer_basic; this description names tidytext::unnest_tokens as the default.

Examples

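library(TokenLink)  # assumes the TokenLink package is installed
# Read the example data, then tokenize the 'coname' column with the defaults
dat_ceo <- readr::read_csv('https://tinyurl.com/2p8etjr6')
tokenize_col(dat = dat_ceo, col_nm = 'coname')
# Same call with an explicit token_type
tokenize_col(dat = dat_ceo, col_nm = 'coname', token_type = 'company_name')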


csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.