tokenizer_basic | R Documentation |
tokenizes a column in a dataframe
tokenizer_basic( dat, ..., col_nm, row_name_nm, token_type = col_nm, token_col_nm = "token", drop_col = TRUE, token_index = "", pre_token_clean_str = clean_str, post_token_clean_Str = clean_str_2 )
dat |
dataframe. No default. |
... |
passed to both clean_str and tidytext::unnest_tokens |
col_nm |
string, name of column to tokenize |
row_name_nm |
string, name of column to put row_name into |
token_type |
string of the type of token for the given column. Default is col_nm |
token_col_nm |
String, column name of new tokens. |
drop_col |
Boolean. If True drops the original column, default = TRUE |
token_index |
String. name of column that will have index of order of tokens in origional column, Default "" |
pre_token_clean_str |
function. that takes vector of strings and ... cleans the string. will clean the string before tokenization. Default clean_str. |
post_token_clean_Str |
function. that takes vector of strings and ... cleans the string. will clean the string before tokenization. Default clean_str_2. |
dat_ceo <- readr::read_csv('https://tinyurl.com/2p8etjr6') dat_ceo |> tokenizer_basic(col_nm = 'exec_fullname', row_name_nm = 'rn', drop_col = FALSE) |> dplyr::select_at(c('token', 'exec_fullname'))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.