tokenize_df: Tokenize a dataframe and multiple columns in the dataframe

tokenize_df    R Documentation

Tokenize a dataframe and multiple columns in the dataframe

Description

Tokenizes several columns of a dataframe. Each column named in col_nms is tokenized with the token type given at the same position in token_types.

Usage

tokenize_df(dat, ..., col_nms, token_types = col_nms)

Arguments

dat

dataframe

...

Additional arguments passed to tokenize_col.

col_nms

Vector of strings. The names of the columns in dat to tokenize. No default.

token_types

Vector of strings. The token type for each column in col_nms, in the same order. Defaults to col_nms.
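
The usage line gives token_types the default col_nms, so when token_types is omitted each column's own name serves as its token type. The sketch below illustrates that pairing with a hypothetical stand-in, pair_columns, which is not part of TokenLink and does no tokenizing; the equal-length check is an assumption made for the illustration, not documented behaviour of tokenize_df.

```r
# Hypothetical stand-in with the same argument layout as tokenize_df.
# It only reports how columns and token types pair up.
pair_columns <- function(dat, ..., col_nms, token_types = col_nms) {
  # Assumed checks for this sketch: columns exist, one type per column
  stopifnot(all(col_nms %in% names(dat)),
            length(col_nms) == length(token_types))
  data.frame(column = col_nms, token_type = token_types,
             stringsAsFactors = FALSE)
}

# Toy data; the column names are made up for this sketch
toy <- data.frame(name = c("Acme Inc", "Beta Ltd"),
                  addr = c("1 Main St", "2 Oak Ave"),
                  stringsAsFactors = FALSE)

# token_types omitted: the default expression col_nms is used
default_pairs <- pair_columns(toy, col_nms = c("name", "addr"))

# token_types supplied: matched to col_nms by position
explicit_pairs <- pair_columns(toy, col_nms = c("name", "addr"),
                               token_types = c("company_name", "Address"))
```

Because R evaluates default arguments lazily inside the function, token_types = col_nms picks up whatever value the caller passed for col_nms.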

Examples

temp_fn <- tempfile()
# mode = "wb" keeps the zip archive intact on Windows
download.file("https://www150.statcan.gc.ca/n1/pub/37-26-0001/2021001/ODEF_v2.zip", temp_fn, mode = "wb")
dat_odef <- readr::read_csv(unz(temp_fn, "ODEF_v2/ODEF_v2.csv"))
# One token type per column, in the same order as col_nms
dat_odef |>
  tokenize_df(col_nms = c('Facility_Name', 'Facility_Type', 'Authority_Name', 'Full_Addr'),
              token_types = c('company_name', 'company_name', 'company_name', 'Address'))
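
This page does not show what tokenize_df returns. As a rough illustration of the general idea of tokenizing a text column, the sketch below uses base R only: it splits a toy column on whitespace and keeps one row per token together with a token type label. The data, the column names, and the output layout are assumptions for this sketch and may differ from what TokenLink produces.

```r
# Toy data; the column names are made up for this sketch
toy <- data.frame(id = 1:2,
                  name = c("Acme Widgets Inc", "Beta Ltd"),
                  stringsAsFactors = FALSE)

# Split each lower-cased name on whitespace (base R only, not TokenLink code)
tokens <- strsplit(tolower(toy$name), "[[:space:]]+")

# One row per token, keeping the id of the row it came from
long <- data.frame(id = rep(toy$id, lengths(tokens)),
                   token = unlist(tokens),
                   token_type = "company_name",
                   stringsAsFactors = FALSE)
```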


csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.