In csps-efpc/TokenLink: Joins two dataframes using tokens or like words

tokenize_df

R Documentation

Tokenize one or more columns of a dataframe

Description

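Tokenize one or more columns of a dataframe. Each column named in col_nms is tokenized using the token type at the same position in token_types; any additional arguments are passed on to tokenize_col.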

Usage

tokenize_df(dat, ..., col_nms, token_types = col_nms)

Arguments

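`dat`	A dataframe containing the columns to tokenize.
`...`	Additional arguments passed to tokenize_col.
`col_nms`	Character vector of column names in dat to tokenize. Has no default value and must be supplied.
`token_types`	Character vector giving the token type for each column in col_nms, matched by position. Defaults to col_nms.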
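The argument contract above (one token type per column, matched by position, with token_types defaulting to col_nms) can be illustrated with a small base-R sketch. This is not the TokenLink implementation. sketch_tokenize_df() and simple_tokenize() are hypothetical helpers, and the rule "lowercase, then split on runs of non-alphanumeric characters" is an assumed stand-in for whatever tokenize_col actually does. The long (row_id, token_type, token) output layout is likewise an assumption made for illustration.

```r
# Hypothetical sketch, NOT the TokenLink implementation.
# simple_tokenize() is an assumed stand-in for tokenize_col().
simple_tokenize <- function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""                                 # treat missing values as empty
  toks <- strsplit(tolower(x), "[^[:alnum:]]+")     # split on non-alphanumeric runs
  lapply(toks, function(t) t[nzchar(t)])            # drop empty strings
}

# Hypothetical: tokenizes each column in col_nms and labels its tokens with
# the token type at the same position in token_types.
sketch_tokenize_df <- function(dat, col_nms, token_types = col_nms) {
  stopifnot(length(col_nms) == length(token_types),
            all(col_nms %in% names(dat)))
  pieces <- Map(function(col, type) {
    toks <- simple_tokenize(dat[[col]])
    data.frame(
      row_id     = rep(seq_along(toks), lengths(toks)),  # source row of each token
      token_type = rep(type, sum(lengths(toks))),
      token      = as.character(unlist(toks, use.names = FALSE)),
      stringsAsFactors = FALSE
    )
  }, col_nms, token_types)
  out <- do.call(rbind, unname(pieces))   # stack the per-column results
  rownames(out) <- NULL
  out
}

# Usage on a toy dataframe (column names and values are made up)
dat <- data.frame(
  name = c("Toronto General Hospital", "St. Mary's Clinic"),
  addr = c("200 Elizabeth St", "1 Main St.")
)
toks <- sketch_tokenize_df(dat, col_nms = c("name", "addr"),
                           token_types = c("company_name", "Address"))
# With token_types omitted, each column's name is used as its token type
def <- sketch_tokenize_df(dat, col_nms = "name")
```

Keeping every column's tokens in one long table, tagged by type, is one plausible design for later token-based joins. The real return shape of tokenize_df should be checked against the package itself.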

Examples

temp_fn <- tempfile(fileext = ".zip")
# mode = "wb" keeps the binary zip archive intact on Windows
download.file("https://www150.statcan.gc.ca/n1/pub/37-26-0001/2021001/ODEF_v2.zip", temp_fn, mode = "wb")
dat_odef <- readr::read_csv(unz(temp_fn, "ODEF_v2/ODEF_v2.csv"))
# one token type per column, matched by position
dat_odef |>
  tokenize_df(col_nms = c('Facility_Name', 'Facility_Type', 'Authority_Name', 'Full_Addr'),
              token_types = c('company_name', 'company_name', 'company_name', 'Address'))

csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.