generate_all_tokens: Generates a dataframe with the total counts of each token...

generate_all_tokens R Documentation

Generates a dataframe with the total count of each token across both datasets, as well as the m and u probabilities

Description

Generates a dataframe with the total count of each token across both datasets, as well as the m and u probabilities.

Usage

generate_all_tokens(
  x_counts,
  y_counts,
  total_comparisons,
  token_count_join = TOKEN_TOKEN_TYPE_VEC,
  suffix = TOKEN_SUFFIX_DEFAULT,
  m_prob_func = calc_m_prob,
  ...
)

Arguments

x_counts

Counts of tokens from first dataset

y_counts

Counts of tokens from second dataset

total_comparisons

Number of comparisons that can happen. Normally this is nrow(x_dat) * nrow(y_dat)

token_count_join

String vector of column names used to join the two token count dataframes. Default c('token','token_type')

suffix

String vector of length 2. Helps identify which dataset each count column came from. Default c('x','y')

m_prob_func

Function that takes a dataframe with columns token, token_type, n.x, n.y, n_comparisons, u_prob, and returns a vector of m_probs

...

not used
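
As a rough illustration of how token_count_join, suffix and total_comparisons relate, the sketch below joins two toy count tables with base R's merge(). It is an assumption about the general idea, not the package's actual implementation: the toy data, the count column name n and the use of an inner join (merge()'s default) are all made up for illustration.

```r
# Toy source datasets (made-up data)
x_dat <- data.frame(name = c("acme corp", "acme inc", "corp inc"))
y_dat <- data.frame(name = c("acme corp", "corp ltd"))

# Toy token counts for each dataset; the count column name `n` is an assumption
x_counts <- data.frame(
  token = c("acme", "corp", "inc"),
  token_type = "company_name",
  n = c(2L, 2L, 2L)
)
y_counts <- data.frame(
  token = c("acme", "corp", "ltd"),
  token_type = "company_name",
  n = c(1L, 2L, 1L)
)

# Join on the key columns; the suffixes mark which dataset each count came from.
# merge() defaults to an inner join, so only tokens present in both tables remain.
joined <- merge(
  x_counts, y_counts,
  by = c("token", "token_type"),
  suffixes = c(".x", ".y")
)

# Normally every row of x_dat can be compared with every row of y_dat
total_comparisons <- nrow(x_dat) * nrow(y_dat)
```

Here joined has the columns token, token_type, n.x and n.y, matching the column names that m_prob_func is documented to receive.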

Examples


dat_ceo <- readr::read_csv('https://tinyurl.com/2p8etjr6')
dat_alb <- readr::read_csv('https://tinyurl.com/2p8ap4ad')
t_dat <- token_links(
  dat_x = dat_ceo,
  dat_y = dat_alb,
  args_x = list(col_nms = 'coname'),
  args_y = list(col_nms = 'companyName'),
  token_types = 'company_name',
  token_index = '',
  suffix = c('ceo', 'alb')
)
results <- generate_all_tokens(
  t_dat$x$token_counts,
  t_dat$y$token_counts,
  t_dat$total_comparisons
)
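
A custom m_prob_func can be sketched without downloading any data. The function below is hypothetical (constant_m_prob is not part of the package): it accepts a dataframe with the columns listed under m_prob_func and returns one m probability per row. The toy values, including u_prob, are invented, and the trailing ... is there only in case the caller passes extra arguments.

```r
# Hypothetical custom m-probability function: the same constant for every token.
# `dat` is expected to have one row per token, with the documented columns.
constant_m_prob <- function(dat, m = 0.95, ...) {
  rep(m, nrow(dat))
}

# Toy input with the documented columns (all values made up)
toy <- data.frame(
  token = c("acme", "corp"),
  token_type = "company_name",
  n.x = c(2L, 2L),
  n.y = c(1L, 2L),
  n_comparisons = 6,
  u_prob = c(0.3, 0.6)
)

# One m probability per token row
m_probs <- constant_m_prob(toy)
```

Assuming the signature shown under Usage, such a function would be supplied as generate_all_tokens(t_dat$x$token_counts, t_dat$y$token_counts, t_dat$total_comparisons, m_prob_func = constant_m_prob).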



csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.