get_paired_row_names: Returns a dataframe with two columns indicating the rows of...

get_paired_row_names R Documentation

Returns a dataframe with two columns indicating the rows of each Df that are to be paired

Description

Returns a dataframe with two columns indicating which rows of each dataframe (x and y) are to be paired.

Usage

get_paired_row_names(
  t_dat,
  x_tokens = t_dat$x$tokens,
  y_tokens = t_dat$y$tokens,
  tokens_to_keep = NULL,
  token_join_by = TOKEN_TOKEN_TYPE_VEC,
  suffix = paste0(".", c(t_dat$x$suffix, t_dat$y$suffix)),
  min_token_u_prob = TOKEN_MIN_UPROB_DEFAULT
)

Arguments

t_dat

a list that is a t_dat object

x_tokens

defaults to t_dat$x$tokens

y_tokens

defaults to t_dat$y$tokens

tokens_to_keep

dataframe indicating which tokens matter in the analysis. Typically this is a filtered subset of t_dat$tokens_all. Default is NULL. If the value is NULL, t_dat$tokens_all is filtered by min_token_u_prob and u_prob != 0.

token_join_by

defaults to TOKEN_TOKEN_TYPE_VEC

suffix

defaults to paste0(".",c(t_dat$x$suffix, t_dat$y$suffix))

min_token_u_prob

minimum u_prob for a token to be used in the join
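
The arguments above describe a token-based join: keep only the tokens that matter, join the x and y token tables on the token_join_by columns, and return the distinct pairs of row identifiers. The following base R sketch illustrates that idea only; it is not the package's implementation. The column names row_name, token and token_type, the toy data, and the value c("token", "token_type") standing in for TOKEN_TOKEN_TYPE_VEC are all assumptions made for illustration.

```r
# Hypothetical token tables: one row per (row_name, token, token_type).
x_tokens <- data.frame(
  row_name   = c("1", "1", "2", "3"),
  token      = c("acme", "ltd", "globex", "ltd"),
  token_type = "company_name",
  stringsAsFactors = FALSE
)
y_tokens <- data.frame(
  row_name   = c("a", "a", "b"),
  token      = c("acme", "ltd", "globex"),
  token_type = "company_name",
  stringsAsFactors = FALSE
)

# Stand-in for tokens_to_keep: the common token "ltd" is deliberately left out.
tokens_to_keep <- data.frame(
  token      = c("acme", "globex"),
  token_type = "company_name",
  stringsAsFactors = FALSE
)

token_join_by <- c("token", "token_type")  # assumed value of TOKEN_TOKEN_TYPE_VEC
suffix <- paste0(".", c("x", "y"))         # evaluates to c(".x", ".y")

# Restrict each side to the tokens that matter.
x_kept <- merge(x_tokens, tokens_to_keep, by = token_join_by)
y_kept <- merge(y_tokens, tokens_to_keep, by = token_join_by)

# Join on the token columns; the clashing row_name columns receive the suffixes.
joined <- merge(x_kept, y_kept, by = token_join_by, suffixes = suffix)

# Keep one row per distinct (x row, y row) pair.
paired <- unique(joined[, paste0("row_name", suffix)])
paired <- paired[order(paired$row_name.x), , drop = FALSE]
rownames(paired) <- NULL
print(paired)
```

In this toy data, x row "3" shares only the dropped token "ltd" with y, so it does not appear in the result; the two remaining columns, row_name.x and row_name.y, take their names from the suffix argument.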


csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.