Man pages for csps-efpc/TokenLink
Joins two dataframes using tokens or like words

calc_m_prob: Calculates what m_prob should be, takes in a dataframe and...
calculate_priori: Generates vector of priori values one for each row in...
clean_str: Replaces tokens, and cleans a string using regex stuff...
clean_str_2: Cleans a string using after it has been tokenized as a like...
find_posterior: Appends dataframes with posteriors and returns it
find_posterior_all_evidence: t_dat should have been run through...
find_posterior_chunked: Like "find_posterior" but it uses less memory at any one time...
find_posterior_positive_evidence_only: Creates a subset of pairs to check in more detail.
find_posterior_subset: Like "find_posterior" but this will always set "return_all =...
generate_all_tokens: Generates a dataframe with the total counts of each token...
get_paired_row_names: Returns a dataframe with two columns indicating the rows of...
joined_results: Returns a joined dataframe
keep_tokens: Given a dataframe of all tokens, object will return a...
maybe_add: Adds value to lst with the key nm if nm is not already in lst
maybe_do: Will apply func to x if bool is TRUE. Saves us from an ugly...
read_replacements_token_type: Reads in a replacement token file
read_replacements_token_type_get_fn: Gets the name of the token replacement file
reclin_pair_blocking: Creates a pair blocking based on columns passed into...
refine_posterior: After generating probabilities for a list of pairs this will...
scale_to_prob: Scales a vector from 1-priori_delta to priori_delta
token_count: Takes a dataframe with columns from cols and counts the...
tokenize_ations: Takes a dataframe and tokenizes the columns indicated and...
tokenize_ations_m_u_prob: Joins two objects together that come back from the...
tokenize_col: Turns a column of strings into a tokenized dataframe this...
tokenize_df: Tokenize a dataframe and multiple columns in the dataframe
tokenizer_basic: Tokenizes a column in a dataframe
token_links: Returns the required information about the joint probability...
token_most_common: Returns a dataframe of common ngrams
token_replacement_generator: Returns a vector of replacement tokens
write_token_replacement: Given a vector of strings this will create or append a file...
csps-efpc/TokenLink documentation built on Feb. 10, 2023, 3:30 a.m.