assign_closest_matches: Finds the closest genetic matches to a set of individuals,...
In Eiriksen/fishytools

Source file: R/script - gene-match-o-matic 2000.R

Finds and assigns matches between two data frames of genetic samples (SNPs and their genotypes). Both data frames must be formatted as follows:
Rows = individuals/samples.
Columns = SNP genotypes, spelled as bases, e.g. "AB", "CT", "CC", etc.
Must contain a column "ID" with a unique identifier for each row.
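A minimal sketch of input in the layout described above, together with a hypothetical helper `check_genotype_df()` (not part of fishytools) that checks the "ID" requirement. The SNP column names, IDs and genotypes are made up for illustration.

```r
# Toy data in the required layout: one row per individual, one column per SNP
# (genotypes as two-letter strings), plus a unique "ID" column.
# Column names (snp1, snp2, ...) and IDs are illustrative only.
df_samples <- data.frame(
  ID   = c("s1", "s2"),
  snp1 = c("AG", "CC"),
  snp2 = c("CT", "TT"),
  snp3 = c("GG", NA),
  stringsAsFactors = FALSE
)

df_lookup <- data.frame(
  ID   = c("r1", "r2", "r3"),
  snp1 = c("AG", "CC", "AA"),
  snp2 = c("CT", "TT", "CC"),
  snp3 = c("GG", "AG", "GG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stops with an error if the "ID" requirements are not met
# or if there are no SNP columns; returns TRUE invisibly otherwise.
check_genotype_df <- function(df) {
  stopifnot(is.data.frame(df))
  if (!"ID" %in% names(df)) stop("data frame must contain an 'ID' column")
  if (anyDuplicated(df$ID) > 0) stop("values in 'ID' must be unique")
  if (length(setdiff(names(df), "ID")) == 0) stop("no SNP columns found")
  invisible(TRUE)
}

check_genotype_df(df_samples)
check_genotype_df(df_lookup)
```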

Returns df_samples with an additional column, "ID_match", containing the ID of the closest match (from df_lookup) for each individual.
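To illustrate what "closest match" means, the sketch below assumes a similarity matrix with one row per sample and one column per lookup individual, named by their IDs. `add_closest_match()` is a hypothetical helper, not the package's implementation, and it ignores the cutoff and conflict handling described under the arguments.

```r
# Hypothetical similarity matrix: rows = df_samples$ID, columns = df_lookup$ID,
# values between 0 and 1 (made-up numbers for illustration).
sim <- matrix(
  c(0.95, 0.40, 0.55,
    0.30, 0.92, 0.20),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("r1", "r2", "r3"))
)
df_samples <- data.frame(ID = c("s1", "s2"), snp1 = c("AG", "CC"),
                         stringsAsFactors = FALSE)

# Hypothetical helper (not part of fishytools): for each sample, take the
# lookup ID with the highest similarity and store it in a new ID_match column.
add_closest_match <- function(df, sim) {
  sim <- sim[df$ID, , drop = FALSE]  # align matrix rows with data frame rows
  best <- vapply(seq_len(nrow(sim)), function(i) {
    row <- sim[i, ]
    # which.max() skips NAs; a row that is entirely NA gets no match
    if (all(is.na(row))) NA_integer_ else unname(which.max(row))
  }, integer(1))
  df$ID_match <- colnames(sim)[best]
  df
}

res <- add_closest_match(df_samples, sim)
res$ID_match
```

Here s1 is most similar to r1 (0.95) and s2 to r2 (0.92), so `res$ID_match` is `c("r1", "r2")`.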

The function needs a similarity matrix to operate. Creating one is time-consuming, so if you have already created one you can pass it in through the `similarity_matrix` argument. After creating a similarity matrix, the function saves it as a CSV file for later use.
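The similarity metric used by fishytools is not stated here, so the sketch below assumes a simple one: the proportion of SNPs, genotyped in both individuals, at which the genotype strings are identical. It also shows caching the matrix as a CSV file with base R's `write.csv()` and `read.csv()`. `genotype_similarity()`, `similarity_matrix_sketch()` and the file name are hypothetical.

```r
# Assumed similarity measure (not necessarily what create_similarityMatrix()
# computes): share of SNPs, among those non-missing in both individuals,
# at which the two genotype strings are identical.
genotype_similarity <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  if (!any(ok)) return(NA_real_)
  mean(a[ok] == b[ok])
}

# Build a samples x lookup matrix, using the SNP columns both data frames share.
similarity_matrix_sketch <- function(df_samples, df_lookup) {
  snps <- intersect(setdiff(names(df_samples), "ID"),
                    setdiff(names(df_lookup), "ID"))
  s <- as.matrix(df_samples[, snps, drop = FALSE])
  l <- as.matrix(df_lookup[, snps, drop = FALSE])
  sim <- matrix(NA_real_, nrow(s), nrow(l),
                dimnames = list(df_samples$ID, df_lookup$ID))
  for (i in seq_len(nrow(s))) {
    for (j in seq_len(nrow(l))) {
      sim[i, j] <- genotype_similarity(s[i, ], l[j, ])
    }
  }
  sim
}

df_samples <- data.frame(ID = c("s1", "s2"),
                         snp1 = c("AG", "CC"), snp2 = c("CT", "TT"),
                         snp3 = c("GG", NA), stringsAsFactors = FALSE)
df_lookup <- data.frame(ID = c("r1", "r2", "r3"),
                        snp1 = c("AG", "CC", "AA"), snp2 = c("CT", "TT", "CC"),
                        snp3 = c("GG", "AG", "GG"), stringsAsFactors = FALSE)

sim <- similarity_matrix_sketch(df_samples, df_lookup)

# Save the matrix as CSV (illustrative file name) and read it back later
# instead of recomputing it; check.names = FALSE keeps the IDs unchanged.
path <- file.path(tempdir(), "similarity_matrix.csv")
write.csv(sim, path)
sim_reloaded <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))
```

This metric treats "AG" and "GA" as different genotypes; if your data mixes allele orders, the strings would need to be normalised first.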

assign_closest_matches(
  df_samples,
  df_lookup,
  project = "-",
  similarity_matrix,
  cutoff_similarity = 0.9,
  cutoff_NA_collective = 50,
  conflicts_resolve = T
)

`df_samples`	Data frame with all samples to be looked up.
`df_lookup`	Data frame with the samples to be looked up against (can be larger than df_samples).
`similarity_matrix`	A similarity matrix is created automatically if none is given. If you have already made one (see create_similarityMatrix()), you can pass it here to save time.
`cutoff_similarity`	Any match with a similarity (on a 0 to 1 scale) lower than this is set to NA.
`conflicts_resolve`	Whether conflicts (several samples with the same match) should be resolved.
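One plausible reading of the `cutoff_similarity` and `conflicts_resolve` arguments from the usage above, sketched under assumptions: matches below the cutoff become NA, and when several samples share a match, only the sample with the highest similarity keeps it. `apply_cutoff_and_resolve()` and the `similarity` column are hypothetical; the package may resolve conflicts differently (for example by giving the other samples their next-best match).

```r
# Hypothetical post-processing step (not part of fishytools).
# `matches` holds one row per sample: its ID, its assigned ID_match, and the
# similarity to that match.
apply_cutoff_and_resolve <- function(matches, cutoff_similarity = 0.9,
                                     conflicts_resolve = TRUE) {
  # Matches below the cutoff (or with unknown similarity) become NA.
  low <- is.na(matches$similarity) | matches$similarity < cutoff_similarity
  matches$ID_match[low] <- NA
  if (conflicts_resolve) {
    # Assumed rule: visit samples from most to least similar; a match already
    # taken by a more similar sample is set to NA for the rest.
    ord <- order(matches$similarity, decreasing = TRUE)
    dup <- duplicated(matches$ID_match[ord]) & !is.na(matches$ID_match[ord])
    matches$ID_match[ord[dup]] <- NA
  }
  matches
}

matches <- data.frame(
  ID         = c("s1", "s2", "s3", "s4"),
  ID_match   = c("r1", "r1", "r2", "r3"),
  similarity = c(0.97, 0.93, 0.99, 0.50),
  stringsAsFactors = FALSE
)
res <- apply_cutoff_and_resolve(matches)
res$ID_match
```

With the default cutoff of 0.9, s4 (0.50) loses its match, and s2 loses r1 to s1 (0.97 > 0.93), leaving `c("r1", NA, "r2", NA)`.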