assign_closest_matches: Finds the closest genetic matches to a set of individuals,...

View source: R/script - gene-match-o-matic 2000.R

assign_closest_matches    R Documentation

Finds the closest genetic matches for a set of individuals by comparing them against another set of individuals. SNP-based.

Description

Finds and assigns matches between two dataframes of genetic samples (SNPs and their genotypes). Both dataframes must be formatted as follows:
Rows = individuals/samples
Columns = SNP genotypes, spelled as bases, e.g. "AB", "CT", "CC" etc.
Each dataframe must contain a column "ID" with a unique identifier for each row.
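
As an illustration of the layout described above, a minimal toy input could look like the sketch below. The sample IDs and SNP column names (snp1 to snp4) are made up for this example; only the "ID" column name comes from the documentation.

```r
# Hypothetical toy data: three samples (rows), four SNPs (columns).
# The SNP column names are invented for illustration.
df_samples <- data.frame(
  ID   = c("s1", "s2", "s3"),
  snp1 = c("AA", "AG", "GG"),
  snp2 = c("CT", "CC", "CT"),
  snp3 = c("GG", "GG", "GA"),
  snp4 = c("TT", "TC", "CC"),
  stringsAsFactors = FALSE
)

# Checks mirroring the stated requirements:
# an "ID" column must exist and its values must be unique.
stopifnot("ID" %in% names(df_samples))
stopifnot(!anyDuplicated(df_samples$ID))
```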

Returns df_samples with an additional column, "ID_match", which gives the closest match for each individual.

The function needs a similarity matrix to operate. Creating one is time-consuming, so if you have already created one you can refer to it using the parameter similarity_matrix. After creating a similarity matrix, the function saves it as a CSV file for later use.
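
To make the idea of a similarity matrix concrete, here is a rough sketch in base R. It assumes that similarity is the fraction of shared SNP columns with an identical genotype string; the package may well compute it differently. The helper similarity_sketch(), the toy data, and the temporary file are all hypothetical and not part of the package.

```r
# Sketch only: similarity defined here as the share of identical genotypes.
# This definition is an assumption, not the package's documented method.
similarity_sketch <- function(df_a, df_b) {
  snps <- setdiff(intersect(names(df_a), names(df_b)), "ID")
  a <- as.matrix(df_a[, snps, drop = FALSE])
  b <- as.matrix(df_b[, snps, drop = FALSE])
  sim <- matrix(NA_real_, nrow = nrow(a), ncol = nrow(b),
                dimnames = list(df_a$ID, df_b$ID))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      sim[i, j] <- mean(a[i, ] == b[j, ], na.rm = TRUE)
    }
  }
  sim
}

samples <- data.frame(ID = c("s1", "s2"),
                      snp1 = c("AA", "AG"), snp2 = c("CT", "CC"),
                      snp3 = c("GG", "GG"), snp4 = c("TT", "TC"),
                      stringsAsFactors = FALSE)
lookup <- data.frame(ID = c("r1", "r2", "r3"),
                     snp1 = c("AA", "AG", "GG"), snp2 = c("CT", "CC", "CT"),
                     snp3 = c("GG", "GG", "GA"), snp4 = c("TC", "TC", "CC"),
                     stringsAsFactors = FALSE)

sim <- similarity_sketch(samples, lookup)

# Store the matrix as CSV and read it back, so it need not be recomputed.
path <- tempfile(fileext = ".csv")  # hypothetical file location
write.csv(sim, path)
sim_reloaded <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))
```

The nested loop compares every sample with every lookup row, which is why building the matrix becomes slow for large inputs and why reusing a stored copy pays off.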

Usage

assign_closest_matches(
  df_samples,
  df_lookup,
  similarity_matrix,
  cutoff_similarity = 0.9,
  conflicts_resolve = T
)
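
A call might look like the following sketch. The toy dataframes and their SNP column names are made up, and the call is guarded so that the snippet also runs when the package is not loaded. It relies on the statement that a similarity matrix is created automatically, so similarity_matrix is left out.

```r
# Hypothetical toy inputs; IDs and SNP column names are invented.
df_samples <- data.frame(ID = c("s1", "s2"),
                         snp1 = c("AA", "AG"), snp2 = c("CT", "CC"),
                         stringsAsFactors = FALSE)
df_lookup <- data.frame(ID = c("r1", "r2", "r3"),
                        snp1 = c("AA", "AG", "GG"), snp2 = c("CT", "CC", "CT"),
                        stringsAsFactors = FALSE)

result <- NULL
# Only call the function if it is available in the current session.
# Note that, per the description, it also writes a CSV file as a side effect.
if (exists("assign_closest_matches")) {
  result <- assign_closest_matches(
    df_samples,
    df_lookup,
    cutoff_similarity = 0.9,
    conflicts_resolve = TRUE
  )
}
```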

Arguments

df_samples

Dataframe with all samples to be looked up

df_lookup

Dataframe with samples to be looked up against (can be larger than df_samples)

similarity_matrix

A similarity matrix is created automatically. If you have already made one (see function create_similarityMatrix()), you can refer to it here to save some time.

cutoff_similarity

Any match with a similarity (0 to 1) lower than this is set to NA
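
The effect of the cutoff can be sketched on a small, made-up similarity matrix. The variable names and values are illustrative assumptions; the sketch only shows the rule stated above, not the package's internal code.

```r
# Hypothetical similarity matrix: rows are samples, columns are lookup IDs.
sim <- matrix(c(0.95, 0.40, 0.10,
                0.60, 0.85, 0.30),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), c("r1", "r2", "r3")))

cutoff_similarity <- 0.9

# Best-scoring lookup column for each sample, and the score it achieved.
best_col   <- max.col(sim, ties.method = "first")
best_score <- sim[cbind(seq_len(nrow(sim)), best_col)]

# Matches strictly below the cutoff become NA.
ID_match <- ifelse(best_score < cutoff_similarity,
                   NA_character_,
                   colnames(sim)[best_col])
```

Here s1 keeps its match r1 (0.95), while the best candidate for s2 (r2 at 0.85) falls below the cutoff and is replaced by NA.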

conflicts_resolve

Whether conflicts (samples with the same match) should be resolved
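
What a conflict looks like, and one possible way to resolve it, is sketched below. The resolution rule used here (keep the sample with the highest similarity and set the others to NA) is an assumption made for illustration; this page does not state which rule the function applies. The similarity column and all values are hypothetical.

```r
# Hypothetical match table: s1 and s2 both claim r1, which is a conflict.
matches <- data.frame(
  ID         = c("s1", "s2", "s3"),
  ID_match   = c("r1", "r1", "r2"),
  similarity = c(0.97, 0.93, 0.99),
  stringsAsFactors = FALSE
)

# Flag every row whose match is shared with at least one other row.
in_conflict <- duplicated(matches$ID_match) |
  duplicated(matches$ID_match, fromLast = TRUE)

# Assumed rule: within each matched ID, keep only the highest similarity.
best_per_match <- ave(matches$similarity, matches$ID_match, FUN = max)
matches$ID_match[matches$similarity < best_per_match] <- NA
```

After this step s1 keeps r1, s2 is left without a match, and s3 is unaffected because its match was not contested.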


Eiriksen/Genotools documentation built on Oct. 1, 2022, 1:40 a.m.