assign_closest_matches: Finds the closest genetic matches to a set of individuals,...


Source: R/script - gene-match-o-matic 2000.R

Description

Finds and assigns matches between two data frames of genetic samples (SNPs and their genotypes). Both data frames must be formatted as follows:
Rows = individuals/samples.
Columns = SNP genotypes, spelled as bases, e.g. "AB", "CT", "CC", etc.
A column "ID" must contain a unique identifier for each row.
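For illustration, a minimal pair of input data frames in this layout might look like the following. The sample IDs, SNP column names (snp1 to snp3) and genotypes are invented toy data, and the check that both data frames share the same SNP columns reflects an assumption, not a requirement stated above.

```r
# Hypothetical toy data: one row per individual, one column per SNP,
# plus a unique "ID" column.
df_samples <- data.frame(
  ID   = c("s1", "s2"),
  snp1 = c("AG", "CC"),
  snp2 = c("CT", "TT"),
  snp3 = c("GG", NA),   # missing genotype
  stringsAsFactors = FALSE
)

df_lookup <- data.frame(
  ID   = c("r1", "r2", "r3"),
  snp1 = c("AG", "CC", "AA"),
  snp2 = c("CT", "TT", "CC"),
  snp3 = c("GG", "GG", "AA"),
  stringsAsFactors = FALSE
)

# Basic format checks: an ID column with unique values, and (assumed) the
# same SNP columns in both data frames.
stopifnot(
  "ID" %in% names(df_samples),
  !anyDuplicated(df_samples$ID),
  identical(setdiff(names(df_samples), "ID"), setdiff(names(df_lookup), "ID"))
)
```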

Returns df_samples with an additional column, "ID_match", containing the ID of the closest match in df_lookup for each individual (or NA if no match reaches cutoff_similarity).
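As a rough sketch of how an "ID_match" value could be derived from a similarity matrix, the hypothetical helper closest_match() below picks, for each sample, the lookup ID with the highest similarity and returns NA when that similarity is below cutoff_similarity. It is an illustration under those assumptions, not the package's actual code, and the toy matrix values are invented.

```r
# Hypothetical sketch, not the fishytools code: for each sample (row), take the
# lookup ID (column) with the highest similarity; below the cutoff, use NA.
closest_match <- function(sim, cutoff_similarity = 0.9) {
  best <- apply(sim, 1, function(s) {
    if (all(is.na(s))) return(NA_character_)
    j <- which.max(s)  # ignores NA; ties go to the first column
    if (s[j] < cutoff_similarity) NA_character_ else colnames(sim)[j]
  })
  data.frame(ID = rownames(sim), ID_match = unname(best),
             stringsAsFactors = FALSE)
}

# Toy similarity matrix: rows = df_samples IDs, columns = df_lookup IDs
sim <- matrix(c(1.00, 0.33, 0.00,
                0.50, 0.85, 0.10),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), c("r1", "r2", "r3")))
res <- closest_match(sim, cutoff_similarity = 0.9)
```

Here s1 is matched to r1, while s2 gets NA because its best similarity (0.85) is below 0.9. The result could be joined back onto the samples with merge(df_samples, res, by = "ID").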

The function needs a similarity matrix between the samples in df_samples and df_lookup. Creating it is time-consuming, so if you have already created one you can pass it via the similarity_matrix parameter. After creating a similarity matrix, the function saves it as a CSV file for later use.
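The similarity measure used by create_similarityMatrix() is not described here. One plausible choice, sketched below with a hypothetical helper genotype_similarity(), is the proportion of SNPs typed in both individuals whose genotype strings are identical. The sketch also shows one way to cache the matrix as CSV and read it back; the file name is arbitrary. Note that it treats "AG" and "GA" as different genotypes and assumes both data frames share the same SNP columns.

```r
# Hypothetical toy data in the layout described above
df_samples <- data.frame(ID = c("s1", "s2"),
                         snp1 = c("AG", "CC"), snp2 = c("CT", "TT"),
                         snp3 = c("GG", NA), stringsAsFactors = FALSE)
df_lookup <- data.frame(ID = c("r1", "r2", "r3"),
                        snp1 = c("AG", "CC", "AA"), snp2 = c("CT", "TT", "CC"),
                        snp3 = c("GG", "GG", "AA"), stringsAsFactors = FALSE)

# Hypothetical helper, not the fishytools implementation: similarity is the
# proportion of SNPs typed in both individuals with identical genotype strings.
genotype_similarity <- function(df_samples, df_lookup) {
  snps <- setdiff(names(df_samples), "ID")   # assumes same SNP columns in both
  a <- as.matrix(df_samples[, snps, drop = FALSE])
  b <- as.matrix(df_lookup[, snps, drop = FALSE])
  sim <- matrix(NA_real_, nrow(a), nrow(b),
                dimnames = list(df_samples$ID, df_lookup$ID))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      ok <- !is.na(a[i, ]) & !is.na(b[j, ])                 # typed in both
      if (any(ok)) sim[i, j] <- mean(a[i, ok] == b[j, ok])  # else stays NA
    }
  }
  sim
}

sim <- genotype_similarity(df_samples, df_lookup)

# Cache the matrix as CSV and read it back for later runs
path <- file.path(tempdir(), "similarity_matrix.csv")
write.csv(sim, path)
sim_cached <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))
```

The double loop keeps the logic easy to follow; for large data sets a vectorised approach would be much faster.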

Usage

assign_closest_matches(
  df_samples,
  df_lookup,
  project = "-",
  similarity_matrix,
  cutoff_similarity = 0.9,
  cutoff_NA_collective = 50,
  conflicts_resolve = T
)

Arguments

df_samples

Data frame with all samples to be looked up.

df_lookup

Data frame with the samples to be looked up against (may contain more samples than df_samples).

similarity_matrix

A similarity matrix is created automatically if none is supplied. If you have already made one (see create_similarityMatrix()), pass it here to save time.

cutoff_similarity

Any match with a similarity (on a scale from 0 to 1) lower than this value is set to NA.

conflicts_resolve

Logical. Whether conflicts (several samples assigned the same match) should be resolved.
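How conflicts are resolved is not specified above. One simple strategy, shown here as a hypothetical function resolve_conflicts_greedy(), keeps a contested match only for the sample most similar to it and sets the other samples' ID_match to NA. The real function may behave differently, for example by reassigning those samples to their next-best match. The toy data are invented.

```r
# Hypothetical conflict resolution, not the fishytools rule: if several samples
# share one ID_match, keep it only for the sample with the highest similarity
# to that match and set the others to NA.
resolve_conflicts_greedy <- function(res, sim) {
  res$similarity <- mapply(function(i, m) if (is.na(m)) NA_real_ else sim[i, m],
                           res$ID, res$ID_match, USE.NAMES = FALSE)
  for (m in unique(na.omit(res$ID_match))) {
    idx <- which(res$ID_match == m)            # samples claiming match m
    if (length(idx) > 1) {
      keep <- idx[which.max(res$similarity[idx])]
      res$ID_match[setdiff(idx, keep)] <- NA_character_
    }
  }
  res
}

# Toy input: s1 and s2 both matched to r1
res <- data.frame(ID = c("s1", "s2", "s3"), ID_match = c("r1", "r1", "r2"),
                  stringsAsFactors = FALSE)
sim <- matrix(c(0.95, 0.10,
                0.98, 0.20,
                0.30, 0.92),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("s1", "s2", "s3"), c("r1", "r2")))
out <- resolve_conflicts_greedy(res, sim)
```

Here s2 keeps r1 (0.98 > 0.95), s1 is set to NA, and s3 keeps r2.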


Eiriksen/fishytools documentation built on April 4, 2020, 5:08 a.m.