View source: R/script - gene-match-o-matic 2000.R
assign_closest_matches | R Documentation |
Finds and assigns matches between two dataframes of genetic samples (SNPs and their genotypes).
Both dataframes must be formatted as:
Rows = individuals/samples
Columns = SNP genotypes. Spelled as bases, e.g. "AB", "CT", "CC" etc.
Must contain a column "ID" with a unique identifier for each row.
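As an illustration of the layout described above, here is a minimal sketch of two toy dataframes. The sample IDs, SNP column names and genotypes are all invented for the example, and the checks at the end are only a suggestion, not something the function is known to perform.

```r
# Toy input in the expected layout: one row per sample, one column per SNP,
# plus an "ID" column. All names and genotypes here are made up.
df_samples <- data.frame(
  ID   = c("S1", "S2"),
  SNP1 = c("CT", "CC"),
  SNP2 = c("AA", "AG"),
  SNP3 = c("GG", "GT"),
  stringsAsFactors = FALSE
)
df_lookup <- data.frame(
  ID   = c("L1", "L2", "L3"),
  SNP1 = c("CC", "CT", "TT"),
  SNP2 = c("AG", "AA", "GG"),
  SNP3 = c("GT", "GG", "TT"),
  stringsAsFactors = FALSE
)

# Suggested sanity checks: "ID" must exist and be unique in both dataframes.
stopifnot("ID" %in% names(df_samples), !anyDuplicated(df_samples$ID))
stopifnot("ID" %in% names(df_lookup), !anyDuplicated(df_lookup$ID))
```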
Returns df_samples with an additional column, "ID_match", which holds the ID of the closest match for that individual.
Needs to create a similarity matrix to operate. This is time-consuming, so if you have already created one you can pass it in using the parameter similarity_matrix.
After creating the similarity matrix, the function saves the matrix as a csv file for use later.
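To give a rough idea of what such a matrix could look like, the sketch below computes one in base R and round-trips it through a csv file. It is not the package's create_similarityMatrix(): the helper name sketch_similarity, the similarity measure (share of SNP columns with identical genotype strings), the toy data and the file path are all assumptions made for illustration.

```r
# Toy data (invented) in the documented layout.
df_samples <- data.frame(ID = c("S1", "S2"),
                         SNP1 = c("CT", "CC"), SNP2 = c("AA", "AG"),
                         SNP3 = c("GG", "GG"), stringsAsFactors = FALSE)
df_lookup <- data.frame(ID = c("L1", "L2", "L3"),
                        SNP1 = c("CC", "CT", "TT"), SNP2 = c("AG", "AA", "GG"),
                        SNP3 = c("GT", "GG", "TT"), stringsAsFactors = FALSE)

# Hypothetical helper, NOT the package's create_similarityMatrix().
# Assumed measure: fraction of shared SNP columns with identical genotypes.
sketch_similarity <- function(df_samples, df_lookup) {
  snps <- setdiff(intersect(names(df_samples), names(df_lookup)), "ID")
  a <- as.matrix(df_samples[, snps, drop = FALSE])
  b <- as.matrix(df_lookup[, snps, drop = FALSE])
  sim <- matrix(NA_real_, nrow(a), nrow(b),
                dimnames = list(df_samples$ID, df_lookup$ID))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      sim[i, j] <- mean(a[i, ] == b[j, ])
    }
  }
  sim
}

sim <- sketch_similarity(df_samples, df_lookup)

# Save for later reuse and read it back (the path is a temporary file here).
path <- tempfile(fileext = ".csv")
write.csv(sim, path)
sim_reloaded <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))
```

Rows correspond to df_samples and columns to df_lookup, so a reloaded matrix of this shape could in principle be reused instead of recomputing it.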
assign_closest_matches( df_samples, df_lookup, similarity_matrix, cutoff_similarity = 0.9, conflicts_resolve = T )
df_samples |
Dataframe with all samples to be looked up |
df_lookup |
Dataframe with samples to be looked up against (can be larger than df_samples) |
similarity_matrix |
A similarity matrix is created automatically, but if you have already made one (see function create_similarityMatrix()) you can pass it here and save some time. |
cutoff_similarity |
Any match with a similarity (0 to 1) lower than this is set as NA |
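The cutoff can be pictured with a small base R sketch. The similarity values below are invented, and the way the best match is picked (highest value per row, first one on ties) is an assumption rather than the function's documented behaviour.

```r
# Invented similarity matrix: rows = samples, columns = lookup individuals.
sim <- matrix(c(1.00, 0.40, 0.10,
                0.85, 0.95, 0.20,
                0.30, 0.50, 0.60),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("S1", "S2", "S3"), c("L1", "L2", "L3")))
cutoff_similarity <- 0.9

# Best-scoring lookup column for each sample (assumed tie rule: first).
best_col <- max.col(sim, ties.method = "first")
best_sim <- sim[cbind(seq_len(nrow(sim)), best_col)]

# Matches below the cutoff become NA; the rest keep the lookup ID.
ID_match <- ifelse(best_sim >= cutoff_similarity,
                   colnames(sim)[best_col], NA_character_)
```

Here S3's best candidate only reaches 0.60, so with the default cutoff of 0.9 it would be left unmatched.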
resolve_conflicts |
If conflicts (samples with the same match) should be resolved |
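The documentation does not say how conflicts are resolved. One plausible strategy, shown purely as an assumption, is to let the sample with the highest similarity keep the match and set the others to NA. The dataframe and its similarity column below are invented for the example.

```r
# Invented result table: S1 and S2 both claim lookup individual L1.
matches <- data.frame(
  ID         = c("S1", "S2", "S3"),
  ID_match   = c("L1", "L1", "L2"),
  similarity = c(0.98, 0.93, 0.95),
  stringsAsFactors = FALSE
)

# Assumed rule (not confirmed by the source): within each group of samples
# sharing a match, only the one with the highest similarity keeps it.
best_in_group <- ave(matches$similarity, matches$ID_match, FUN = max)
matches$ID_match[matches$similarity < best_in_group] <- NA
```

After this step S1 keeps L1, S2 is left unmatched, and S3 is unaffected because it had no competitor for L2.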