In eriqande/HatcheryPedAgree:

count_discrepancies

R Documentation

From pairs of duplicately genotyped samples, prepare a report of discrepancies

Description

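After running a matching-samples analysis, we can use the resulting pairs of duplicate samples to investigate genotyping discordance rates at different loci. This function does that: it compares the genotypes of each pair locus by locus and summarizes the discrepancies.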

Usage

count_discrepancies(pairs, genotypes)

Arguments

`pairs`	a tibble with at least two columns, `retained_id` and `original_id`. Comparisons are made between a single canonical individual in `retained_id` and every other matching sample listed in `original_id`. Rows in which `retained_id == original_id` are removed, since a sample is not compared with itself.
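`genotypes`	a tibble with columns `indiv`, `locus`, `gene_copy`, and `allele_int`, giving genotypes in long format (one row per individual, locus, and gene copy) with alleles coded as integers.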
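A minimal base-R sketch of inputs in the expected shape, assuming the long layout implied by the `gene_copy` column. Plain data frames stand in for tibbles, and the IDs, locus name, and allele codes are hypothetical. The last step mirrors the documented removal of self-pairs.

```r
# Hypothetical toy inputs; IDs, locus names and allele codes are made up.
pairs <- data.frame(
  retained_id = c("A", "A", "B"),
  original_id = c("A", "A2", "B2"),
  stringsAsFactors = FALSE
)

# Long format: one row per individual x locus x gene copy, so a diploid
# genotype occupies two rows. Here individual A is 1/2 at locus L1.
genotypes <- data.frame(
  indiv      = c("A", "A"),
  locus      = c("L1", "L1"),
  gene_copy  = c(1L, 2L),
  allele_int = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Drop self-pairs: a sample compared with itself cannot reveal discordance.
pairs <- pairs[pairs$retained_id != pairs$original_id, ]
pairs
```

Only the `A`/`A2` and `B`/`B2` rows remain in `pairs` after filtering.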

Value

Returns a list of three components as follows:

matching_samples_genos: a tibble with 7 columns. Each retained_id/original_id pair at each locus occupies two rows: the first row holds the first gene copy and the second row holds the second gene copy, with the alleles of the two individuals side by side in separate columns.
- retained_id: ID of the fish that is used in downstream analyses.
- original_id: ID of the other fish whose genotype is being compared to that of the retained ID.
- locus: the locus name.
- gene_copy: the gene copy index (1 or 2).
- indiv1_allele: the allele carried by the retained_id individual at this gene copy. Within each individual and locus, the alleles have been sorted in ascending order so that they can be compared directly with indiv2_allele.
- indiv2_allele: same as above, but for original_id.
- num_discrepant_gene_copies: the number of gene copies (0, 1, or 2) at which the two individuals' genotypes differ at this locus. 0 = no discrepancy; 1 = one individual is heterozygous and the other homozygous; 2 = the two individuals are homozygous for different alleles.
genotype_discrepancies_summary: counts and fractions of discrepancies at each locus, tallied across all retained_id vs. original_id pairs. The columns should be self-explanatory, except for gc_wtd_fract, which is the average number of discrepant gene copies per genotype at the locus. This might be considered a suitable estimate of the per-allele genotyping error rate.
alt_homoz_mismatches: a tibble recording every genotype comparison, among the retained_id vs. original_id pairs, in which both members of the pair are homozygous but for different alleles (i.e., they are alternate homozygotes). Each row denotes one mismatching locus.
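To make the three components concrete, here is a base-R sketch that rebuilds them from toy data. It is not the package's implementation: it uses data frames instead of tibbles, assumes every individual has both gene copies at every locus, and computes gc_wtd_fract literally as the mean of num_discrepant_gene_copies over the pairs compared at a locus. The summary columns num_compared, num_discrepant, and fract_discrepant are hypothetical names chosen for illustration.

```r
# Base-R sketch of count_discrepancies()'s outputs (not the package's code).
# Toy data: IDs, loci and allele codes are hypothetical.
pairs <- data.frame(retained_id = c("A", "B"), original_id = c("A2", "B2"),
                    stringsAsFactors = FALSE)
genotypes <- data.frame(
  indiv      = rep(c("A", "A2", "B", "B2"), each = 4),
  locus      = rep(rep(c("L1", "L2"), each = 2), times = 4),
  gene_copy  = rep(1:2, times = 8),
  allele_int = c(1, 2, 1, 1,   # A:  L1 = 1/2, L2 = 1/1
                 2, 1, 1, 2,   # A2: L1 = 2/1, L2 = 1/2
                 2, 2, 1, 2,   # B:  L1 = 2/2, L2 = 1/2
                 1, 1, 2, 1),  # B2: L1 = 1/1, L2 = 2/1
  stringsAsFactors = FALSE
)

# Sort alleles within each individual x locus so gene copy 1 holds the
# smaller allele; 2/1 and 1/2 then line up copy by copy.
genotypes <- genotypes[order(genotypes$indiv, genotypes$locus, genotypes$allele_int), ]
genotypes$gene_copy <- ave(genotypes$allele_int, genotypes$indiv, genotypes$locus,
                           FUN = seq_along)

# Attach the retained individual's alleles, then the original individual's.
g1 <- setNames(genotypes, c("retained_id", "locus", "gene_copy", "indiv1_allele"))
g2 <- setNames(genotypes, c("original_id", "locus", "gene_copy", "indiv2_allele"))
m <- merge(pairs, g1, by = "retained_id")
m <- merge(m, g2, by = c("original_id", "locus", "gene_copy"))

# Number of discrepant gene copies per pair x locus, repeated on both rows.
m$num_discrepant_gene_copies <- ave(as.integer(m$indiv1_allele != m$indiv2_allele),
                                    m$retained_id, m$original_id, m$locus, FUN = sum)
matching_samples_genos <- m[order(m$retained_id, m$original_id, m$locus, m$gene_copy),
                            c("retained_id", "original_id", "locus", "gene_copy",
                              "indiv1_allele", "indiv2_allele",
                              "num_discrepant_gene_copies")]

# Split into gene copy 1 and gene copy 2 rows; after the ordering above, row i
# of each refers to the same pair x locus (assumes no missing gene copies).
c1 <- matching_samples_genos[matching_samples_genos$gene_copy == 1, ]
c2 <- matching_samples_genos[matching_samples_genos$gene_copy == 2, ]

# Per-locus summary; gc_wtd_fract = mean discrepant gene copies per genotype.
genotype_discrepancies_summary <- do.call(rbind, lapply(split(c1, c1$locus), function(d)
  data.frame(locus            = d$locus[1],
             num_compared     = nrow(d),
             num_discrepant   = sum(d$num_discrepant_gene_copies > 0),
             fract_discrepant = mean(d$num_discrepant_gene_copies > 0),
             gc_wtd_fract     = mean(d$num_discrepant_gene_copies),
             stringsAsFactors = FALSE)))

# Alternate homozygotes: both individuals homozygous, for different alleles.
alt <- c1$indiv1_allele == c2$indiv1_allele &
       c1$indiv2_allele == c2$indiv2_allele &
       c1$indiv1_allele != c1$indiv2_allele
alt_homoz_mismatches <- c1[alt, c("retained_id", "original_id", "locus")]
```

With these toy data, the A/A2 pair differs by one gene copy at L2 (1/1 vs. 1/2), and the B/B2 pair is an alternate-homozygote mismatch at L1 (2/2 vs. 1/1), which is the single row of alt_homoz_mismatches. Checking homozygosity explicitly, rather than relying on num_discrepant_gene_copies == 2, keeps the test correct at loci with more than two alleles, where two heterozygotes can also differ at both gene copies.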