count_discrepancies: from pairs of duplicately genotyped samples, prepare a report...

View source: R/count_discrepancies.R

count_discrepanciesR Documentation

from pairs of duplicately genotyped samples, prepare a report of discrepancies

Description

After running a matching samples analysis we can use the resulting pairs to investigate genotyping discordance rates at different loci. That is what this does.

Usage

count_discrepancies(pairs, genotypes)

Arguments

pairs

a tibble with at least two columns: retained_id and original_id. The comparisons are made between a single canonical individual in retained_id and any other matching samples in original_id. If retained_id == original_id the row is removed.

genotypes

a tibble with columns indiv, locus, gene_copy, and allele_int.

Value

Returns a list of three components as follows:

  • matching_samples_genos: a tibble with 7 columns. The genotypes of two different individuals occupy two different rows. The first row is the first gene copy and the second row is the second gene copy.

    • retained_id: ID of the fish that is used in downstream analyses.

    • original_id: ID of the other fish whose genotype is being compared to that of the retained ID.

    • locus: the locus name.

    • gene_copy: the gene copy index (1 or 2)

    • indiv1_allele: the alleles at the retained_id indiv. These have been sorted in ascending order within the locus to make it easy to compare with the indiv2allele

    • indiv2_allele: same as above, but for original_id

    • num_discrepant_gene_copies: the number of gene copies that are discrepant. 0 = none; 1 = discrepancy where one is heterozygous and the other homozygous; 2 = one individual is homozygous and the other is homozygous for the other allele.

  • genotype_discrepancies_summary: counts and fractions of discrepancies across all retained_id vs original_id pairs at a locus. All columns should be self-explanatory except for gc_wtd_fract which is the average number of discrepant gene copies per genotype at the locus. This might be considered a suitable estimate of the per-allele genotyping error rate.

  • alt_homoz_mismatches: a tibble recording all the genotypes amongst the retained_id vs original_id pairs that are discrepancies in which each member of the pair is homozygous (i.e. they are alternate homozygotes.) Each row denotes a mismatching locus.


eriqande/HatcheryPedAgree documentation built on Sept. 21, 2023, 7:24 p.m.