getAssignCat: Assignability of Reference Pedigree

View source: R/getAssignable.R

getAssignCat R Documentation

Assignability of Reference Pedigree

Description

Identify which individuals are SNP genotyped, and which can potentially be substituted by a dummy individual ('Dummifiable').

Usage

getAssignCat(Pedigree, SNPd, minSibSize = "1sib1GP")

Arguments

Pedigree

dataframe with columns id-dam-sire. Reference pedigree.

SNPd

character vector with ids of genotyped individuals.

minSibSize

minimum requirements to be considered 'dummifiable':

  • '1sib' : sibship of size 1, i.e. the non-genotyped individual has at least 1 genotyped offspring. If there is no sibship-grandparent this isn't really a sibship, but can be useful in some situations. Used by CalcOHLLR.

  • '1sib1GP': sibship of size 1 with at least 1 genotyped grandparent. The minimum to be potentially assignable by sequoia.

  • '2sib': at least 2 siblings, with or without grandparents. Used by PedCompare.


Details

It is assumed that all individuals in SNPd have been genotyped for a sufficient number of SNPs. To identify samples with a too-low call rate, use CheckGeno. To calculate the call rate for all samples, see the examples below.
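As a minimal sketch of the call-rate calculation mentioned above, the following uses a small made-up genotype matrix in which missing calls are coded as -9, as in the examples. The object `Geno`, the sample and SNP names, and the 0.5 threshold are assumptions for illustration only, not recommended values.

```r
# Toy genotype matrix: 4 samples x 5 SNPs, coded 0/1/2, with -9 for missing.
# Sample names, SNP names and values are made up for illustration.
Geno <- matrix(c( 0,  1,  2,  1,  0,
                  1, -9, -9, -9,  2,
                  2,  2,  1,  0,  1,
                 -9,  1,  0, -9,  2),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("i01", "i02", "i03", "i04"),
                               paste0("SNP", 1:5)))

# Call rate per sample = proportion of SNPs that are not missing
CallRates <- rowMeans(Geno != -9)

# Keep samples above a threshold; 0.5 is arbitrary and only suits this toy data
GoodSamples <- names(CallRates)[CallRates > 0.5]

print(CallRates)
print(GoodSamples)
```

The resulting character vector of sample ids is the kind of input expected for SNPd.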

Some parents indicated here as assignable may never be assigned by sequoia. Examples are parent-offspring pairs where it cannot be determined which of the two is older, and grandparents that are indistinguishable from full avuncular relatives (i.e. the genetics are inconclusive because the candidate has no parent assigned, and the age prior is inconclusive too).

Value

The Pedigree dataframe with 3 additional columns, id.cat, dam.cat and sire.cat, with coding similar to that used by PedCompare:

G

Genotyped

D

Dummy or 'dummifiable'

X

Not genotyped and not dummifiable, or no parent in pedigree
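To make the coding concrete, here is a simplified base-R sketch that labels a toy pedigree with G, D and X. It applies only the '1sib' rule as described under minSibSize (a non-genotyped individual with at least 1 genotyped offspring) and is not sequoia's implementation. The pedigree, the ids and the helper `ToCat` are hypothetical.

```r
# Toy pedigree (all ids are made up); NA = parent unknown
Ped <- data.frame(
  id   = c("a", "b", "c", "d", "e"),
  dam  = c(NA,  NA,  "a", "a", "c"),
  sire = c(NA,  NA,  "b", "b", NA),
  stringsAsFactors = FALSE
)
SNPd <- c("a", "c", "d")   # ids of genotyped individuals

# Parents of genotyped individuals, i.e. those with >= 1 genotyped offspring
HasGenoOff <- with(Ped, unique(c(dam[id %in% SNPd], sire[id %in% SNPd])))
HasGenoOff <- HasGenoOff[!is.na(HasGenoOff)]

# Hypothetical helper: G = genotyped, D = dummifiable under the '1sib' rule,
# X = neither, or no parent in the pedigree (NA)
ToCat <- function(x) {
  ifelse(is.na(x), "X",
         ifelse(x %in% SNPd, "G",
                ifelse(x %in% HasGenoOff, "D", "X")))
}

Ped$id.cat   <- ToCat(Ped$id)
Ped$dam.cat  <- ToCat(Ped$dam)
Ped$sire.cat <- ToCat(Ped$sire)
print(Ped)
```

In this toy pedigree, individual b is not genotyped but is the sire of two genotyped individuals, so it is labelled D. Individual e is neither genotyped nor a parent, so it is labelled X.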

Examples

PedA <- getAssignCat(Ped_HSg5, rownames(SimGeno_example))
tail(PedA)
table(PedA$dam.cat, PedA$sire.cat, useNA="ifany")

# calculate call rate
## Not run: 
CallRates <- apply(MyGenotypes, MARGIN=1,
                   FUN = function(x) sum(x!=-9)) / ncol(MyGenotypes)
hist(CallRates, breaks=50, col="grey")
GoodSamples <- rownames(MyGenotypes)[ CallRates > 0.8]
# threshold depends on total number of SNPs, genotyping errors, proportion
# of candidate parents that are SNPd (sibship clustering is more prone to
# false positives).
# GoodSamples is already a character vector of ids, so pass it directly
PedA <- getAssignCat(MyOldPedigree, GoodSamples)

## End(Not run)

sequoia documentation built on Sept. 8, 2023, 5:29 p.m.