CalcPairLL | R Documentation |
For each specified pair of individuals, calculate the log10-likelihoods of being PO, FS, HS, GP, FA, HA, U (see Details). Individuals must be genotyped or have at least one genotyped offspring.
NOTE values >0
are various NA
types, see 'Likelihood
special codes' in 'Value' section below.
CalcPairLL(
Pairs = NULL,
GenoM = NULL,
Pedigree = NULL,
LifeHistData = NULL,
AgePrior = TRUE,
SeqList = NULL,
Module = "ped",
Complex = "full",
Herm = "no",
InclDup = FALSE,
Err = 1e-04,
ErrFlavour = "version2.9",
Tassign = 0.5,
Tfilter = -2,
quiet = FALSE,
Plot = TRUE
)
Pairs |
dataframe with columns
|
GenoM |
numeric matrix with genotype data: One row per individual,
one column per SNP, coded as 0, 1, 2, missing values as a negative number
or NA. You can reformat data with |
Pedigree |
dataframe with columns id-dam-sire; likelihoods will be calculated conditional on the pedigree. May include non-genotyped individuals, which will be treated as dummy individuals. |
LifeHistData |
data.frame with up to 6 columns:
"Birth year" may be in any arbitrary discrete time unit relevant to the species (day, month, decade), as long as parents are never born in the same time unit as their offspring, and only integers are used. Individuals do not need to be in the same order as in ‘GenoM’, nor do all genotyped individuals need to be included. |
AgePrior |
logical ( |
SeqList |
list with output from |
Module |
if |
Complex |
Breeding system complexity. Either "full" (default), "simp" (simplified, no explicit consideration of inbred relationships), "mono" (monogamous). |
Herm |
Hermaphrodites, either "no", "A" (distinguish between dam and sire role, default if at least 1 individual with sex=4), or "B" (no distinction between dam and sire role). Both of the latter deal with selfing. |
InclDup |
logical, include the likelihood for the two samples to be duplicates (originating from the same individual) in the output? |
Err |
assumed per-locus genotyping error rate, as a single number, or a
length 3 vector with P(hom|hom), P(het|hom), P(hom|het), or a 3x3 matrix.
See details below. The error rate is presumed constant across SNPs, and
missingness is presumed random with respect to actual genotype. Using
|
ErrFlavour |
function that takes |
Tassign |
minimum LLR required for acceptance of proposed relationship, relative to next most likely relationship. Higher values result in more conservative assignments. Must be zero or positive. |
Tfilter |
threshold log10-likelihood ratio (LLR) between a proposed relationship versus unrelated, to select candidate relatives. Typically a negative value, related to the fact that unconditional likelihoods are calculated during the filtering steps. More negative values may decrease non-assignment, but will increase computational time. |
quiet |
logical, suppress messages |
Plot |
logical, display scatter plots by |
The same pair may be included multiple times, e.g. with different sex, age difference, or focal relationship, to explore their effect on the likelihoods. Likelihoods are only calculated for relationships that are possible given the age difference, e.g. PO (parent-offspring) is not calculated for pairs with an age difference of 0.
Non-genotyped individuals can be included if they have at least one
genotyped offspring and can be turned into a dummy (see
getAssignCat
); to establish this a pedigree must be provided.
Warning 1: There is no check whether the input pedigree is genetically
sensible, it is simply conditioned upon. Checking whether a pedigree is
compatible with the SNP data can be done with CalcOHLLR
.
Warning 2: Conditioning on a Pedigree
can make computation
orders of magnitude slower.
The Pairs
dataframe including all optional columns listed
above, plus the additional columns:
xx |
Log10-Likelihood of this pair having relationship xx, with xx being the relationship abbreviations listed below. |
TopRel |
Abbreviation of most likely relationship |
LLR |
Log10-Likelihood ratio between most-likely and second most likely relationships. Other LLRs, e.g. between most-likely and unrelated, can easily be computed. |
Relationship abbreviations:
PO |
Parent - offspring |
FS |
Full siblings |
HS |
Half siblings |
GP |
Grandparent |
FA |
Full avuncular |
HA |
Half avuncular and other 3rd degree relationships |
U |
Unrelated |
2nd |
Unclear which type of 2nd degree relatives (HS, GP, or FA) |
?? |
Unclear which type of 1st, 2nd or 3rd degree relatives |
Likelihood special codes:
222 |
Maybe (via) other parent (e.g. focal="GP", but as likely to be maternal as paternal grandparent, and therefore not assignable) |
333 |
Excluded from comparison (shouldn't occur) |
444 |
Not implemented (e.g. would create an odd double/triple relationship in combination with the provided pedigree) |
777 |
Impossible (e.g. a male (Sex2=2) cannot be mother (patmat=1)) |
888 |
Already assigned in the provided pedigree (see |
999 |
|
This happens when the pair does not pass the initial check which prevents
impossible configurations in combination with Pedigree
. Specifically,
it happens when either or both individuals are a parent in the pedigree, but
the sex in Pairs
is not consistent with that
Pairs
changes the age differences; it is too complex to check
whether or not this still makes all pedigree links valid. Only setting the
age difference to 'unknown' via Pairs
is possible.
This function uses the same machinery as sequoia
, which will to save
time not calculate the likelihood when it is quickly obvious that the pair
cannot be related in the specified manner.
For PO (putative parent-offspring pairs) this is the case when:
the sex of the candidate parent, via Pairs$Sex2
or
LifeHistData
, does not match Pairs$patmat
, which defaults
to 1 (maternal relatives, i.e. dam)
a dam is already assigned via Pedigree
and Pairs$dropPar1
='none'
, and Pairs$patmat = 1
the age difference is zero or otherwise impossible according to the
age prior. It is either calculated from LifeHistData
or specified
via Pairs$AgeDif
. The AgePrior
can be specified directly,
be taken from SeqList
, or calculated automatically by
MakeAgePrior
when both Pedigree
and
LifeHistData
are provided.
Especially when Complex='full', not only the seven relationship alternatives listed above are considered, but a whole range of possible double and even triple relationships. For example, mother A and offspring B (PO) may also be paternal half-siblings (HS, A and A's mother mated with same male), grandmother and grand-offspring (GP, B's father is A's son), or paternal aunt (B's father is a full or half sib of A).
The likelihood reported as 'LL_PO' is the most-likely one of the possible alternatives, among those that are not impossible due to age differences or due to the pedigree (as reconstructed up to that point). Whether e.g. the likelihood to be both PO & HS is counted as PO or as HS, depends on the situation and is determined by the variable 'focal': During parentage assignment, it is counted as PO but not HS, while during sibship clustering, it is counted as HS but not PO – not omitting from the alternative relationship would result in a deadlock.
For historical reasons, the relationships between a dummy ID1 and ID2 are reported *between the sibship and ID2*. So,
ID2 replaces dummy ID1; or merge dummy ID2 with dummy ID1
ID1 parent of ID2
ID2 parent of ID1
ID2 FS resp. HS of ID1
If ID1 is genotyped and ID2 is a dummy, the relationships are as when ID2 is genotyped.
If you wish to retrieve likelihoods for a different set of relationships, please contact me at jisca.huisman@gmail.com .
PlotPairLL
to plot alternative relationship pairs from
the output; LLtoProb
to transform likelihoods to
probabilities; CalcParentProbs
which uses this function to
calculate parental probabilities; GetRelM
to find all
pairwise relatives according to the pedigree; GetMaybeRel
to
get likely relative pairs based on the genetic data.
# Likelihoods depend on the presumed genotyping error rate:
CalcPairLL(Pairs = data.frame(ID1='i042_2003_F', ID2='i015_2001_F'),
GenoM = Geno_griffin, Err = 1e-7, Plot=FALSE)
CalcPairLL(Pairs = data.frame(ID1='i042_2003_F', ID2='i015_2001_F'),
GenoM = Geno_griffin, Err = 1e-3, Plot=FALSE)
## likelihoods underlying parent LLR in pedigree:
# Example: dams for bottom 3 individuals
tail(SeqOUT_griffin$PedigreePar, n=3)
# set up dataframe with these pairs. LLRdam & LLRsire ignore any co-parent
Pairs_d <- data.frame(ID1 = SeqOUT_griffin$PedigreePar$id[140:142],
ID2 = SeqOUT_griffin$PedigreePar$dam[140:142],
focal = "PO",
dropPar1 = 'both')
# Calculate LL's, conditional on the rest of the pedigree + age differences
CalcPairLL(Pairs_d, GenoM = Geno_griffin, Err = 1e-04,
LifeHistData = LH_griffin, Pedigree = SeqOUT_griffin$PedigreePar)
# LLR changes when ignoring age and/or pedigree, as different relationships
# become (im)possible
CalcPairLL(Pairs_d, GenoM = Geno_griffin, Err = 1e-04)
# LLRpair is calculated conditional on co-parent, and min. of dam & sire LLR
Pairs_d$dropPar1 <- 'dam'
Pairs_s <- data.frame(ID1 = SeqOUT_griffin$PedigreePar$id[141:142],
ID2 = SeqOUT_griffin$PedigreePar$sire[141:142],
focal = "PO",
dropPar1 = 'sire')
CalcPairLL(rbind(Pairs_d, Pairs_s), GenoM = Geno_griffin, Err = 1e-04,
LifeHistData = LH_griffin, Pedigree = SeqOUT_griffin$PedigreePar)
## likelihoods underlying LLR in getMaybeRel output:
MaybeRel_griffin$MaybePar[1:5, ]
FivePairs <- MaybeRel_griffin$MaybePar[1:5, c("ID1", "ID2", "Sex1", "Sex2")]
PairLL <- CalcPairLL(Pairs = rbind( cbind(FivePairs, focal = "PO"),
cbind(FivePairs, focal = "HS"),
cbind(FivePairs, focal = "GP")),
GenoM = Geno_griffin, Plot=FALSE)
PairLL[PairLL$ID1=="i121_2007_M", ]
# LL(FS)==222 : HSHA, HSGP, and/or FAHA more likely than FS
# LL(GP) higher when focal=HS: GP via 'other' parent also considered
# LL(FA) higher when focal=PO: FAHA, or FS of 'other' parent
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.