GetMaybeRel: Find Putative Relatives

GetMaybeRel R Documentation

Find Putative Relatives

Description

Identify pairs of individuals likely to be related, but not assigned as such in the provided pedigree.

Usage

GetMaybeRel(
  GenoM = NULL,
  SeqList = NULL,
  Pedigree = NULL,
  LifeHistData = NULL,
  AgePrior = NULL,
  Module = "par",
  Complex = "full",
  Herm = "no",
  Err = 1e-04,
  ErrFlavour = "version2.9",
  Tassign = 0.5,
  Tfilter = -2,
  MaxPairs = 7 * nrow(GenoM),
  quiet = FALSE,
  ParSib = NULL,
  MaxMismatch = NA
)

Arguments

GenoM

numeric matrix with genotype data: One row per individual, one column per SNP, coded as 0, 1, 2, missing values as a negative number or NA. You can reformat data with GenoConvert, or use other packages to get it into a genlight object and then use as.matrix.
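The coding described above can be illustrated with a small hand-made matrix. This is only a sketch: the object name GenoM_toy, the individual IDs, and the use of row names to carry the IDs are assumptions made for illustration, not taken from the package.

```r
# Hypothetical toy data: 4 individuals (rows) x 5 SNPs (columns).
# Row names are assumed here to hold the individual IDs.
GenoM_toy <- matrix(
  c(0, 1, 2, 1,  0,
    1, 1, 2, NA, 0,
    2, 0, 1, 1, -9,
    0, 2, 2, 1,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ind_A", "ind_B", "ind_C", "ind_D"), NULL)
)

# Negative values and NA both denote missing genotypes
is_missing <- is.na(GenoM_toy) | GenoM_toy < 0

# All non-missing calls should be 0, 1 or 2
all(GenoM_toy[!is_missing] %in% 0:2)

# Proportion of SNPs successfully scored per individual
rowMeans(!is_missing)
```

A check like this before calling GetMaybeRel can catch genotype files that use a different coding (for example allele letters), which would first need converting.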

SeqList

list with output from sequoia. SeqList$Pedigree is used if present, and SeqList$PedigreePar otherwise; either one overrides the input parameter Pedigree. If 'Specs' is present, its elements override all input parameters with the same name. The list elements 'LifeHist', 'AgePriors', and 'ErrM' are also used if present, and similarly override the corresponding input parameters.

Pedigree

dataframe with id - dam - sire in columns 1-3. May include non-genotyped individuals, which will be treated as dummy individuals. When provided, all likelihoods (and thus all maybe-relatives) are conditional on this pedigree. Note: SeqList$Pedigree or SeqList$PedigreePar takes precedence (for this function only).

LifeHistData

data.frame with up to 6 columns:

ID

max. 30 characters long

Sex

1 = female, 2 = male, 3 = unknown, 4 = hermaphrodite, other numbers or NA = unknown

BirthYear

birth or hatching year, integer, with missing values as NA or any negative number.

BY.min

minimum birth year, only used if BirthYear is missing

BY.max

maximum birth year, only used if BirthYear is missing

Year.last

Last year in which the individual could have had offspring. In mammals, for example, this can be the year before death for females, and the year after death for males.

"Birth year" may be in any arbitrary discrete time unit relevant to the species (day, month, decade), as long as parents are never born in the same time unit as their offspring, and only integers are used. Individuals do not need to be in the same order as in ‘GenoM’, nor do all genotyped individuals need to be included.
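A data frame following the column layout above could be put together as in the sketch below. All values and the name LH_toy are made up for illustration; only the column names and codes come from the description above.

```r
# Hypothetical life history data for 4 individuals
LH_toy <- data.frame(
  ID        = c("ind_A", "ind_B", "ind_C", "ind_D"),
  Sex       = c(1, 2, 3, NA),               # 1 = female, 2 = male, 3 or NA = unknown
  BirthYear = c(2001L, 2001L, NA, 2004L),   # NA = missing
  BY.min    = c(NA, NA, 2002L, NA),         # only used if BirthYear is missing
  BY.max    = c(NA, NA, 2003L, NA),
  Year.last = c(2006L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Basic sanity checks matching the constraints listed above
ids_ok   <- all(nchar(LH_toy$ID) <= 30)
range_ok <- all(LH_toy$BY.min <= LH_toy$BY.max, na.rm = TRUE)
```

Here ind_C has no exact birth year, so a minimum and maximum are given instead.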

AgePrior

Age prior matrix, as generated by MakeAgePrior and included in the sequoia output. Affects which relationships are considered possible (only those where P(A|R) / P(A) > 0).

Module

type of relatives to check for. One of

par

parent - offspring pairs

ped

all first and second degree relatives

When 'par', all pairs are returned that are more likely parent-offspring than unrelated, potentially including pairs that are even more likely to be otherwise related.

Complex

Breeding system complexity. One of "full" (default), "simp" (simplified, no explicit consideration of inbred relationships), or "mono" (monogamous).

Herm

Hermaphrodites, either "no", "A" (distinguish between dam and sire role, default if at least 1 individual with sex=4), or "B" (no distinction between dam and sire role). Both of the latter deal with selfing.

Err

estimated genotyping error rate, as a single number, or a length 3 vector with P(hom|hom), P(het|hom), P(hom|het), or a 3x3 matrix. See details below. The error rate is presumed constant across SNPs, and missingness is presumed random with respect to actual genotype. Using Err >5% is not recommended, and Err >10% strongly discouraged.

ErrFlavour

function that takes Err (single number) as input, and returns a length 3 vector or 3x3 matrix, or choose from inbuilt options 'version2.9', 'version2.0', 'version1.3', or 'version1.1', referring to the sequoia version in which they were the default. Ignored if Err is a vector or matrix. See ErrToM for details.

Tassign

minimum LLR required for acceptance of proposed relationship, relative to next most likely relationship. Higher values result in more conservative assignments. Must be zero or positive.

Tfilter

threshold log10-likelihood ratio (LLR) between a proposed relationship versus unrelated, to select candidate relatives. Typically a negative value, related to the fact that unconditional likelihoods are calculated during the filtering steps. More negative values may decrease non-assignment, but will increase computational time.
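The two thresholds refer to different log10-likelihood ratios, which the arithmetic sketch below tries to make concrete. The likelihood values are invented, and the comparisons (including whether they are strict) are assumptions for illustration; this is not the package's internal algorithm, which uses unconditional likelihoods during filtering.

```r
# Hypothetical log10-likelihoods for one pair under four relationships
# (PO = parent-offspring, FS = full sibs, HS = half sibs, U = unrelated)
LL <- c(PO = -120.3, FS = -123.9, HS = -127.4, U = -131.0)

Tassign <- 0.5   # default value from the Usage section
Tfilter <- -2    # default value from the Usage section

ranked <- sort(LL, decreasing = TRUE)
TopRel <- names(ranked)[1]

# LLR of the most likely relationship versus the next most likely one
LLR_next <- unname(ranked[1] - ranked[2])

# LLR of the proposed relationship (here PO) versus unrelated
LLR_U <- unname(LL["PO"] - LL["U"])

passes_filter <- LLR_U > Tfilter       # assumed direction of comparison
passes_assign <- LLR_next >= Tassign   # assumed direction of comparison
```

Because Tfilter is compared against the ratio versus unrelated, a negative value lets through pairs that look slightly less likely to be related than unrelated at the filtering stage.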

MaxPairs

the maximum number of putative pairs to return.

quiet

logical, suppress messages.

ParSib

DEPRECATED, use Module instead. Formerly either 'par' to check for putative parent-offspring pairs only, or 'sib' to check for all types of first and second degree relatives; these correspond to Module = 'par' and Module = 'ped', respectively.

MaxMismatch

DEPRECATED AND IGNORED. Now calculated automatically using CalcMaxMismatch.

Details

When Module="par", the age difference of the putative pair is temporarily set to NA so that genetic parent-offspring pairs declared to be born in the same year may be discovered. When Module="ped", only relationships possible given the age difference, if known from the LifeHistData, are considered.
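One practical consequence is that putative parent-offspring pairs with an age difference of zero may point to a birth year error. The sketch below flags such pairs in a hand-made data frame; MaybePar_toy and its values are hypothetical, with column names borrowed from the Value section.

```r
# Hypothetical data frame mimicking a few columns of the MaybePar element
MaybePar_toy <- data.frame(
  ID1        = c("ind_A", "ind_C", "ind_E"),
  ID2        = c("ind_B", "ind_D", "ind_F"),
  TopRel     = c("PO", "PO", "PO"),
  BirthYear1 = c(2003L, 2005L, NA),
  BirthYear2 = c(2003L, 2002L, 2001L),
  stringsAsFactors = FALSE
)

# Age difference as defined in the Value section: BirthYear1 - BirthYear2
MaybePar_toy$AgeDif <- MaybePar_toy$BirthYear1 - MaybePar_toy$BirthYear2

# Pairs declared to be born in the same year (unknown age differences excluded)
same_year <- !is.na(MaybePar_toy$AgeDif) & MaybePar_toy$AgeDif == 0
MaybePar_toy[same_year, c("ID1", "ID2", "AgeDif")]
```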

Value

A list with

MaybePar

A dataframe with non-assigned likely parent-offspring pairs, with columns:

  • ID1

  • ID2

  • TopRel: the most likely relationship, using abbreviations listed below

  • LLR: Log10-Likelihood Ratio between most likely and next most likely relationship

  • OH: Number of loci at which the two individuals are opposite homozygotes

  • BirthYear1: Birth year of ID1 (copied from LifeHistData)

  • BirthYear2

  • AgeDif: Age difference; BirthYear1 - BirthYear2

  • Sex1: Sex of ID1 (copied from LifeHistData)

  • Sex2

  • SnpdBoth: Number of loci at which the two individuals are both successfully genotyped
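The OH and SnpdBoth columns can be understood from the following minimal sketch, which counts both quantities for two genotype vectors coded as in GenoM. The function name count_pair_stats and the example vectors are hypothetical; the package computes these values internally.

```r
# Count opposite homozygotes (OH) and jointly genotyped loci (SnpdBoth)
# for two genotype vectors coded 0/1/2, with NA or negative = missing.
count_pair_stats <- function(g1, g2) {
  stopifnot(length(g1) == length(g2))
  both <- !is.na(g1) & !is.na(g2) & g1 >= 0 & g2 >= 0
  c(OH       = sum(abs(g1[both] - g2[both]) == 2),  # 0 vs 2 or 2 vs 0
    SnpdBoth = sum(both))
}

g_id1 <- c(0, 2, 1, NA, 2,  0)
g_id2 <- c(2, 2, 0,  1, 0, -9)
pair_stats <- count_pair_stats(g_id1, g_id2)
```

In this toy example four loci are genotyped in both individuals, and at two of them the individuals are opposite homozygotes.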

MaybeRel

A dataframe with non-assigned likely pairs of relatives, with columns identical to MaybePar

MaybeTrio

A dataframe with non-assigned parent-parent-offspring trios, with columns:

  • ID

  • parent1

  • parent2

  • TopRel: the most likely relationship, using abbreviations listed below

  • LLRparent1: Log10-Likelihood Ratio between parent1 being a parent of ID vs the next most likely relationship between the pair, ignoring parent2

  • LLRparent2: as LLRparent1

  • LLRpair: LLR for the parental pair, versus the next most likely configuration between the three individuals (with one or neither parent assigned)

  • OHparent1: Number of loci at which ID and parent1 are opposite homozygotes

  • OHparent2: as OHparent1

  • MEpair: Number of Mendelian errors between the offspring and the parent pair, includes OH as well as e.g. parents being opposing homozygotes, but the offspring not being a heterozygote. The offspring being OH with both parents is counted as 2 errors.

  • SNPd.id.parent1: Number of loci at which ID and parent1 are both successfully genotyped

  • SNPd.id.parent2: as SNPd.id.parent1

The following categories are used in column 'TopRel', indicating the most likely relationship category:

PO

Parent-Offspring

FS

Full Siblings

HS

Half Siblings

GP

GrandParent - grand-offspring

FA

Full Avuncular (aunt/uncle)

2nd

2nd degree relatives, not enough information to distinguish between HS, GP and FA

Q

Unclear, but probably 1st, 2nd or 3rd degree relatives
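When summarising results, the abbreviations above can be kept in a named vector and used to tabulate or label the 'TopRel' column. This is just a convenience sketch; TopRel_labels and TopRel_toy are hypothetical names, and the example values are invented.

```r
# Lookup from TopRel abbreviation to description, as listed above
TopRel_labels <- c(
  PO    = "Parent-Offspring",
  FS    = "Full Siblings",
  HS    = "Half Siblings",
  GP    = "GrandParent - grand-offspring",
  FA    = "Full Avuncular (aunt/uncle)",
  "2nd" = "2nd degree relatives (HS, GP or FA)",
  Q     = "Unclear, probably 1st, 2nd or 3rd degree relatives"
)

# Hypothetical TopRel column from a MaybeRel data frame
TopRel_toy <- c("PO", "2nd", "HS", "PO", "Q")

# Count pairs per category, keeping categories with zero pairs
TopRel_counts <- table(factor(TopRel_toy, levels = names(TopRel_labels)))

# Replace abbreviations by their descriptions
TopRel_long <- unname(TopRel_labels[TopRel_toy])
```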

See Also

sequoia to identify likely pairs of duplicate genotypes and for pedigree reconstruction; GetRelM to identify all pairs of relatives in a pedigree; CalcPairLL for the likelihoods underlying the LLR.

Examples

## Not run: 
# without conditioning on pedigree
MaybeRel_griffin <- GetMaybeRel(GenoM = Geno_griffin, Err = 0.001, Module = 'par')

## End(Not run)
names(MaybeRel_griffin)

# conditioning on pedigree
MaybePO <- GetMaybeRel(GenoM = Geno_griffin, SeqList = SeqOUT_griffin,
                      Module = 'par')
head(MaybePO$MaybePar)

# instead of providing the entire SeqList, one may specify the relevant
# elements separately
Maybe <- GetMaybeRel(GenoM = Geno_griffin,
                     Pedigree = SeqOUT_griffin$PedigreePar,
                     LifeHistData = LH_griffin,
                     Err = 0.0001, Complex = "full",
                     Module = "ped")
head(Maybe$MaybeRel)

# visualise results, turn dataframe into matrix first:
MaybeM <- GetRelM(Pairs = Maybe$MaybeRel)
PlotRelPairs(MaybeM)
# or combine with pedigree (note suffix '?')
RelM <- GetRelM(Pedigree = SeqOUT_griffin$PedigreePar, Pairs = Maybe$MaybeRel)
PlotRelPairs(RelM)


sequoia documentation built on July 4, 2024, 1:10 a.m.