getIBDiR: Selection Significance Statistic


Description

getIBDiR() calculates a summary statistic for each SNP that can be used to assess the significance of excess IBD sharing at genomic loci, thus identifying regions under positive selection. First, relatedness between isolates and SNP allele frequencies are accounted for; then normalization procedures are applied, under the assumption that the transformed summary statistic follows a chi-squared distribution with 1 degree of freedom. This allows the calculation of -log10(P-values), which we denote as the iR statistic. SNPs with iR values greater than a chosen threshold (e.g. -log10(P-value) > -log10(0.05)) provide evidence of positive selection. getIBDiR can return NA iR statistics for a number of reasons, including when there are no IBD pairs, when all pairs are IBD, or when only a few isolates are analyzed.
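The final step described above, turning a statistic assumed to follow a chi-squared distribution with 1 degree of freedom into a -log10(P-value) and comparing it with -log10(0.05), can be sketched in base R. This is only a minimal illustration under that distributional assumption: the relatedness, allele-frequency and normalization steps that getIBDiR performs first are not reproduced, and the function name chisq_to_log10p and the input values are hypothetical.

```r
# Hypothetical helper (not part of isoRelate): convert statistics assumed to
# follow a chi-squared distribution with 1 degree of freedom into -log10(P).
chisq_to_log10p <- function(chisq_stat) {
  # Natural-log upper-tail probability ln P(X >= x) for X ~ chi-squared(df = 1);
  # log.p = TRUE avoids P underflowing to 0 for very large statistics.
  log_p <- pchisq(chisq_stat, df = 1, lower.tail = FALSE, log.p = TRUE)
  # Convert ln(P) to -log10(P)
  -log_p / log(10)
}

# Made-up statistics for three SNPs
chisq_stat <- c(0.5, 2, 10)
log10p <- chisq_to_log10p(chisq_stat)

# Flag SNPs above the -log10(0.05) threshold mentioned in the Description
significant <- log10p > -log10(0.05)
print(data.frame(chisq_stat, log10p, significant))
```

Working on the log scale is a design choice for numerical stability: strongly selected loci can have P-values too small to represent directly, whereas their logarithm remains finite.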

Usage

getIBDiR(ped.genotypes, ibd.matrix, groups = NULL)

Arguments

ped.genotypes

A list containing 2 objects. See the Value description in getGenotypes for more details on this input.

ibd.matrix

A data frame containing the binary IBD information for each SNP and each pair. See the returned Value in getIBDmatrix for more details.
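To illustrate what "binary IBD information for each SNP and each pair" means, the sketch below builds a toy matrix with SNPs as rows and isolate pairs as columns (1 = IBD, 0 = not IBD) and computes the proportion of pairs IBD at each SNP. The layout is an assumption made for illustration and is not the exact format returned by getIBDmatrix. It also flags SNPs where no pair or every pair is IBD, two of the situations the Description lists as possible causes of NA iR statistics.

```r
# Toy binary IBD matrix (illustrative layout, not the getIBDmatrix format):
# rows are SNPs, columns are isolate pairs, 1 = IBD and 0 = not IBD.
ibd_toy <- matrix(
  c(0, 0, 0,
    1, 0, 1,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("snp1", "snp2", "snp3"), c("pairA", "pairB", "pairC"))
)

# Proportion of pairs that are IBD at each SNP
prop_ibd <- rowMeans(ibd_toy)

# SNPs where no pair (0) or every pair (1) is IBD
degenerate <- prop_ibd %in% c(0, 1)
print(data.frame(prop_ibd, degenerate))
```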

groups

A data frame with 3 columns of information:

  1. Family ID

  2. Isolate ID

  3. Group ID

where IBD proportions are calculated for

  1. all pairs of isolates within the same group

  2. all pairwise-group comparisons where isolates belong to different groups

Group ID, for example, can be the geographic region where each isolate was collected. The default is groups=NULL, in which case IBD proportions are calculated over all pairs.
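The two kinds of comparison listed above can be made concrete with a small example. The sketch below builds a hypothetical groups data frame with the three documented columns (the column names fid, iid and group are assumptions) and labels every isolate pair either with its shared group or as a pairwise-group comparison such as "A/B". It only shows which pairs fall into each category; it is not how isoRelate computes the proportions internally.

```r
# Hypothetical groups input: Family ID, Isolate ID, Group ID.
# Column names are illustrative assumptions.
groups <- data.frame(
  fid = c("F1", "F2", "F3", "F4"),
  iid = c("I1", "I2", "I3", "I4"),
  group = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# All unordered pairs of isolates (one pair per column)
pair_idx <- combn(nrow(groups), 2)
g1 <- groups$group[pair_idx[1, ]]
g2 <- groups$group[pair_idx[2, ]]

# Same group -> within-group label (e.g. "A");
# different groups -> order-independent pairwise-group label (e.g. "A/B")
comparison <- ifelse(g1 == g2, g1,
                     ifelse(g1 < g2, paste(g1, g2, sep = "/"),
                            paste(g2, g1, sep = "/")))

pairs <- data.frame(
  isolate1 = groups$iid[pair_idx[1, ]],
  isolate2 = groups$iid[pair_idx[2, ]],
  comparison = comparison,
  stringsAsFactors = FALSE
)
print(table(pairs$comparison))
```

With four isolates split evenly between two groups, one pair falls within group A, one within group B, and the remaining four are A/B comparisons.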

Value

A data frame with the following 8 columns:

  1. Chromosome (type "character", "numeric" or "integer")

  2. SNP identifiers (type "character")

  3. Genetic map distance (centimorgans, cM) (type "numeric")

  4. Base-pair position (type "integer")

  5. Population (type "character" or "numeric")

  6. Subpopulation (type "character" or "numeric")

  7. iR statistic (type "numeric")

  8. -log10 p-value (type "numeric")

where each row describes a unique SNP. The Population column is filled with 1s by default, while Subpopulation contains the group IDs from groups, identifying whether the proportion of pairs IBD was calculated over pairs of isolates within the same group or over pairs of isolates belonging to different groups. If groups=NULL, Subpopulation is filled with 0s. The population columns are included for plotting purposes. The columns are headed chr, snp_id, pos_M, pos_bp, pop, subpop, iR and log10_pvalue, respectively.
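Because the returned value is an ordinary data frame with the column names listed above, SNPs exceeding a significance threshold can be extracted with base R subsetting. The helper significant_snps below is hypothetical and not part of isoRelate; it defaults to the log10_pvalue column and a -log10(0.05) threshold, drops NA rows, and is demonstrated on a mock data frame whose values are made-up placeholders.

```r
# Hypothetical helper (not part of isoRelate): keep rows whose value in
# `column` exceeds `threshold`, ignoring NA statistics.
significant_snps <- function(iR_df, threshold = -log10(0.05),
                             column = "log10_pvalue") {
  stat <- iR_df[[column]]
  keep <- !is.na(stat) & stat > threshold
  iR_df[keep, , drop = FALSE]
}

# Mock data frame with the documented column names; values are placeholders.
mock_iR <- data.frame(
  chr = c(1, 1, 2),
  snp_id = c("snp1", "snp2", "snp3"),
  pos_M = c(0.01, 0.02, 0.05),
  pos_bp = c(1000L, 2000L, 5000L),
  pop = 1,
  subpop = 0,
  iR = c(0.7, NA, 9.3),
  log10_pvalue = c(0.4, NA, 2.5),
  stringsAsFactors = FALSE
)
print(significant_snps(mock_iR))
```

Passing column = "iR" applies the same threshold to the iR column instead.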

See Also

getGenotypes, getIBDmatrix and getIBDproportion.

Examples

# generate a binary IBD matrix
my_matrix <- getIBDmatrix(ped.genotypes = png_genotypes,
                          ibd.segments = png_ibd)

# calculate the significance of IBD sharing
my_iR <- getIBDiR(ped.genotypes = png_genotypes,
                  ibd.matrix = my_matrix,
                  groups = NULL)

bahlolab/isoRelate documentation built on May 11, 2019, 5:25 p.m.