switchBOTgenotypes: XIBD BOT Genotype Switching

Description

The allele frequencies in XIBD's HapMap allele frequency files are calculated for the A allele only, where the A allele is determined by the following rules:

  1. When one of the possible variations of the SNP is adenine (A), then adenine is labeled the A allele and the remaining variation is labeled the B allele, regardless of what this might be.

  2. If adenine (A) is not a variation of the SNP but cytosine (C) is, then cytosine is labeled the A allele and the remaining variation is labeled the B allele.

  3. If neither adenine (A) nor cytosine (C) is a variant of the SNP, then thymine (T) is labeled the A allele.
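
The three rules above can be sketched as a small R function. This is only an illustration: hapmap_a_allele() is a hypothetical helper, not part of XIBD, and it assumes a biallelic SNP whose two variants are given as single-letter bases.

```r
# Hypothetical helper (not an XIBD function): return the allele that the
# rules above would label "A" for a biallelic SNP.
hapmap_a_allele <- function(alleles) {
  alleles <- toupper(alleles)
  stopifnot(length(alleles) == 2,
            all(alleles %in% c("A", "C", "G", "T")),
            alleles[1] != alleles[2])
  if ("A" %in% alleles) return("A")  # rule 1: adenine present
  if ("C" %in% alleles) return("C")  # rule 2: no adenine, cytosine present
  "T"                                # rule 3: only G/T remains
}

a_ag <- hapmap_a_allele(c("G", "A"))
a_ct <- hapmap_a_allele(c("T", "C"))
a_gt <- hapmap_a_allele(c("G", "T"))
```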

Illumina's convention for naming the A and B alleles differs from that of the HapMap data (http://www.illumina.com/documents/products/technotes/technote_topbot.pdf). Under Illumina's convention, the classification of A and B alleles depends on the top (TOP) and bottom (BOT) designation of the SNP. This means that the A allele in the HapMap data is not always the same as the A allele in the Illumina data. In fact, alleles that have been named according to the BOT designation correspond to the B allele in the HapMap data. To correct for this, switchBOTgenotypes() switches the A and B alleles in the input genotypes for all SNPs with a BOT designation. This means that a homozygous genotype, 0, will be changed to the alternative homozygous genotype, 2, and vice versa. Heterozygous genotypes will be unchanged.

NOTE: this function should only be used with Illumina SNP chip data, when XIBD's HapMap reference data is used, and when there is a noticeable discrepancy between the population allele frequencies calculated from the HapMap reference data and those calculated from the input dataset.
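
The switching described above amounts to replacing each genotype g with 2 - g for BOT SNPs. The following is a minimal sketch on a toy matrix, not XIBD's internal implementation: switch_bot(), the matrix layout (rows are SNPs, columns are samples) and the use of NA for missing genotypes are all assumptions made for illustration.

```r
# Toy genotypes (made-up values): rows are SNPs, columns are samples.
geno <- matrix(c(0, 1, 2,
                 2, 2, 0,
                 1, 0, NA),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("snp1", "snp2", "snp3"),
                               c("s1", "s2", "s3")))

# Assumed TOP/BOT designation for each SNP, named by SNP identifier.
topbot <- c(snp1 = "TOP", snp2 = "BOT", snp3 = "BOT")

# Hypothetical helper: swap 0 and 2 for BOT SNPs; 1 and NA are unchanged.
switch_bot <- function(geno, topbot) {
  # %in% yields FALSE (not NA) for SNPs missing from topbot
  is_bot <- topbot[rownames(geno)] %in% "BOT"
  geno[is_bot, ] <- 2 - geno[is_bot, ]
  geno
}

switched <- switch_bot(geno, topbot)
```

Applying switch_bot() twice returns the original matrix, which is a quick sanity check that only the A/B labelling has changed.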

Usage

switchBOTgenotypes(ped.genotypes, hapmap.topbot)

Arguments

ped.genotypes

a named list containing pedigree, genotypes and model. See Value description in getGenotypes for more details. The family IDs and individual IDs in pedigree must match the family IDs and individual IDs in the header of genotypes.

hapmap.topbot

a data frame containing the Illumina TOP/BOT designation for the HapMap SNPs. This file can be downloaded from http://bioinf.wehi.edu.au/software/XIBD/index.html. This file contains the following 5 columns of information:

  1. Chromosome (type "numeric" or "integer")

  2. SNP identifier (type "character")

  3. Genetic map distance (centimorgans, cM, or morgans, M, the default) (type "numeric")

  4. Base-pair position (type "numeric" or "integer")

  5. Illumina's TOP or BOT designation of the SNP (type "character")

where each row describes a single marker. The data frame should contain the header chr, snp_id, pos_bp, pos_M and TOPBOT.
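
As a rough illustration of the expected layout, a toy data frame with the header named above could be built and checked as follows. All values and SNP identifiers are made up, and the column order simply follows the header as stated here; the real data should come from the XIBD download.

```r
# Made-up rows for illustration only (hypothetical SNP identifiers).
toy_topbot <- data.frame(
  chr    = c(1L, 1L, 2L),
  snp_id = c("rs0000001", "rs0000002", "rs0000003"),
  pos_bp = c(10500L, 20750L, 5300L),
  pos_M  = c(0.0001, 0.0002, 0.00005),
  TOPBOT = c("TOP", "BOT", "TOP"),
  stringsAsFactors = FALSE
)

# Basic checks on header and column types before passing the data on.
expected_header <- c("chr", "snp_id", "pos_bp", "pos_M", "TOPBOT")
header_ok <- identical(colnames(toy_topbot), expected_header)
types_ok  <- is.numeric(toy_topbot$chr) &&
  is.character(toy_topbot$snp_id) &&
  is.numeric(toy_topbot$pos_bp) &&
  is.numeric(toy_topbot$pos_M) &&
  all(toy_topbot$TOPBOT %in% c("TOP", "BOT"))
```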

Value

A named list of the same format as the input ped.genotypes with A and B alleles switched for BOT SNPs.

Examples

# The following should only be run if you have Illumina data and
# are using the HapMap reference data provided by XIBD.

# format and filter the data
my_genotypes <- getGenotypes(ped.map = example_pedmap,
                             reference.ped.map = example_reference_pedmap,
                             snp.ld = example_reference_ld,
                             model = 2,
                             maf = 0.01,
                             sample.max.missing = 0.1,
                             snp.max.missing = 0.1,
                             maximum.ld.r2 = 0.99,
                             chromosomes = NULL,
                             input.map.distance = "M",
                             reference.map.distance = "M")

# calculate allele frequencies from the input dataset
input_freq <- calculateAlleleFreq(ped.genotypes = my_genotypes)
hist(abs(my_genotypes[["genotypes"]][,"freq"] - input_freq[,"freq"]),
     xlim = c(0,1),
     main = "Before BOT change",
     xlab = "abs(pop allele freq diff)")

# switch alleles
my_genotypes_2 <- switchBOTgenotypes(ped.genotypes = my_genotypes,
                                     hapmap.topbot = example_hapmap_topbot)

# calculate allele frequencies when BOT alleles switched
input_freq <- calculateAlleleFreq(ped.genotypes = my_genotypes_2)
hist(abs(my_genotypes_2[["genotypes"]][,"freq"] - input_freq[,"freq"]),
     xlim = c(0,1),
     main = "After BOT change",
     xlab = "abs(pop allele freq diff)")

bahlolab/XIBD documentation built on May 11, 2019, 5:24 p.m.