argyle: An 'R' package for import and QC of genotypes from...

Description

argyle: An R package for import and QC of genotypes from Illumina Infinium arrays

The genotypes object

The genotypes class is a matrix (sites x samples) with row and column names, plus a data frame (stored in attr(g, "map")) describing marker positions. Nearly all functions in this package expect a genotypes object as input. For details, see genotypes.
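The layout described above can be mimicked in base R. The following is a minimal sketch, not the package's constructor: the column names chosen for the marker map (chr, marker, pos) and all of the data are assumptions for illustration.

```r
# Toy stand-in for a genotypes object, built with base R only.
# The map column names below are assumptions, not argyle's specification.
calls <- matrix(
  c("A", "H", "G",
    "C", "C", "N"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("snp1", "snp2"), c("s1", "s2", "s3"))
)

map <- data.frame(
  chr = c("chr1", "chr1"),
  marker = c("snp1", "snp2"),
  pos = c(1000L, 2500L),
  stringsAsFactors = FALSE
)
rownames(map) <- map$marker

# Attach the marker map as the "map" attribute, as the text describes
g <- calls
attr(g, "map") <- map

dim(g)               # sites x samples
attr(g, "map")$pos   # marker positions
```

Keeping the map as an attribute means the object still behaves like an ordinary matrix for indexing and arithmetic, while the marker annotation travels with it.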

Accessing metadata

The `$` operator is overloaded for the genotypes class, so that writing g$normalized is equivalent to writing attr(g, "normalized"). This works in one direction only: the assignment g$normalized <- FALSE fails. Nefarious users can modify attributes directly using the standard and somewhat convoluted syntax attr(g, "x") <- y, but do so at their own risk. After any such change, check that the resulting object remains valid (all internal parts have matching dimensions and names) with a call to validate(g).
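To illustrate the mechanism, here is a minimal sketch of a read-only `$` method on a toy S3 class, together with a small consistency check in the spirit of validate(g). The class name toygeno and the helper check_toygeno() are hypothetical and are not part of argyle; the package's own implementation may differ.

```r
# Toy S3 class; "toygeno" and check_toygeno() are hypothetical names.
g <- matrix(c("A", "H", "G", "C"), nrow = 2,
            dimnames = list(c("snp1", "snp2"), c("s1", "s2")))
attr(g, "map") <- data.frame(marker = c("snp1", "snp2"),
                             stringsAsFactors = FALSE)
attr(g, "normalized") <- FALSE
class(g) <- c("toygeno", "matrix")

# Read-only `$`: look up the attribute with the given name
`$.toygeno` <- function(x, name) attr(x, name, exact = TRUE)

# Minimal consistency check: one map row per site, names in the same order
check_toygeno <- function(x) {
  map <- attr(x, "map", exact = TRUE)
  is.data.frame(map) &&
    nrow(map) == nrow(x) &&
    identical(as.character(map$marker), rownames(x))
}

g$normalized                     # same as attr(g, "normalized")
attr(g, "normalized") <- TRUE    # direct modification of the attribute
check_toygeno(g)                 # re-check consistency afterwards
```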

Accessor functions are provided for the marker map (markers(g)), sample metadata (samples(g)), and intensity matrices (intensity(g)).

Notes on allele encoding

For the purposes of this package, all markers on an array are treated as biallelic SNPs, and all samples are assumed to be diploid for the autosomes. Genotype calls are reported by Illumina BeadStudio as two-character strings of nucleotides, e.g. AA, AG, GG for an [A/G] SNP. The - character indicates a missing call ("no-call"). On import, these calls are summarized to a single character, one of ACGTHN (H = heterozygous, N = no-call).
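The summarization step can be sketched in a few lines of base R. The helper collapse_calls() below is hypothetical and is not the package's import code; it assumes that any call containing a - character (or an NA) is a no-call.

```r
# collapse_calls() is a hypothetical helper, not argyle's importer.
# Assumption: any call containing "-" (or NA) is treated as a no-call.
collapse_calls <- function(calls) {
  a1 <- substr(calls, 1, 1)
  a2 <- substr(calls, 2, 2)
  missing <- is.na(calls) | grepl("-", calls, fixed = TRUE)
  out <- ifelse(a1 == a2, a1, "H")   # homozygous: keep the nucleotide; else H
  out[missing] <- "N"                # no-calls become N
  out
}

collapse_calls(c("AA", "AG", "GG", "--"))
# "A" "H" "G" "N"
```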

For most analyses a numeric representation of genotypes is desirable. The function recode.genotypes() performs this conversion. When reference alleles are provided in columns "A1" (REF) and "A2" (ALT) in the marker map, genotypes are recoded 0 (homozygous REF), 1 (heterozygous), 2 (homozygous ALT) or NA (missing).
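As an illustration of the reference-based scheme, the sketch below recodes a small matrix of single-character calls against per-site REF and ALT alleles. The function recode_ref() is a hypothetical stand-in written in base R; it does not reproduce the interface of recode.genotypes().

```r
# recode_ref() is a hypothetical helper, not argyle's recode.genotypes().
# calls:  character matrix (sites x samples) with codes A/C/G/T/H/N
# a1, a2: REF and ALT allele for each site (one entry per row of calls)
recode_ref <- function(calls, a1, a2) {
  out <- matrix(NA_integer_, nrow(calls), ncol(calls),
                dimnames = dimnames(calls))
  # a1 and a2 recycle down the columns, so entry i lines up with row i
  out[calls == a1] <- 0L    # homozygous REF
  out[calls == "H"] <- 1L   # heterozygous
  out[calls == a2] <- 2L    # homozygous ALT
  out                       # everything else (e.g. N) stays NA
}

calls <- matrix(c("A", "H", "G", "N",
                  "C", "C", "T", "H"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("snp1", "snp2"), paste0("s", 1:4)))
num <- recode_ref(calls, a1 = c("A", "C"), a2 = c("G", "T"))
num
```

With this coding, each entry counts copies of the ALT allele, which is the form expected by many downstream numeric analyses.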

Recoding can also be "relative": that is, performed with respect to the major and minor allele as defined by the dataset itself. In this case genotypes are recoded 0 (homozygous major allele), 1 (heterozygous), 2 (homozygous minor allele) or NA (missing).
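A rough sketch of relative recoding follows. The helper recode_relative() is hypothetical and shows only the idea; the package may define the major allele differently. Here the major allele at each site is taken to be the one with the most homozygous calls, since heterozygotes contribute one copy of each allele and so cannot change which allele is more common at a biallelic site.

```r
# recode_relative() is a hypothetical helper illustrating the idea only.
# calls: character matrix (sites x samples) with codes A/C/G/T/H/N
recode_relative <- function(calls) {
  t(apply(calls, 1, function(site) {
    hom <- site %in% c("A", "C", "G", "T")
    # Most frequent homozygous call defines the major allele at this site
    major <- names(which.max(table(site[hom])))
    out <- rep(NA_integer_, length(site))
    out[hom & site == major] <- 0L   # homozygous major
    out[site == "H"] <- 1L           # heterozygous
    out[hom & site != major] <- 2L   # homozygous minor
    names(out) <- names(site)
    out
  }))
}

calls <- matrix(c("A", "A", "H", "G", "N",
                  "T", "T", "T", "C", "H"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("snp1", "snp2"), paste0("s", 1:5)))
rel <- recode_relative(calls)
rel
```

Note that this coding depends on the samples present: subsetting the dataset can change which allele is the major one.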

The attribute alleles tracks the current allele encoding.


andrewparkermorgan/argyle documentation built on May 10, 2019, 11:08 a.m.