genotypes: Constructor for a 'genotypes' object
In andrewparkermorgan/argyle: Basic interface and QC for genotypes from Illumina Infinium arrays


Constructor for a genotypes object


genotypes(G, map, ped = NULL, alleles = c("auto", "native", "01",
  "relative"), intensity = NULL, normalized = FALSE, filter.sites = NULL,
  filter.samples = NULL, check = TRUE, ...)

`G`	a genotype matrix with markers in rows and samples in columns, with both row and column names
`map`	a valid marker map (see Details) corresponding to `G`, with row names
`ped`	a valid "pedigree" (dataframe containing sample metadata)
`alleles`	character vector describing allele encoding (see `argyle` for details); `"auto"` lets the package try to guess the encoding
`intensity`	a list with elements `x` and `y` containing hybridization intensities; each is a matrix with same dimensions and same row/column names as `G`
`normalized`	logical; have intensities been normalized?
`filter.sites`	character vector of filters attached to markers
`filter.samples`	character vector of filters attached to samples
`check`	logical; if `TRUE`, do sanity checks on input
`...`	ignored

The input matrix G *must* have row and column names to help the package keep the marker map, sample metadata, and genotypes themselves in sync.
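As a minimal sketch of the inputs described above, the following builds a small named genotype matrix and a matching marker map, then hands them to the constructor. The marker and sample names and all values are invented for illustration, and the call itself is guarded because it assumes the argyle package is installed.

```r
# Toy genotype matrix: 3 markers (rows) x 2 samples (columns).
# Marker and sample names are made up for this example.
G <- matrix(c("A", "H", "G",    # calls for sample s1
              "G", "C", NA),    # calls for sample s2 (NA = no-call)
            nrow = 3,
            dimnames = list(c("m1", "m2", "m3"), c("s1", "s2")))

# Matching marker map: row names equal the marker column and the rows of G.
map <- data.frame(chr    = c("chr1", "chr1", "chr2"),
                  marker = c("m1", "m2", "m3"),
                  cM     = c(0, 1.5, 0),
                  pos    = c(1000L, 2000L, 500L),
                  A1     = c("A", "C", "G"),
                  A2     = c("G", "T", "A"),
                  row.names = c("m1", "m2", "m3"),
                  stringsAsFactors = FALSE)

# The names are what keep the genotypes and the map in sync.
stopifnot(identical(rownames(G), rownames(map)))

# Only attempt the real constructor if argyle is available.
if (requireNamespace("argyle", quietly = TRUE)) {
  geno <- argyle::genotypes(G, map, alleles = "native")
  print(dim(geno))
}
```

Because `ped` is left as `NULL` here, the sample metadata would be auto-generated by the constructor.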

Returns a new genotypes object.

The `genotypes` class

This class is designed to be a lightweight container for genotype data on a set of samples typed for a panel of biallelic SNP markers on a microarray. The object inherits from base-R's class matrix, so any code which accepts a matrix (including the apply family) will work on a genotypes object.

Attributes of genotypes objects include:

map – marker metadata in PLINK format (chr, marker, cM, pos, A1, A2, ...)
ped – pedigree/sample metadata in PLINK format (individual ID, family ID, mom ID, dad ID, sex, phenotype, ...)
intensity – list(x = [x-intensities], y = [y-intensities])
normalized – have intensities been normalized?
baf – matrix of B-allele frequencies (BAFs; see tQN)
lrr – matrix of log2 intensity ratios (LRRs; see tQN)
filter.sites – homage to the FILTER field in VCF format, a flag for suppressing sites (rows) in downstream analyses
filter.samples – same as above, but along other dimension (columns)
alleles – manner in which alleles are encoded: "native" (ACTGHN), "01" (allele dosage wrt ALT allele), "relative" (allele dosage wrt MINOR allele)

All attributes are maintained "parallel" to the genotypes matrix itself, and additionally have names to avoid ambiguity.
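To make the `alleles` encodings concrete, here is a rough sketch of how native calls for one marker might map onto ALT-allele dosage. The helper `native_to_dosage()` is hypothetical and not part of argyle, and the 0/1/2 dosage scale (with H counted as one copy) is an assumption about the "01" encoding rather than something stated above.

```r
# Hypothetical helper (NOT an argyle function): recode native calls for a
# single marker into ALT-allele dosage, assuming a 0/1/2 scale.
native_to_dosage <- function(calls, a1, a2) {
  calls <- toupper(calls)
  out <- rep(NA_integer_, length(calls))   # "N" and NA remain missing
  known <- !is.na(calls)
  out[known & calls == toupper(a1)] <- 0L  # homozygous for A1 (reference)
  out[known & calls == "H"]         <- 1L  # heterozygous
  out[known & calls == toupper(a2)] <- 2L  # homozygous for A2 (alternate)
  out
}

dosage <- native_to_dosage(c("A", "H", "G", "N", NA), a1 = "A", a2 = "G")
print(dosage)
```

A "relative" encoding would follow the same pattern but count copies of the minor allele instead of A2.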

Note that missing values (NAs/NaNs) are used for no-calls, in order to take advantage of R's behaviors on missing data.
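The sketch below illustrates why matrix inheritance and `NA` no-calls are convenient. It uses a plain base-R matrix with a class attribute as a stand-in for a real genotypes object (an assumption made so the example runs without argyle), with calls stored as numeric dosages.

```r
# Stand-in for a genotypes object: a dosage matrix with NA for no-calls.
G <- matrix(c(0L, 1L, NA,    # sample s1
              2L, 2L, 0L),   # sample s2
            nrow = 3,
            dimnames = list(c("m1", "m2", "m3"), c("s1", "s2")))
attr(G, "alleles") <- "01"            # attributes ride along with the matrix
class(G) <- c("genotypes", "matrix")  # mimic the class described above

# Per-marker call rate: fraction of samples with a non-missing call.
call_rate <- rowMeans(!is.na(G))

# Per-sample mean dosage via the apply family, skipping no-calls.
mean_dosage <- apply(G, 2, mean, na.rm = TRUE)

print(call_rate)
print(mean_dosage)
print(attr(G, "alleles"))
```

Using `NA` rather than a sentinel value means standard arguments such as `na.rm = TRUE` handle no-calls without extra bookkeeping.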

The marker map

A valid marker map is a required attribute of a genotypes object. It is a dataframe with (at least) the following columns, in the following order. Columns followed by an asterisk (*) are optional but may be required for some downstream operations.

chr – (character, factor) chromosome identifier; use NA for missing
marker – (character, factor) *globally-unique* marker name, cannot be missing
cM – (numeric) genetic position of this marker in cM; use zero for missing
pos – (integer) position of this marker in basepairs; use zero for missing
A1* – (character, factor) REFERENCE allele, case-insensitive, cannot be missing
A2* – (character, factor) ALTERNATE allele, case-insensitive, cannot be missing

Rownames must be present and must match the contents of column "marker".
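The requirements above can be expressed as a small validity check. The function `check_map()` is a hypothetical helper written for this example, not the check that argyle itself performs, and it covers only the four mandatory columns.

```r
# Hypothetical validator (NOT argyle's own check) for the rules listed above.
check_map <- function(map) {
  required <- c("chr", "marker", "cM", "pos")
  is.data.frame(map) &&
    ncol(map) >= length(required) &&
    identical(names(map)[seq_along(required)], required) &&  # names and order
    !anyNA(map$marker) &&                                    # no missing names
    anyDuplicated(map$marker) == 0 &&                        # globally unique
    identical(rownames(map), as.character(map$marker))       # row names match
}

good <- data.frame(chr    = c("chr1", "chr1"),
                   marker = c("m1", "m2"),
                   cM     = c(0, 1.2),
                   pos    = c(100L, 200L),
                   row.names = c("m1", "m2"),
                   stringsAsFactors = FALSE)

# Same content, but with the first two columns swapped.
bad <- good[, c("marker", "chr", "cM", "pos")]

print(check_map(good))
print(check_map(bad))
```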

The "pedigree"

Although "pedigree" is used in homage to the nomenclature of the PLINK package, this attribute simply contains sample metadata even if true pedigrees are unknown. It is a dataframe with (at least) the following columns, the first 6 of which are for PLINK compatibility, in the following order.

fid – (character, factor) "family" ID (aka group ID); can indicate family, population, batch...
iid – (character, factor) *globally-unique* individual ID
mom – (character, factor) individual ID of this sample's mother; use zero for missing
dad – (character, factor) individual ID of this sample's father; use zero for missing
sex – (integer) 1=male, 2=female, 0=unknown/missing
pheno – (numeric) phenotype; 0/-9=missing, 1=control, 2=case; any other value is allowed and is taken to be a quantitative trait

Rownames must be present and must match the contents of column "iid". The pedigree is auto-generated when missing, and in that case every sample is assigned an "fid" identical to its "iid".
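A default pedigree of the kind described above might look like the following sketch. The helper `make_default_ped()` is hypothetical, and the fill values chosen for `mom`, `dad`, `sex` and `pheno` are assumptions based on the missing-value codes listed above. The package's actual defaults may differ.

```r
# Hypothetical helper (NOT an argyle function): build placeholder sample
# metadata in which every sample is its own "family".
make_default_ped <- function(sample_ids) {
  data.frame(fid   = sample_ids,   # family ID set equal to individual ID
             iid   = sample_ids,
             mom   = "0",          # assumed: zero codes an unknown parent
             dad   = "0",
             sex   = 0L,           # 0 = unknown/missing
             pheno = -9,           # assumed: -9 codes a missing phenotype
             row.names = sample_ids,
             stringsAsFactors = FALSE)
}

ped <- make_default_ped(c("s1", "s2"))
print(ped)
```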

Filters

The filter.* fields are character vectors describing the filter(s), if any, with which to mark markers or samples. An empty string ("") indicates a "passing" marker or sample. Filters are appended to the filter string as single characters: H for excess heterozygosity; N for excess no-call rate; I (for samples only) for abnormal intensity pattern; F (for markers only) for aberrant allele frequency.
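The filter-string convention can be sketched in a few lines of base R. The helper `add_filter()` is hypothetical and only illustrates how single-character flags accumulate; it is not how argyle applies its own QC filters.

```r
# Hypothetical helper (NOT an argyle function): append a one-character flag
# to the filter string of every flagged marker or sample, at most once.
add_filter <- function(flags, hit, code) {
  stopifnot(nchar(code) == 1, length(flags) == length(hit))
  already <- grepl(code, flags, fixed = TRUE)
  todo <- hit & !already
  flags[todo] <- paste0(flags[todo], code)
  flags
}

# Start with every marker passing (empty string).
filter.sites <- setNames(rep("", 3), c("m1", "m2", "m3"))

# Flag m1 and m3 for excess heterozygosity, then m3 for excess no-calls.
filter.sites <- add_filter(filter.sites, c(TRUE, FALSE, TRUE), "H")
filter.sites <- add_filter(filter.sites, c(FALSE, FALSE, TRUE), "N")

# Markers whose filter string is still empty are the ones that pass.
passing <- names(filter.sites)[filter.sites == ""]

print(filter.sites)
print(passing)
```

Keeping the flags as strings means a marker can carry several filters at once while a simple comparison with `""` still identifies passing entries.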