Constructor for a genotypes object
G | a genotype matrix with markers in rows and samples in columns, with both row and column names
map | a valid marker map (see Details) corresponding to
ped | a valid "pedigree" (dataframe containing sample metadata)
alleles | character vector describing allele encoding (see
intensity | a list with elements
normalized | logical; have intensities been normalized?
filter.sites | character vector of filters attached to markers
filter.samples | character vector of filters attached to samples
check | logical; if
... | ignored
The input matrix G *must* have row and column names to help the package keep the marker map, sample metadata, and genotypes themselves in sync.
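As a minimal sketch of the naming requirement, the base-R snippet below builds a toy genotype matrix with markers in rows and samples in columns and checks that both sets of names are present. The marker and sample names are made up, and `has_full_dimnames()` is a hypothetical helper written for this illustration, not a function of the package; the uniqueness test is an assumption based on the requirement that marker names and individual IDs be globally unique.

```r
# Toy genotype matrix: 3 markers (rows) x 2 samples (columns).
# matrix() fills column by column, so s1 = A, H, N and s2 = G, G, H.
G <- matrix(c("A", "H", "N",
              "G", "G", "H"),
            nrow = 3, ncol = 2,
            dimnames = list(c("m1", "m2", "m3"), c("s1", "s2")))

# Hypothetical helper (not part of the package): TRUE only if the matrix
# has complete, non-duplicated row and column names.
has_full_dimnames <- function(x) {
  rn <- rownames(x)
  cn <- colnames(x)
  !is.null(rn) && !is.null(cn) &&
    !anyNA(rn) && !anyNA(cn) &&
    anyDuplicated(rn) == 0 && anyDuplicated(cn) == 0
}

named_ok   <- has_full_dimnames(G)          # names present
unnamed_ok <- has_full_dimnames(unname(G))  # names stripped
```

A matrix that fails this kind of check would presumably need its dimnames set before being passed as G.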
a new genotypes object
The genotypes class
This class is designed to be a lightweight container for genotype data on a set of samples typed for a panel of biallelic SNP markers on a microarray. The object inherits from base-R's class matrix, so any code which accepts a matrix (including the apply family) will work on a genotypes object.
Attributes of genotypes objects include:
map – marker metadata in PLINK format (chr, marker, cM, pos, A1, A2, ...)
ped – pedigree/sample metadata in PLINK format (individual ID, family ID, mom ID, dad ID, sex, phenotype, ...)
intensity – list(x = [X-intensities], y = [Y-intensities])
normalized – have intensities been normalized?
baf – matrix of B-allele frequencies (BAFs; see tQN)
lrr – matrix of log2 intensity ratios (LRRs; see tQN)
filter.sites – homage to the FILTER field in VCF format, a flag for suppressing sites (rows) in downstream analyses
filter.samples – same as above, but along the other dimension (columns)
alleles – manner in which alleles are encoded: "native" (ACTGHN), "01" (allele dosage wrt ALT allele), "relative" (allele dosage wrt MINOR allele)
All attributes are maintained "parallel" to the genotypes matrix itself, and additionally have names to avoid ambiguity.
Note that missing values (NAs/NaNs) are used for no-calls, in order to take advantage of R's behaviors on missing data.
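To illustrate why NA is a convenient no-call value, here is a base-R sketch that recodes toy "native" calls into ALT-allele dosage and then summarises missingness. It makes several assumptions that the text above does not spell out: that H denotes a heterozygous call and N a no-call, and that dosage is stored as 0, 1 or 2 copies of the A2 allele. `to_dosage()` is a hypothetical helper, not the package's own recoding routine.

```r
# Toy calls in "native" encoding: 2 markers x 4 samples.
G <- matrix(c("A", "H", "G", "N",
              "C", "C", "N", "T"),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("m1", "m2"), c("s1", "s2", "s3", "s4")))
A1 <- c(m1 = "A", m2 = "C")  # reference alleles, named by marker
A2 <- c(m1 = "G", m2 = "T")  # alternate alleles, named by marker

# Hypothetical helper: count copies of the ALT (A2) allele per call.
# Anything that is not A1, H or A2 (including N) stays NA.
to_dosage <- function(G, A1, A2) {
  calls <- toupper(G)                       # alleles are case-insensitive
  ref <- toupper(unname(A1[rownames(G)]))   # one value per row of G
  alt <- toupper(unname(A2[rownames(G)]))
  out <- matrix(NA_integer_, nrow(G), ncol(G), dimnames = dimnames(G))
  out[calls == ref] <- 0L   # vector recycles down columns, i.e. per marker
  out[calls == "H"] <- 1L
  out[calls == alt] <- 2L
  out
}

D <- to_dosage(G, A1, A2)

# With no-calls stored as NA, ordinary base-R idioms give per-marker and
# per-sample no-call rates, and na.rm = TRUE skips them in summaries.
nocall_by_marker <- rowMeans(is.na(D))
nocall_by_sample <- colMeans(is.na(D))
alt_freq <- rowMeans(D, na.rm = TRUE) / 2
```

Because no-calls are NA rather than a sentinel value, they drop out of `alt_freq` without any special-case code.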
A valid marker map is a required attribute of a genotypes object. It is a dataframe with (at least) the following columns, in the following order. Columns followed by an asterisk (*) are optional but may be required for some downstream operations.
chr – (character, factor) chromosome identifier; use NA for missing
marker – (character, factor) *globally-unique* marker name, cannot be missing
cM – (numeric) genetic position of this marker in cM; use zero for missing
pos – (integer) position of this marker in basepairs; use zero for missing
A1* – (character, factor) REFERENCE allele, case-insensitive, cannot be missing
A2* – (character, factor) ALTERNATE allele, case-insensitive, cannot be missing
Rownames must be present and must match the contents of column "marker".
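The layout described above can be sketched in base R as follows. The marker names, positions and alleles are invented for illustration, and `is_valid_map()` is a hypothetical checker that only mirrors the rules stated here (column order, unique non-missing marker names, rownames equal to the marker column); the package's own validation may test more or different things.

```r
# Toy marker map with the four required columns first, then A1 and A2.
map <- data.frame(
  chr    = c("chr1", "chr1", NA),   # NA = unknown chromosome
  marker = c("m1", "m2", "m3"),
  cM     = c(0.5, 1.2, 0),          # zero = missing genetic position
  pos    = c(1000L, 2500L, 0L),     # zero = missing physical position
  A1     = c("A", "C", "G"),
  A2     = c("G", "T", "T"),
  stringsAsFactors = FALSE
)
rownames(map) <- map$marker

# Hypothetical checker (an assumption, not the package's validator).
is_valid_map <- function(map) {
  required <- c("chr", "marker", "cM", "pos")
  is.data.frame(map) &&
    ncol(map) >= 4 &&
    identical(colnames(map)[1:4], required) &&
    !anyNA(map$marker) &&
    anyDuplicated(map$marker) == 0 &&
    identical(rownames(map), as.character(map$marker))
}

map_ok <- is_valid_map(map)

# Dropping the rownames (they fall back to "1", "2", "3") breaks the
# rownames-match-marker rule.
no_rownames <- map
rownames(no_rownames) <- NULL
no_rownames_ok <- is_valid_map(no_rownames)
```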
Although "pedigree" is used in homage to the nomenclature of the PLINK package, this attribute simply contains sample metadata even if true pedigrees are unknown. It is a dataframe with (at least) the following columns, the first 6 of which are for PLINK compatibility, in the following order.
fid – (character, factor) "family" ID (aka group ID); can indicate family, population, batch...
iid – (character, factor) *globally-unique* individual ID
mom – (character, factor) individual ID of this sample's mother; use zero for missing
dad – (character, factor) individual ID of this sample's father; use zero for missing
sex – (integer) 1=male, 2=female, 0=unknown/missing
pheno – (numeric) phenotype; 0/-9=missing, 1=control, 2=case, any other values allowed are taken to be a quantitative trait
Rownames must be present and must match the contents of column "iid". The pedigree is auto-generated when missing, and in that case every sample is assigned an "fid" identical to its "iid".
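A rough base-R sketch of what such an auto-generated pedigree could look like is given below. Only the `fid` equals `iid` behaviour comes from the description above; the fill values for the remaining columns (the "missing" codes listed for each field) are an assumption, and `make_default_ped()` is a hypothetical helper rather than the package's code.

```r
# Hypothetical helper: build placeholder sample metadata from sample IDs.
make_default_ped <- function(ids) {
  data.frame(
    fid   = ids,    # family ID set identical to the individual ID
    iid   = ids,
    mom   = "0",    # zero = mother unknown (assumed default)
    dad   = "0",    # zero = father unknown (assumed default)
    sex   = 0L,     # 0 = unknown/missing (assumed default)
    pheno = -9,     # -9 = missing phenotype (assumed default)
    row.names = ids,
    stringsAsFactors = FALSE
  )
}

# In practice the IDs would come from the column names of the genotype matrix.
ped <- make_default_ped(c("s1", "s2"))
```

Deriving the IDs from the matrix column names keeps the pedigree rows in the same order as the samples.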
The filter.* fields are character vectors describing the filter(s), if any, with which to mark markers or samples. An empty string ("") indicates a "passing" marker or sample. Filters are appended to the filter string as single characters: H for excess heterozygosity; N for excess no-call rate; I (for samples only) for abnormal intensity pattern; F (for markers only) for aberrant allele frequency.
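The filter-string convention can be sketched with plain character vectors. `add_filter()` below is a hypothetical helper written for this illustration (the package may manage filters differently), and skipping a flag that is already present is a design choice of the sketch, not documented behaviour.

```r
# One filter string per marker; "" means the marker passes.
filter.sites <- c(m1 = "", m2 = "", m3 = "")

# Hypothetical helper: append a one-letter flag to the named entries,
# leaving entries that already carry the flag unchanged.
add_filter <- function(filters, ids, flag) {
  target <- ids[!grepl(flag, filters[ids], fixed = TRUE)]
  filters[target] <- paste0(filters[target], flag)
  filters
}

filter.sites <- add_filter(filter.sites, c("m2", "m3"), "H")  # excess heterozygosity
filter.sites <- add_filter(filter.sites, "m3", "N")           # excess no-call rate
filter.sites <- add_filter(filter.sites, "m3", "N")           # repeated flag: no change

# Passing markers are those whose filter string is still empty.
passing <- names(filter.sites)[filter.sites == ""]
```

Keeping the vector named by marker lets the flags be matched back to rows of the genotype matrix by name rather than by position.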