genotype: Genotype or Haplotype Objects.
In genetics: Population Genetics


View source: R/genotype.R

Description

genotype creates a genotype object.

haplotype creates a haplotype object.

is.genotype returns TRUE if x is of class genotype.

is.haplotype returns TRUE if x is of class haplotype.

as.genotype attempts to coerce its argument into an object of class genotype.

as.genotype.allele.count converts allele counts (0,1,2) into genotype pairs ("A/A", "A/B", "B/B").

as.haplotype attempts to coerce its argument into an object of class haplotype.

nallele returns the number of alleles in an object of class genotype.

Usage

  genotype(a1, a2=NULL, alleles=NULL, sep="/", remove.spaces=TRUE,
           reorder = c("yes", "no", "default", "ascii", "freq"),
           allow.partial.missing=FALSE, locus=NULL,
           genotypeOrder=NULL)

  haplotype(a1, a2=NULL, alleles=NULL, sep="/", remove.spaces=TRUE,
            reorder="no", allow.partial.missing=FALSE, locus=NULL,
            genotypeOrder=NULL)

  is.genotype(x)

  is.haplotype(x)

  as.genotype(x, ...)

  ## S3 method for class 'allele.count'
  as.genotype(x, alleles=c("A","B"), ... )

  as.haplotype(x, ...)

  ## S3 method for class 'genotype'
  print(x, ...)

  nallele(x)

Arguments

`x`	either an object of class `genotype` or `haplotype` or an object to be converted to class `genotype` or `haplotype`.
`a1,a2`	vector(s) or matrix containing two alleles for each individual. See details, below.
`alleles`	names (and order if `reorder="yes"`) of possible alleles.
`sep`	character separator or column number used to divide alleles when `a1` is a vector of strings where each string holds both alleles. See below for details.
`remove.spaces`	logical indicating whether spaces and tabs will be removed from a1 and a2 before processing.
`reorder`	how alleles within an individual should be reordered. If `reorder="no"`, use the order specified by the `alleles` parameter. If `reorder="freq"` or `reorder="yes"`, sort alleles within each individual by observed frequency. If `reorder="ascii"`, sort alleles in ASCII order (alphabetical, with all upper case before lower case). For `genotype` the default is `"yes"` (the first choice), which sorts by frequency like `"freq"`. The default for `haplotype` is `"no"`.
`allow.partial.missing`	logical indicating whether one allele is permitted to be missing. When set to `FALSE` both alleles are set to `NA` when either is missing.
`locus`	object of class locus, gene, or marker, holding information about the source of this genotype.
`genotypeOrder`	character vector of genotype/haplotype names giving the order in which other functions should sort genotypes/haplotypes.
`...`	optional arguments
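The reorder options can be pictured with the small base-R sketch below. The helper reorder_alleles() is hypothetical and is not how the genetics package implements reordering; it only assumes the two behaviours described above ("freq": more frequent allele first; "ascii": ASCII order) and complete data with no missing alleles.

```r
# Hypothetical helper for illustration only; NOT the genetics package's code.
# Puts the two alleles of each individual into a canonical order:
# most-frequent-first ("freq") or ASCII order ("ascii").
# Assumes complete data (no NA alleles); ties are not handled specially.
reorder_alleles <- function(a1, a2, method = c("freq", "ascii")) {
  method <- match.arg(method)
  if (method == "freq") {
    # Count each allele over both columns; the most frequent ranks first
    counts  <- table(c(a1, a2))
    ranking <- names(sort(counts, decreasing = TRUE))
  } else {
    # method = "radix" sorts strings in the C locale (upper case before lower)
    ranking <- sort(unique(c(a1, a2)), method = "radix")
  }
  # Swap an individual's alleles when the second one ranks earlier
  swap   <- match(a1, ranking) > match(a2, ranking)
  first  <- ifelse(swap, a2, a1)
  second <- ifelse(swap, a1, a2)
  paste(first, second, sep = "/")
}

a1 <- c("A", "B", "B")
a2 <- c("B", "B", "A")
reorder_alleles(a1, a2, "freq")   # "B/A" "B/B" "B/A"
reorder_alleles(a1, a2, "ascii")  # "A/B" "B/B" "A/B"
```

Here "B" is observed four times and "A" twice, so frequency ordering puts "B" first even though "A" comes first in ASCII order.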

Details

Genotype objects hold information on which gene or marker alleles were observed for different individuals. For each individual, two alleles are recorded.

The genotype class considers the stored alleles to be unordered, i.e., "C/T" is equivalent to "T/C". The haplotype class considers the order of the alleles to be significant so that "C/T" is distinct from "T/C".
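As a rough illustration of the difference, the base-R sketch below compares the same strings with and without ignoring allele order. canonical_genotype() is a hypothetical helper that sorts the two alleles of each entry; it is not part of the genetics package.

```r
# Hypothetical helper, not a genetics package function:
# sort the two alleles of each "a1/a2" string so that allele order is ignored.
canonical_genotype <- function(x, sep = "/") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) paste(sort(p), collapse = sep), character(1))
}

data1 <- c("C/C", "C/T", "T/C")
data2 <- c("C/C", "T/C", "T/C")

# Genotype-style comparison: allele order ignored
canonical_genotype(data1) == canonical_genotype(data2)  # TRUE TRUE TRUE
# Haplotype-style comparison: allele order matters
data1 == data2                                          # TRUE FALSE TRUE
```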

When calling genotype or haplotype:

If only a1 is provided and is a character vector, it is assumed that each element encodes both alleles. In this case, if sep is a character string, a1 is assumed to be coded as "Allele1<sep>Allele2". If sep is a numeric value, it is assumed that character locations 1:sep contain allele 1 and that remaining locations contain allele 2.
If a1 is a matrix, it is assumed that column 1 contains allele 1 and column 2 contains allele 2.
If a1 and a2 are both provided, each is assumed to contain one allele value so that the genotype for an individual is obtained by paste(a1,a2,sep="/").
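A minimal sketch of how these input forms can all be reduced to a pair of allele columns is shown below. split_alleles() is a hypothetical base-R helper, not the package's parser; it assumes a non-empty character separator (the sep="" case used in the examples is not handled) and no missing values.

```r
# Hypothetical helper, not the genetics package's parser: reduce the
# accepted input forms to a two-column character matrix of alleles.
split_alleles <- function(a1, a2 = NULL, sep = "/") {
  if (is.matrix(a1)) {
    # Matrix: column 1 holds allele 1, column 2 holds allele 2
    first  <- a1[, 1]
    second <- a1[, 2]
  } else if (!is.null(a2)) {
    # Two vectors: each holds one allele per individual
    first  <- a1
    second <- a2
  } else if (is.numeric(sep)) {
    # Numeric sep: characters 1..sep are allele 1, the rest are allele 2
    first  <- substr(a1, 1, sep)
    second <- substr(a1, sep + 1, nchar(a1))
  } else {
    # Character sep (assumed non-empty): "Allele1<sep>Allele2"
    pos    <- regexpr(sep, a1, fixed = TRUE)
    first  <- substr(a1, 1, pos - 1)
    second <- substr(a1, pos + nchar(sep), nchar(a1))
  }
  cbind(allele1 = unname(first), allele2 = unname(second))
}

split_alleles(c("D/D", "D/I"))                     # string with "/" separator
split_alleles(c("DD", "DI"), sep = 1)              # numeric sep
split_alleles(c("D", "D"), c("D", "I"))            # a1 and a2 supplied
split_alleles(cbind(c("D", "D"), c("D", "I")))     # matrix input
```

All four calls yield the same allele pairs: D/D for the first individual and D/I for the second.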

If remove.spaces is TRUE (the default), any whitespace contained in a1 and a2 is removed when the genotypes are created. If whitespace is used as the separator (e.g., "C C", "C T", ...), be sure to set remove.spaces to FALSE.
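The effect of stripping whitespace, and why it breaks whitespace-separated input, can be seen with base R's gsub(). The pattern "[ \t]" (spaces and tabs) is an assumption based on the argument description, not taken from the package source.

```r
# Assumed equivalent of remove.spaces = TRUE: strip spaces and tabs
messy <- c("D   /   D", "D\t/ I")
gsub("[ \t]", "", messy)    # "D/D" "D/I"

# When whitespace IS the separator, stripping it first removes the
# separator itself, so the alleles can no longer be split on " "
ws_sep <- c("C C", "C T")
gsub("[ \t]", "", ws_sep)   # "CC" "CT"
```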

When the alleles are explicitly specified using the alleles argument, any allele not present in that list is converted to NA.
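A hypothetical base-R sketch of this filtering is shown below. restrict_alleles() is not a package function; it also assumes allow.partial.missing=FALSE, so an individual with any out-of-list allele becomes NA entirely.

```r
# Hypothetical helper, not part of the genetics package.
# Alleles outside `alleles` become NA; with partial missingness disallowed,
# an individual is NA if either of its alleles is NA.
restrict_alleles <- function(allele1, allele2, alleles) {
  allele1[!allele1 %in% alleles] <- NA
  allele2[!allele2 %in% alleles] <- NA
  either_missing <- is.na(allele1) | is.na(allele2)
  ifelse(either_missing, NA_character_, paste(allele1, allele2, sep = "/"))
}

restrict_alleles(c("C", "C", "G"), c("T", "N", "T"), alleles = c("C", "T"))
# "C/T" NA NA
```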

NOTE: genotype assumes that the order of the alleles is not important (e.g., "A/C" == "C/A"). Use class haplotype if order is significant.

If genotypeOrder=NULL (the default), expectedGenotypes is used to obtain the standard sort order. Only unique values in genotypeOrder are used, so when a name is repeated its first occurrence prevails. When genotypeOrder names some, but not all, of the genotypes that appear in the data, the remaining ones (those in the data and the possible combinations of the allele variants) are automatically appended to the end of genotypeOrder. This places genotype names "missing" from genotypeOrder at the end of the sort order. The feature is especially useful when there are many allele variants, and particularly for haplotypes. See the examples.
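The completion rule can be sketched as follows. complete_order() is a hypothetical helper, not the package's code; it simply keeps the user-supplied names first, drops repeats so that the first occurrence wins, and then appends the observed genotypes and every ordered pair of the given alleles.

```r
# Hypothetical helper illustrating how a partial genotypeOrder is completed
complete_order <- function(user_order, observed, alleles) {
  # All ordered allele pairs, e.g. "D/D", "I/D", "D/I", "I/I"
  combos <- as.vector(outer(alleles, alleles, paste, sep = "/"))
  # unique() keeps the first occurrence, so user-supplied names stay first
  unique(c(user_order, observed, combos))
}

complete_order("D/I", observed = c("D/D", "I/I", "D/I"), alleles = c("D", "I"))
# "D/I" "D/D" "I/I" "I/D"
```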

Value

The genotype class extends "factor" and haplotype extends genotype. Both classes have the following attributes:

`levels`	character vector of possible genotype/haplotype values, coded as `paste( allele1, "/", allele2, sep="")`.
`allele.names`	character vector of possible alleles. For a SNP, these might be c("A","T"). For a variable-length dinucleotide repeat, this might be c("136","138","140","148").
`allele.map`	matrix encoding how the factor levels correspond to alleles. See the source code to `allele.genotype()` for how to extract allele values using this matrix. Better yet, just use `allele.genotype()`.
`genotypeOrder`	character vector of genotype/haplotype names in a defined order that can be used for sorting in various functions. Note that this slot stores genotypes in both allele orders, e.g. both "A/B" and "B/A".
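To make the levels and allele.map attributes concrete, the base-R sketch below builds a plain factor and a two-column allele map from its levels, then looks up each individual's alleles through the factor codes. It only imitates the structure described above; the exact internal layout of allele.map in the package is an assumption here, and allele.genotype() remains the supported way to extract alleles.

```r
# Plain factor whose levels are "allele1/allele2" strings
g <- factor(c("C/C", "C/T", "T/T", "C/T"))

# Assumed allele map: one row per level, columns = allele 1 and allele 2
allele_map <- do.call(rbind, strsplit(levels(g), "/", fixed = TRUE))

# Look up each individual's alleles by indexing the map with the factor codes
allele_map[as.integer(g), 1]   # "C" "C" "T" "C"
allele_map[as.integer(g), 2]   # "C" "T" "T" "T"
```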

Author(s)

Gregory R. Warnes greg@warnes.net and Friedrich Leisch.

See Also

HWE.test, allele, homozygote, heterozygote, carrier, summary.genotype, allele.count, sort.genotype, genotypeOrder, locus, gene, marker, and %in% for the default %in% method.

Examples

# several examples of genotype data in different formats
example.data   <- c("D/D","D/I","D/D","I/I","D/D",
                    "D/D","D/D","D/D","I/I","")
g1  <- genotype(example.data)
g1

example.data2  <- c("C-C","C-T","C-C","T-T","C-C",
                    "C-C","C-C","C-C","T-T","")
g2  <- genotype(example.data2,sep="-")
g2


example.nosep  <- c("DD", "DI", "DD", "II", "DD",
                    "DD", "DD", "DD", "II", "")
g3  <- genotype(example.nosep,sep="")
g3

example.a1 <- c("D",  "D",  "D",  "I",  "D",  "D",  "D",  "D",  "I",  "")
example.a2 <- c("D",  "I",  "D",  "I",  "D",  "D",  "D",  "D",  "I",  "")
g4  <- genotype(example.a1,example.a2)
g4

example.mat <- cbind(a1=example.a1, a2=example.a2)
g5  <- genotype(example.mat)
g5

example.data5  <- c("D   /   D","D   /   I","D   /   D","I   /   I",
                    "D   /   D","D   /   D","D   /   D","D   /   D",
                    "I   /   I","")
g5  <- genotype(example.data5,remove.spaces=TRUE)
g5

# show how genotype and haplotype differ
data1 <- c("C/C", "C/T", "T/C")
data2 <- c("C/C", "T/C", "T/C")

test1  <- genotype( data1 )
test2  <- genotype( data2 )

test3  <-  haplotype( data1 )
test4  <-  haplotype( data2 )

test1==test2
test3==test4

test1=="C/T"
test1=="T/C"

test3=="C/T"
test3=="T/C"

## also
test1 
test1 
test3 

test1 
test1 

test3 
test3 

## "Messy" example

m3  <-  c("D D/\t   D D","D\tD/   I",  "D D/   D D","I/   I",
          "D D/   D D","D D/   D D","D D/   D D","D D/   D D",
          "I/   I","/   ","/I")

genotype(m3)
summary(genotype(m3))

m4  <-  c("D D","D I","D D","I I",
          "D D","D D","D D","D D",
          "I I","   ","  I")

genotype(m4,sep=1)
genotype(m4,sep=" ",remove.spaces=FALSE)
summary(genotype(m4,sep=" ",remove.spaces=FALSE))

m5  <-  c("DD","DI","DD","II",
          "DD","DD","DD","DD",
          "II","   "," I")
genotype(m5,sep=1)
haplotype(m5,sep=1,remove.spaces=FALSE)

g5  <- genotype(m5,sep="")
h5  <- haplotype(m5,sep="")

heterozygote(g5)
homozygote(g5)
carrier(g5,"D")

g5[9:10]  <- haplotype(m4,sep=" ",remove.spaces=FALSE)[1:2]
g5

g5[9:10]
allele(g5[9:10],1)
allele(g5,1)[9:10]

# drop unused alleles
g5[9:10,drop=TRUE]
h5[9:10,drop=TRUE]

# Convert allele.counts into genotype

x <- c(0,1,2,1,1,2,NA,1,2,1,2,2,2)
g <- as.genotype.allele.count(x, alleles=c("C","T") )
g

# Use of genotypeOrder
example.data   <- c("D/D","D/I","I/D","I/I","D/D",
                    "D/D","D/I","I/D","I/I","")
summary(genotype(example.data))
genotypeOrder(genotype(example.data))

summary(genotype(example.data, genotypeOrder=c("D/D", "I/I", "D/I")))
summary(genotype(example.data, genotypeOrder=c(              "D/I")))
summary(haplotype(example.data, genotypeOrder=c(             "I/D", "D/I")))
example.data <- genotype(example.data)
genotypeOrder(example.data) <- c("D/D", "I/I", "D/I")
genotypeOrder(example.data)