haplotype: Haplotype Extraction and Frequencies

View source: R/haplotype.R

haplotype R Documentation

Haplotype Extraction and Frequencies

Description

haplotype extracts the haplotypes from a set of DNA sequences. The result can be plotted with the appropriate function.

Usage

haplotype(x, ...)
## S3 method for class 'DNAbin'
haplotype(x, labels = NULL, strict = FALSE,
                  trailingGapsAsN = TRUE, ...)
## S3 method for class 'character'
haplotype(x, labels = NULL, ...)
## S3 method for class 'numeric'
haplotype(x, labels = NULL, ...)
## S3 method for class 'haplotype'
plot(x, xlab = "Haplotype", ylab = "Number", ...)
## S3 method for class 'haplotype'
print(x, ...)
## S3 method for class 'haplotype'
summary(object, ...)
## S3 method for class 'haplotype'
sort(x,
     decreasing = ifelse(what == "frequencies", TRUE, FALSE),
     what = "frequencies", ...)
## S3 method for class 'haplotype'
x[...]

Arguments

x

a set of DNA sequences (as an object of class "DNAbin"), or an object of class "haplotype".

object

an object of class "haplotype".

labels

a vector of character strings used as names for the rows of the returned object. By default, Roman numerals are given.

strict

a logical value; if TRUE, ambiguities and gaps in the sequences are not interpreted as such and are instead treated as separate characters.

trailingGapsAsN

a logical value; if TRUE (the default), the leading and trailing alignment gaps are considered as unknown bases (i.e., N). This option has no effect if strict = TRUE.

xlab, ylab

labels for the x- and y-axes.

...

further arguments passed to barplot (unused in print and sort).

decreasing

a logical value specifying in which order to sort the haplotypes; by default this depends on the value of what.

what

a character specifying on what feature the haplotypes should be sorted: this must be "frequencies" or "labels", or an unambiguous abbreviation of these.

Details

The way ambiguities in the sequences are taken into account is explained in a post to r-sig-phylo (see the examples below):

https://www.mail-archive.com/r-sig-phylo@r-project.org/msg05541.html

The sort method sorts the haplotypes in decreasing frequencies (the default) or in alphabetical order of their labels (if what = "labels"). Note that if these labels are Roman numerals (as assigned by haplotype), their alphabetical order may not be their numerical one (e.g., IX is alphabetically before VIII).
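The ordering issue with Roman numerals can be reproduced with base R alone. The sketch below does not call sort.haplotype; it only mimics an alphabetical sort on labels built with as.roman (from utils), and the object names labs, alpha_order and num_order are illustrative:

```r
## Roman numeral labels of the kind haplotype() assigns by default:
labs <- as.character(as.roman(1:10))

## alphabetical order; method = "radix" makes the result
## independent of the locale:
alpha_order <- sort(labs, method = "radix")
alpha_order   # "IX" comes before "V" and "VIII"

## numerical order, recovered by converting the labels back to integers:
num_order <- labs[order(as.integer(as.roman(labs)))]
num_order
```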

From pegas 0.7, haplotype extracts haplotypes taking into account base ambiguities (see Note below).

Value

haplotype returns an object of class c("haplotype", "DNAbin"), which is an object of class "DNAbin" with two additional attributes: "index", a list giving, for each haplotype, the indices of the observations that share it, and "from", giving the name of the original data.

sort returns an object of the same class respecting its attributes.
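As an illustration of the "index" attribute, the following sketch builds a vector giving the haplotype label of each original sequence. It assumes the woodmouse data from ape; the names idx and hap_of are illustrative and not part of pegas:

```r
library(pegas)   # also attaches ape, which provides 'woodmouse'
data(woodmouse)

h <- haplotype(woodmouse)
idx <- attr(h, "index")   # a list with one integer vector per haplotype

## label of the haplotype to which each sequence belongs:
hap_of <- character(nrow(woodmouse))
for (i in seq_along(idx)) hap_of[idx[[i]]] <- labels(h)[i]
names(hap_of) <- rownames(woodmouse)
hap_of

## name of the original data object:
attr(h, "from")
```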

Note

The presence of ambiguous bases and/or alignment gaps in DNA sequences can make the interpretation of haplotypes difficult. It is recommended to check their distributions with image.DNAbin and base.freq (using the options in both functions).

It may be useful to compare the results obtained with different values of the options strict and trailingGapsAsN of haplotype.DNAbin. Note that the ape function seg.sites has the same two options (as of ape 5.4), which may help to find the relevant sites in the sequence alignment.
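A minimal sketch of these checks, assuming the woodmouse data from ape (version 5.4 or later for the strict option of seg.sites); the object names bf, s_default and s_strict are illustrative:

```r
library(pegas)   # also attaches ape
data(woodmouse)

## proportions of all symbols, including ambiguity codes, '-' and '?':
bf <- base.freq(woodmouse, all = TRUE)
bf

## where the unknown bases (N) are located in the alignment:
image(woodmouse, what = "n", col = "blue")

## segregating sites under the two treatments of ambiguities and gaps:
s_default <- seg.sites(woodmouse)
s_strict <- seg.sites(woodmouse, strict = TRUE)

## sites that segregate only when ambiguities and gaps are
## treated as separate characters:
setdiff(s_strict, s_default)
```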

Note

There are cases where the algorithm that pools the different sequences into haplotypes has difficulties, although it seems to require a specific configuration of missing/ambiguous data. The last example below is one of them.

Author(s)

Emmanuel Paradis

See Also

haploNet, haploFreq, subset.haplotype, DNAbin for manipulation of DNA sequences in R.

The haplotype method for objects of class "loci" is documented separately: haplotype.loci.

Examples

## load pegas (this also attaches ape, which provides 'woodmouse'):
library(pegas)
## generate some artificial data from 'woodmouse':
data(woodmouse)
x <- woodmouse[sample(15, size = 110, replace = TRUE), ]
(h <- haplotype(x))
## the indices of the individuals belonging to the 1st haplotype:
attr(h, "index")[[1]]
plot(sort(h))
## get the frequencies in a named vector:
setNames(lengths(attr(h, "index")), labels(h))

## data posted by Hirra Farooq on r-sig-phylo (see link above):
cat(">[A]\nCCCGATTTTATATCAACATTTATTT------",
    ">[D]\nCCCGATTTT----------------------",
    ">[B]\nCCCGATTTTATATCAACATTTATTT------",
    ">[C]\nCCCGATTTTATATCACCATTTATTTTGATTT",
    file = "x.fas", sep = "\n")
x <- read.dna("x.fas", format = "fasta")
unlink("x.fas")

## show the sequences and the distances:
alview(x)
dist.dna(x, model = "N", pairwise.deletion = TRUE)

## by default there are 3 haplotypes with a warning about ambiguity:
haplotype(x)

## the same 3 haplotypes without warning:
haplotype(x, strict = TRUE)

## if we remove the last sequence there is, by default, a single haplotype:
haplotype(x[-4, ])

## to get two haplotypes separately as with the complete data:
haplotype(x[-4, ], strict = TRUE)

## a simpler example:
y <- as.DNAbin(matrix(c("A", "A", "A", "A", "R", "-"), 3))
haplotype(y) # 1 haplotype
haplotype(y, strict = TRUE) # 3 haplotypes
haplotype(y, trailingGapsAsN = FALSE) # 2 haplotypes

## a tricky example with 4 sequences and 1 site:
z <- as.DNAbin(matrix(c("Y", "A", "R", "N"), 4))
alview(z, showpos = FALSE)

## a single haplotype is identified:
haplotype(z)
## 'Y' has zero-distance with (and only with) 'N', so they are pooled
## together; at a later iteration of this pooling step, 'N' has
## zero-distance with 'R' (and ultimately with 'A') so they are pooled too

## if the sequences are ordered differently, 'Y' and 'A' are separated:
haplotype(z[c(4, 1:3), ])

pegas documentation built on March 7, 2023, 7:21 p.m.