labelHaplotypes: Find and label haplotypes
In EricArcher/strataG: Summaries and Population Structure Analyses of Genetic Data

labelHaplotypes

R Documentation

Find and label haplotypes

Description

Identify and group sequences that share the same haplotype.

Usage

labelHaplotypes(x, prefix = NULL, use.indels = TRUE)

## Default S3 method:
labelHaplotypes(x, prefix = NULL, use.indels = TRUE)

## S3 method for class 'list'
labelHaplotypes(x, ...)

## S3 method for class 'character'
labelHaplotypes(x, ...)

## S3 method for class 'gtypes'
labelHaplotypes(x, ...)

Arguments

`x`	sequences in a `character matrix`, `list`, or `DNAbin` object, or a haploid `gtypes` object with sequences.
`prefix`	a character string giving the prefix to be applied to numbered haplotypes. If `NULL`, haplotypes will be labeled with the first label from the original sequences.
`use.indels`	logical. Use indels when comparing sequences?
`...`	arguments to be passed to `labelHaplotypes.default`.

Details

If any sequences contain ambiguous bases (N's), those sequences are first set aside and haplotypes are assigned from the remaining sequences. Each sequence that was set aside is then assigned to one of the new haplotypes if this can be done unambiguously, that is, if it matches exactly one haplotype with 0 differences once its N's have been removed from the comparison. Sequences that cannot be assigned this way are given NA and listed in the unassigned element.
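The matching rule described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the function name `assign_ambiguous`, its input format (a named character vector of equal-length, lower-case sequences), and the hard-coded `Hap.` prefix are all assumptions made for this example, and indel handling is ignored.

```r
# Hypothetical helper (not part of strataG): illustrates the N-matching rule.
assign_ambiguous <- function(seqs) {
  # set aside every sequence that contains at least one N
  has.n <- grepl("n", seqs, fixed = TRUE)
  clean <- seqs[!has.n]

  # distinct N-free sequences define the haplotypes
  hap.seqs <- unique(unname(clean))
  names(hap.seqs) <- paste0("Hap.", seq_along(hap.seqs))
  haps <- setNames(names(hap.seqs)[match(clean, hap.seqs)], names(clean))

  # one row per haplotype, one column per site
  hap.mat <- do.call(rbind, strsplit(hap.seqs, ""))

  # compare each set-aside sequence to every haplotype at its non-N sites
  amb <- vapply(seqs[has.n], function(s) {
    bases <- strsplit(s, "")[[1]]
    keep <- bases != "n"
    n.diff <- apply(hap.mat[, keep, drop = FALSE], 1, function(h) {
      sum(h != bases[keep])
    })
    hits <- names(n.diff)[n.diff == 0]
    # assign only when exactly one haplotype matches with 0 differences
    if (length(hits) == 1) hits else NA_character_
  }, character(1))

  list(haps = c(haps, amb)[names(seqs)], hap.seqs = hap.seqs)
}

seqs <- c(
  a = "ggctagct", b = "agttagct", c = "ggctagct",
  d = "ngctagct",  # matches only the first haplotype at its non-N sites
  e = "nnntagct"   # matches both haplotypes, so it stays unassigned
)
res <- assign_ambiguous(seqs)
res$haps
```

In this toy input, `d` is assigned to `Hap.1` because only one haplotype agrees with it at every non-N site, while `e` receives `NA` because its remaining sites fit both haplotypes equally well.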

Value

For character, list, or DNAbin, a list with the following elements:

haps: named vector (DNAbin) or list of named vectors (multidna) of haplotypes for each sequence in x.
hap.seqs: DNAbin or multidna object containing sequences for each haplotype.
unassigned: data.frame listing closest matching haplotypes for unassignable sequences with N's and the minimum number of substitutions between the two. Will be NULL if no sequences remain unassigned.

For gtypes, a new gtypes object with unassigned individuals stored in the @other slot in an element named 'haps.unassigned' (see getOther).

Author(s)

Eric Archer eric.archer@noaa.gov

Examples

# create 5 example short sequences (H3 and H4 are identical)
haps <- c(
  H1 = "ggctagct",
  H2 = "agttagct",
  H3 = "agctggct",
  H4 = "agctggct",
  H5 = "ggttagct"
)
# draw and label 100 samples
sample.seqs <- sample(names(haps), 100, replace = TRUE)
ids <- paste(sample.seqs, seq_along(sample.seqs), sep = "_")
sample.seqs <- lapply(sample.seqs, function(x) strsplit(haps[x], "")[[1]])
names(sample.seqs) <- ids

# add 1-2 random ambiguities to 10 of the sequences
with.error <- sample(seq_along(sample.seqs), 10)
for(i in with.error) {
  num.errors <- sample(1:2, 1)
  sites <- sample(seq_along(sample.seqs[[i]]), num.errors)
  sample.seqs[[i]][sites] <- "n"
}

hap.assign <- labelHaplotypes(sample.seqs, prefix = "Hap.")
hap.assign

EricArcher/strataG documentation built on June 8, 2025, 2:12 a.m.