supermatrix: Retrieval and Concatenation of Multiple Sequence Alignments


View source: R/supermatrix.R

Description

Concatenates multiple sequence alignments of individual loci into a supermatrix.
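As a rough illustration of what concatenation involves, the base-R sketch below joins locus alignments column-wise and pads species that are missing from a locus with gaps. It is not megaptera's implementation. It uses plain character matrices instead of ape's DNAbin class, and the helper concat_alignments() is hypothetical.

```r
# Hypothetical helper (not part of megaptera): concatenate locus alignments,
# given as character matrices with species as row names, into a supermatrix.
# Species absent from a locus are padded with gaps ("-") for that locus.
concat_alignments <- function(alns) {
  species <- sort(unique(unlist(lapply(alns, rownames))))
  padded <- lapply(alns, function(a) {
    m <- matrix("-", nrow = length(species), ncol = ncol(a),
                dimnames = list(species, NULL))
    m[rownames(a), ] <- a  # fill in the rows present in this locus
    m
  })
  do.call(cbind, padded)  # join the loci column-wise
}

loc1 <- matrix(c("a", "c", "g",
                 "a", "c", "t"), nrow = 2, byrow = TRUE,
               dimnames = list(c("sp_A", "sp_B"), NULL))
loc2 <- matrix(c("t", "t",
                 "g", "g"), nrow = 2, byrow = TRUE,
               dimnames = list(c("sp_A", "sp_C"), NULL))
sm <- concat_alignments(list(loc1, loc2))
sm
```

Here sp_B receives gaps for loc2 and sp_C receives gaps for loc1. The missing-data symbol megaptera actually uses may differ.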

Usage

supermatrix(
  megProj,
  min.n.seq = 3,
  blocks = "split",
  row.confid = 0,
  col.confid = 0,
  partition,
  trim.ends = 0,
  locus.coverage = 0,
  global.coverage = 0,
  subset.locus,
  subset.species,
  exclude.locus,
  exclude.species,
  core.locus,
  core.species,
  best.sampled.congeneric = FALSE,
  protect.outgroup = FALSE,
  squeeze.outgroup
)

Arguments

megProj

An object of class megapteraProj.

min.n.seq

Numeric, the minimum number of sequences an alignment must contain to be included in the supermatrix.
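The min.n.seq criterion amounts to a simple filter over the locus alignments. The sketch below assumes each alignment is a character matrix with one row per sequence; filter_min_n_seq() is a hypothetical name, not a megaptera function.

```r
# Hypothetical sketch of the min.n.seq filter: keep only locus alignments
# (character matrices, one row per sequence) with at least min.n.seq rows.
filter_min_n_seq <- function(alns, min.n.seq = 3) {
  Filter(function(a) nrow(a) >= min.n.seq, alns)
}

alns <- list(
  rbcL = matrix("a", nrow = 4, ncol = 10),  # 4 sequences: kept
  matK = matrix("c", nrow = 2, ncol = 10)   # 2 sequences: dropped
)
names(filter_min_n_seq(alns, min.n.seq = 3))
```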

blocks

A character string indicating how to handle alignment blocks: "split" returns the blocks as elements of a list; "concatenate" concatenates the blocks and returns them as a single alignment.

row.confid

A real number in the interval [0, 1] giving the confidence threshold for alignment rows (i.e. taxa, sequences); only rows scoring equal to or greater than row.confid will be selected.

col.confid

A real number in the interval [0, 1] giving the confidence threshold for alignment columns (i.e. nucleotide positions); only columns scoring equal to or greater than col.confid will be selected.
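row.confid and col.confid act as thresholds on per-row and per-column confidence scores. The sketch below shows only the selection step and assumes the scores are already available as numeric vectors; how megaptera computes them is not described here, and filter_confid() is a hypothetical name.

```r
# Hypothetical sketch: keep rows and columns whose confidence score is equal
# to or greater than the respective threshold. The score vectors are assumed
# to be given (one score per row and one per column).
filter_confid <- function(aln, row.scores, col.scores,
                          row.confid = 0, col.confid = 0) {
  aln[row.scores >= row.confid, col.scores >= col.confid, drop = FALSE]
}

aln <- matrix(c("a", "c", "g", "t",
                "a", "c", "g", "a",
                "a", "-", "-", "t"), nrow = 3, byrow = TRUE,
              dimnames = list(c("sp_A", "sp_B", "sp_C"), NULL))
res <- filter_confid(aln,
                     row.scores = c(0.9, 0.8, 0.4),
                     col.scores = c(1.0, 0.5, 0.7, 0.95),
                     row.confid = 0.5, col.confid = 0.6)
res  # sp_C and column 2 are dropped
```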

partition

A named list, each element of which contains the names of the loci to group together in one partition. The default is to assign each locus to its own partition. Partitioning by first/second and third nucleotide positions is not yet possible, but might be implemented in the future.

trim.ends

A numeric giving the required minimum number of sequences having a non-ambiguous base character (a, c, g, t) in the first and last position of the alignment; defaults to 0, which means no trimming. Can also be given as a fraction.
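One way to read trim.ends is that columns are removed from both ends until the first and last remaining columns meet the threshold. The sketch below follows that reading; treating values between 0 and 1 as a fraction of the number of sequences is an assumption, and trim_ends() is a hypothetical helper, not megaptera's code.

```r
# Hypothetical sketch of trim.ends (not megaptera's code). Columns are removed
# from both ends until the first and the last remaining column each contain
# at least `min.seq` unambiguous bases (a, c, g, t). Values between 0 and 1
# are treated as a fraction of the number of sequences (an assumption).
trim_ends <- function(aln, trim.ends = 0) {
  if (trim.ends == 0) return(aln)  # 0 means no trimming
  min.seq <- if (trim.ends < 1) ceiling(trim.ends * nrow(aln)) else trim.ends
  unamb <- matrix(tolower(aln) %in% c("a", "c", "g", "t"), nrow = nrow(aln))
  ok <- which(colSums(unamb) >= min.seq)  # columns meeting the threshold
  if (length(ok) == 0) return(aln[, 0, drop = FALSE])
  aln[, min(ok):max(ok), drop = FALSE]  # interior columns are kept as-is
}

aln <- matrix(c("-", "a", "c", "g", "t", "-",
                "-", "-", "c", "g", "t", "n",
                "n", "a", "c", "g", "-", "-"), nrow = 3, byrow = TRUE,
              dimnames = list(c("sp_A", "sp_B", "sp_C"), NULL))
ncol(trim_ends(aln, 2))    # columns 2 to 5 remain
ncol(trim_ends(aln, 0.9))  # 0.9 * 3 rounds up to 3: columns 3 and 4 remain
```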

locus.coverage

Numeric between 0 and 1 giving the minimum coverage a species must reach in an individual locus alignment.

global.coverage

Numeric between 0 and 1 giving the minimum coverage a species must reach in the concatenated alignment.
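The difference between locus.coverage and global.coverage is easiest to see with a concrete coverage measure. The sketch below assumes coverage is the fraction of columns in which a species has an unambiguous base; megaptera's own definition may differ, and coverage() and filter_coverage() are hypothetical helpers.

```r
# Hypothetical coverage measure (an assumption, not megaptera's definition):
# the fraction of alignment columns in which a species has an unambiguous
# base (a, c, g, t).
coverage <- function(aln) {
  unamb <- matrix(tolower(aln) %in% c("a", "c", "g", "t"),
                  nrow = nrow(aln), dimnames = list(rownames(aln), NULL))
  rowMeans(unamb)
}

# Drop species whose coverage is below min.cov. With locus.coverage this is
# applied to each locus alignment; with global.coverage to the supermatrix.
filter_coverage <- function(aln, min.cov = 0) {
  aln[coverage(aln) >= min.cov, , drop = FALSE]
}

loc1 <- matrix(c("a", "c", "g", "t",
                 "a", "c", "-", "-",
                 "n", "-", "-", "t"), nrow = 3, byrow = TRUE,
               dimnames = list(c("sp_A", "sp_B", "sp_C"), NULL))
loc2 <- matrix(c("g", "g", "g", "g",
                 "-", "-", "-", "-",
                 "a", "c", "g", "t"), nrow = 3, byrow = TRUE,
               dimnames = list(c("sp_A", "sp_B", "sp_C"), NULL))
coverage(loc1)                                      # sp_A 1, sp_B 0.5, sp_C 0.25
rownames(filter_coverage(loc1, 0.5))                # locus level: sp_A, sp_B
rownames(filter_coverage(cbind(loc1, loc2), 0.5))   # global level: sp_A, sp_C
```

sp_B passes the locus-level threshold for loc1 but fails globally because it has no data for loc2, while sp_C shows the opposite pattern.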

subset.locus

A vector of mode "character" for choosing a subset of loci from the loci available.

subset.species

A vector of mode "character" for choosing a subset of the species from the total species available.

exclude.locus

A vector of mode "character" giving the names of the loci in the database that will be excluded from the concatenation.
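subset.locus and exclude.locus both select loci by name. The sketch below applies subset.locus first and exclude.locus second; that order is an assumption, and select_loci() is a hypothetical helper operating on a named list of alignments.

```r
# Hypothetical sketch of locus selection (order of application assumed):
# subset.locus keeps only the named loci, then exclude.locus drops loci.
select_loci <- function(alns, subset.locus, exclude.locus) {
  if (!missing(subset.locus)) {
    alns <- alns[names(alns) %in% subset.locus]
  }
  if (!missing(exclude.locus)) {
    alns <- alns[!(names(alns) %in% exclude.locus)]
  }
  alns
}

alns <- list(rbcL = matrix("a", 3, 5), matK = matrix("c", 3, 5),
             ITS = matrix("g", 3, 5))
names(select_loci(alns, exclude.locus = "ITS"))
names(select_loci(alns, subset.locus = c("rbcL", "ITS"), exclude.locus = "ITS"))
```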

exclude.species

Currently unused.

core.locus

A vector of mode "character" giving the names of the 'core' loci: The resulting supermatrix will only contain species that are contained in the 'core' loci. This option is intended to create denser supermatrices.

core.species

Currently unused.

best.sampled.congeneric

Logical; if TRUE, only the best-sampled species of each genus is kept.
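Selecting the best-sampled congeneric can be sketched as a per-genus maximum. The sketch assumes species names of the form "Genus_epithet" and measures sampling as the number of loci a species occurs in; both are assumptions, and best_sampled_congeneric() here is a hypothetical helper, not the package's code.

```r
# Hypothetical sketch (not megaptera's code): keep only the best-sampled
# species of each genus. Assumes names of the form "Genus_epithet" and uses
# the number of loci a species occurs in as the measure of sampling.
best_sampled_congeneric <- function(n.loci) {
  genus <- sub("_.*$", "", names(n.loci))
  # For each genus, the index of the species with the most loci
  # (ties go to the first species listed).
  best <- tapply(seq_along(n.loci), genus,
                 function(i) i[which.max(n.loci[i])])
  names(n.loci)[sort(as.integer(best))]
}

n.loci <- c(Quercus_robur = 5, Quercus_ilex = 3,
            Fagus_sylvatica = 2, Fagus_orientalis = 4)
best_sampled_congeneric(n.loci)  # "Quercus_robur" "Fagus_orientalis"
```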

protect.outgroup

Logical; if TRUE, the arguments core.locus and global.coverage have no effect on outgroup taxa.

squeeze.outgroup

Numeric, can be given to reduce the number of outgroup species: The function will select the squeeze.outgroup outgroup species with the best coverage. The idea is to create a more densely sampled outgroup.
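The squeeze.outgroup selection can be sketched as ranking the outgroup species by coverage and keeping the top ones, while leaving the ingroup untouched. Coverage values are assumed to be precomputed as a named numeric vector, and squeeze_outgroup() is a hypothetical helper.

```r
# Hypothetical sketch of squeeze.outgroup: keep all ingroup species but only
# the `squeeze.outgroup` outgroup species with the highest coverage.
# Coverage is assumed to be precomputed as a named numeric vector.
squeeze_outgroup <- function(cov, outgroup, squeeze.outgroup) {
  og <- intersect(names(cov), outgroup)
  og <- og[order(cov[og], decreasing = TRUE)]  # best coverage first
  og.keep <- og[seq_len(min(squeeze.outgroup, length(og)))]
  keep <- !(names(cov) %in% outgroup) | names(cov) %in% og.keep
  names(cov)[keep]
}

cov <- c(sp_A = 0.9, sp_B = 0.7, og_1 = 0.3, og_2 = 0.8, og_3 = 0.6)
squeeze_outgroup(cov, outgroup = c("og_1", "og_2", "og_3"),
                 squeeze.outgroup = 2)  # "sp_A" "sp_B" "og_2" "og_3"
```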

Value

a list with four elements:

zipname

a character string giving the name of the files produced.

supermatrix

a matrix of class DNAbin.

outgroup

a vector of mode "character" giving the names of the species used as an outgroup.

partitions

a vector of mode "character" giving the partitions of the supermatrix in the format accepted by RAxML.

In addition, three files are written to the working directory: (1) a NEXUS-formatted file and (2) a PHYLIP-formatted file, both containing the sequence matrix, and (3) a zipped directory containing the PHYLIP-formatted sequence matrix plus the output and partitions as separate ASCII-formatted files.
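To illustrate how the partition argument relates to the partitions element of the returned list, the sketch below turns locus lengths and a grouping of loci into RAxML-style lines of the form "DNA, name = start-end". The comma-separated layout for non-contiguous ranges reflects our understanding of RAxML partition files and should be checked against the RAxML manual; raxml_partitions() is a hypothetical helper, not the package's code.

```r
# Hypothetical sketch: build RAxML-style partition lines from the lengths of
# the loci (in supermatrix column order) and a named list grouping loci into
# partitions. Without `partition`, every locus forms its own partition.
raxml_partitions <- function(locus.len, partition) {
  if (missing(partition)) {
    partition <- as.list(names(locus.len))
    names(partition) <- names(locus.len)
  }
  len <- as.integer(locus.len)  # integers avoid output such as "1e+05"
  end <- cumsum(len)
  start <- end - len + 1L
  ranges <- paste0(start, "-", end)
  names(ranges) <- names(locus.len)
  vapply(names(partition), function(p) {
    paste0("DNA, ", p, " = ", paste(ranges[partition[[p]]], collapse = ", "))
  }, character(1), USE.NAMES = FALSE)
}

locus.len <- c(rbcL = 500, matK = 300, ITS = 200)
raxml_partitions(locus.len)
raxml_partitions(locus.len,
                 partition = list(plastid = c("rbcL", "matK"), nuclear = "ITS"))
```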


heibl/megaptera documentation built on Jan. 17, 2021, 3:34 a.m.