catmultGenes: Compares gene datasets for combined phylogenetic analysis...

catmultGenes    R Documentation

Compares gene datasets for combined phylogenetic analysis when species are duplicated or represented by multiple accessions in one DNA alignment

Description

Compares a list of "n" gene datasets (individual DNA alignments) and equalizes their number of taxa so that they are ready for combined, multigene phylogenetic analysis. This function is designed for concatenating DNA alignments in which species have duplicated sequences (multiple accessions) from different collections. For this to work, make sure each species is labeled with both the scientific name and the same identifying number throughout every DNA alignment. Identifying numbers could be the collector surname and associated collection number, or an accession number for the isolated DNA from which each gene was sequenced. During the comparison across DNA alignments for concatenation, the function assumes that any species may be represented by multiple sequences, so sequences in the individual gene datasets are fully matched only if they have exactly the same scientific name and associated identifying number.
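
As a rough illustration of the labeling convention described above, the base R sketch below splits labels into scientific name and identifier, then checks which accessions are shared between two alignments. The taxon labels, the collector identifiers and the underscore-separated Genus_species_identifier pattern are assumptions made for illustration only; this is not the internal matching code of catmultGenes.

```r
# Hypothetical taxon labels from two gene alignments (identifiers are made up).
# Assumed label pattern: Genus_species_identifier
gene1_labels <- c("Luetzelburgia_andina_Cardoso2991",
                  "Luetzelburgia_andina_Lewis3450",
                  "Vataireopsis_speciosa_Cardoso2200")
gene2_labels <- c("Luetzelburgia_andina_Cardoso2991",
                  "Vataireopsis_speciosa_Cardoso2200")

# Split each label into scientific name and identifying number
split_label <- function(x) {
  data.frame(label = x,
             species = sub("_[^_]+$", "", x),  # drop the last "_" field
             id = sub("^.*_", "", x),          # keep only the last "_" field
             stringsAsFactors = FALSE)
}

g1 <- split_label(gene1_labels)
g2 <- split_label(gene2_labels)

# Accessions present in both alignments: name and identifier must both match
shared <- intersect(g1$label, g2$label)

# Species represented by multiple accessions within the first alignment
dup_species <- unique(g1$species[duplicated(g1$species)])
```

In this toy case the second accession of Luetzelburgia_andina has no counterpart in the second alignment, which is the kind of situation the maxspp and missdata arguments are meant to resolve.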

Usage

catmultGenes(...,
             maxspp = TRUE,
             shortaxlabel = TRUE,
             missdata = TRUE,
             outgroup = NULL)

Arguments

...

A list of NEXUS-formatted gene datasets as read by ape's read.nexus.data, or at least two individual objects of NEXUS-formatted gene datasets, each read with ape.
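
The two input styles described above might look like the following sketch. The file names ITS.nex and matK.nex are hypothetical placeholders, and the block does nothing unless the ape and catGenes packages are installed and both files exist.

```r
# Hypothetical file names; replace with your own NEXUS alignments
files <- c("ITS.nex", "matK.nex")

if (requireNamespace("ape", quietly = TRUE) &&
    requireNamespace("catGenes", quietly = TRUE) &&
    all(file.exists(files))) {

  # Read each alignment with ape
  ITS  <- ape::read.nexus.data(files[1])
  matK <- ape::read.nexus.data(files[2])

  # Option 1: pass at least two individually read datasets
  catdf <- catGenes::catmultGenes(ITS, matK)

  # Option 2: pass a single named list of datasets
  genes <- list(ITS = ITS, matK = matK)
  catdf <- catGenes::catmultGenes(genes)
}
```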

maxspp

Logical. If FALSE, any species that is never duplicated with multiple accessions in any individual DNA alignment might end up either duplicated or deleted, depending on the chosen missdata argument. We recommend maxspp = TRUE so as to maximize taxon coverage. This means that if a species is not duplicated in any individual dataset, it will always be kept in the final concatenated dataset, even if its sequences across the individual datasets were generated from distinct collections or accessions.

shortaxlabel

Logical. If FALSE, the final individual gene datasets will maintain the accession numbers associated with each species or sequence.

missdata

Logical. If FALSE, the comparison will exclude any species that lacks a complete sequence for one of the input gene datasets.
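
Conceptually, excluding incomplete species amounts to keeping only the taxa found in every gene dataset. The base R sketch below shows that idea on made-up taxon sets; the names are hypothetical and the code is an illustration of the concept, not the function's actual implementation.

```r
# Hypothetical taxon sets for three gene datasets
taxa <- list(
  geneA = c("Taxon_a", "Taxon_b", "Taxon_c"),
  geneB = c("Taxon_a", "Taxon_c"),
  geneC = c("Taxon_a", "Taxon_b", "Taxon_c")
)

# Taxa sequenced for every gene (conceptually what missdata = FALSE keeps)
complete_taxa <- Reduce(intersect, taxa)

# Taxa missing from at least one gene (conceptually what would be excluded)
incomplete_taxa <- setdiff(Reduce(union, taxa), complete_taxa)
```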

outgroup

Provide the outgroup taxa (either one taxon name or a vector of multiple taxon names that are present in all individual gene datasets) if the concatenation is intended to maintain incomplete taxa (taxa missing the sequence for a particular gene).

Value

A list of data frames of the equally sized gene datasets, where the first column "species" includes all taxon names and the second column "sequence" includes the DNA sequence for the corresponding taxon.
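
To make the documented return structure concrete, the sketch below builds a small mock object with the same shape (a list of data frames with "species" and "sequence" columns) and runs a few sanity checks on it. The object, gene names, taxon names and sequences are invented; this is not real output from catmultGenes, and identical row order across datasets is an assumption of the mock.

```r
# Mock object imitating the documented return structure (not real output)
catdf_mock <- list(
  geneA = data.frame(species = c("Taxon_a", "Taxon_b"),
                     sequence = c("ACGT", "AC-T"),
                     stringsAsFactors = FALSE),
  geneB = data.frame(species = c("Taxon_a", "Taxon_b"),
                     sequence = c("GGCCTA", "GGC?TA"),
                     stringsAsFactors = FALSE)
)

# Number of taxa per gene dataset; equal values are expected after comparison
n_taxa <- vapply(catdf_mock, nrow, integer(1))

# Check that every dataset in the mock lists the same taxa in the same order
same_taxa <- all(vapply(
  catdf_mock,
  function(d) identical(d$species, catdf_mock[[1]]$species),
  logical(1)
))

# Alignment length per gene, taken from the first sequence of each dataset
aln_length <- vapply(catdf_mock, function(d) nchar(d$sequence[1]), integer(1))
```

The same checks could be applied to a real result by replacing catdf_mock with the object returned by catmultGenes.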

Author(s)

Domingos Cardoso and Quezia Cavalcante

See Also

writeNexus

writePhylip

dropSeq

nexusdframe

phylipdframe

fastadframe

Examples

## Not run: 
data(Luetzelburgia)
catdf <- catmultGenes(Luetzelburgia,
                      maxspp = TRUE,
                      shortaxlabel = TRUE,
                      missdata = TRUE)

outgrouptaxa <- c("Vataireopsis_araroba", "Vataireopsis_speciosa")
catdf <- catmultGenes(Luetzelburgia,
                      maxspp = FALSE,
                      shortaxlabel = TRUE,
                      missdata = FALSE,
                      outgroup = outgrouptaxa)

## End(Not run)


domingoscardoso/catGenes documentation built on March 14, 2024, 9:21 p.m.