align_many_genes_aa: Alignment of many genes (amino acid sequences)

View source: R/align_many_genes_aa.R

align_many_genes_aaR Documentation

Alignment of many genes (amino acid sequences)

Description

Reads in multiple FASTA files of different genes, performs an alignment, and tests the best evolutionary model. Specifically for amino acid sequences.

Usage

align_many_genes_aa(
  fasta.files,
  method = "ClustalW",
  model.model = NULL,
  model.G = TRUE,
  model.I = TRUE,
  cores = 1,
  gene.names = NULL
)

Arguments

fasta.files

Character, vector of paths to FASTA files. Each one containing sequences for a specific gene for multiple taxa.

method

Character, one of "ClustalW", "ClustalOmega", "Muscle". Is passed to the function msa::msa(... method=method).

model.model

Character, evolutionary models to test. Any one, or all of the possible model arguments in phangorn::modelTest. Passed to function phangorn::modelTest(... model=model.model). Default is NULL, in which case, no models will be tested.

model.G

Logical, whether a gamma distribution of rates should be tested in evolutionary models. Passed to function phangorn::modelTest(... G=model.G). Default is TRUE.

model.I

Logical, whether invariant rates should be tested in evolutionary models. Passed to function phangorn::modelTest(... I=model.I). Default is TRUE.

cores

Integer, number of parallel jobs to run. Default is 1.

gene.names

Character, an optional vector of gene names that match the FASTA files listed in fasta.files. Used to provide name handles for each gene in the final output list. Default is NULL.

Details

The top evolutionary models per gene are identified using AIC criteria. First, the model with the smallest AIC value is identified, then the delta AIC is calculated for all models as focal model AIC minus the AIC of the best model. Those models with delta AIC <= 10 are cosnidered the top models and non-differentiable.

Value

Returns a list. Each index is one of the genes from the vector of FASTA files specified in fasta.files. For each indexed gene, there are additional slots:

  1. $gene = AAStringSet, the imported FASTA sequences.

  2. $align = msa, the multiple sequence alignment for the gene.

  3. $model.test = Data.table, the output of phangorn::modelTest.

  4. $model.top = Data.table, the top evolutionary models.

Examples

library(genomalicious)

# Create a link to raw external datasets in genomalicious
genomaliciousExtData <- paste0(find.package('genomalicious'), '/extdata')

# Path to the demo FASTA file
fasta <- paste0(genomaliciousExtData, '/data_COI_aa.fasta')

# Multi sequence alignmnet of demo COI data, without model test
aln <- align_many_genes_aa(fasta, gene.names='COI')

str(aln)
names(aln)
aln$COI

# Same again, but with model test
aln.test <- align_many_genes_aa(
   fasta.files=fasta, gene.names='COI', model.I=TRUE, model.G=TRUE,
   model.model='all'
   )

aln.test$COI$model.test
aln.test$COI$model.top



j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.