alignCodingSequencesPipeline: Runs the pipeline to align a set of coding sequences: First...

View source: R/phylogenies.R

alignCodingSequencesPipeline    R Documentation

Description

Runs the pipeline to align a set of coding sequences. The pipeline first translates the sequences, then validates them for premature stop codons, then generates a multiple sequence alignment (MSA) of the amino acid (AA) sequences, and finally uses this AA MSA as a guide to align the coding sequences.

Usage

alignCodingSequencesPipeline(cds, work.dir, gene.group.name)

Arguments

cds

an instance of base::list as generated by seqinr::read.fasta representing the coding sequences that need to be aligned

work.dir

the working directory to use and in which to save the relevant files

gene.group.name

a string used to name the output files written to work.dir, for example 'fam1234'.
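
To make the expected shape of the cds argument concrete, the following is a minimal base-R sketch of a list like the one seqinr::read.fasta returns with its default settings: a named list holding one character vector of single lowercase nucleotides per gene. The gene names and sequences are made up for illustration, and the in-frame check is only a sanity check one might run beforehand; it is not documented as part of alignCodingSequencesPipeline.

```r
# Hypothetical stand-in for the result of seqinr::read.fasta(<some FASTA file>):
# a named list with one character vector of single nucleotides per gene.
# (The real return value additionally carries class and name attributes.)
cds <- list(
  geneA = strsplit("atggcttga", "")[[1]],
  geneB = strsplit("atgaaatgctaa", "")[[1]]
)

# Optional sanity check (an assumption, not part of the documented function):
# a coding sequence should have a length that is a multiple of three.
cds_lengths <- vapply(cds, length, integer(1))
in_frame <- cds_lengths %% 3 == 0

print(cds_lengths)
print(in_frame)
```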

Value

The aligned and validated coding sequences as an instance of base::list as generated by seqinr::read.fasta, or nothing if validation discards the rest of 'cds'.

The function proceeds in the following steps:

1. Sanitize the gene identifiers.
2. Convert the coding sequences to AA sequences.
3. Remove invalid AA sequences, i.e. AA sequences with premature stop codons, and warn about the removed sequences.
4. If only a single sequence is left, the function is done.
5. Write out the sanitized AA sequences.
6. Generate a multiple sequence alignment of the AA sequences.
7. Use the aligned AA sequences as a guide to align the coding sequences.
8. Return the CDS MSA using the original gene identifiers.
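
The translate, validate, and guide steps described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper names (split_codons, translate_cds, has_premature_stop, thread_codons) are hypothetical, the standard genetic code is assumed, and the AA alignment is written by hand here in place of the output of a real alignment program.

```r
# Standard genetic code (an assumption), built in t/c/a/g codon order.
genetic_code <- local({
  bases <- c("t", "c", "a", "g")
  grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                      stringsAsFactors = FALSE)
  aa <- strsplit(
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
  )[[1]]
  setNames(aa, paste0(grid$b1, grid$b2, grid$b3))
})

# Split a vector of single nucleotides into codon strings.
split_codons <- function(nt) {
  n <- length(nt) %/% 3
  vapply(seq_len(n),
         function(i) paste(nt[(3 * i - 2):(3 * i)], collapse = ""),
         character(1))
}

# Translate a coding sequence; unknown codons become "X".
translate_cds <- function(nt) {
  aa <- unname(genetic_code[split_codons(nt)])
  aa[is.na(aa)] <- "X"
  aa
}

# A stop codon anywhere except the last position counts as premature.
has_premature_stop <- function(aa) any(head(aa, -1) == "*")

# Thread the codons of one sequence onto its gapped, aligned AA sequence:
# each residue consumes one codon, each gap becomes "---".
thread_codons <- function(nt, aligned_aa) {
  codons <- split_codons(nt)
  out <- rep("---", length(aligned_aa))
  is_residue <- aligned_aa != "-"
  out[is_residue] <- codons[seq_len(sum(is_residue))]
  unlist(strsplit(out, ""))
}

# Made-up input: geneB carries a premature stop codon (tga) in the middle.
cds <- list(
  geneA = strsplit("atggcttga", "")[[1]],
  geneB = strsplit("atgtgagct", "")[[1]],
  geneC = strsplit("atgtga", "")[[1]]
)

aa <- lapply(cds, translate_cds)
valid <- cds[!vapply(aa, has_premature_stop, logical(1))]

# Hand-written AA alignment of the valid sequences, terminal stops removed.
aligned_aa <- list(geneA = c("M", "A"), geneC = c("M", "-"))

aligned_cds <- Map(thread_codons, valid, aligned_aa[names(valid)])
print(vapply(aligned_cds, paste, character(1), collapse = ""))
```

Threading codons onto the AA alignment keeps every gap a multiple of three nucleotides, so the reading frame is preserved in the resulting coding sequence alignment.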


asishallab/GeneFamilies documentation built on July 28, 2024, 11:44 a.m.