fasta2pim: Align a fasta file with MUSCLE and calculate the percentage...

View source: R/fasta_utils.R

fasta2pim    R Documentation

Align a fasta file with MUSCLE and calculate the percentage of sequence identity of the aligned sequences

Description

This function first aligns a DNA fasta file containing multiple sequences with MUSCLE via msa::msaMuscle(); additional parameters can be passed to it through the ... argument. The pairwise sequence identity is then calculated with seqinr::dist.alignment(). Gaps in the alignment are counted in the identity measure. The output matrix is always ordered like the sequences in the input fasta file.
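The two steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual source: fasta2pim_sketch is a hypothetical name, and the use of order = "input", msa::msaConvert() and gap = 1 is a guess at how the described behaviour could be obtained.

```r
# Hypothetical sketch; not the real fasta2pim() implementation.
# Requires the msa (Bioconductor) and seqinr (CRAN) packages when called.
fasta2pim_sketch <- function(input_path, percentage_identity = TRUE, ...) {
  # Step 1: align the DNA sequences with MUSCLE.
  # order = "input" (assumption) keeps the rows in input fasta order.
  aln <- msa::msaMuscle(input_path, type = "dna", order = "input", ...)

  # Convert the msa alignment object to the class seqinr expects.
  aln_seqinr <- msa::msaConvert(aln, type = "seqinr::alignment")

  # Step 2: pairwise distances, defined by seqinr as sqrt(1 - identity).
  # gap = 1 (assumption) asks seqinr to count gaps in the identity measure.
  d <- seqinr::dist.alignment(aln_seqinr, matrix = "identity", gap = 1)
  dist_mat <- as.matrix(d)

  if (!percentage_identity) {
    return(dist_mat)
  }
  # Invert sqrt(1 - identity) and rescale to a 0-100 percentage.
  100 * (1 - dist_mat^2)
}
```

Because seqinr::dist.alignment() returns a dist object, as.matrix() is what turns it into the square, labelled matrix described under Value.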

Usage

fasta2pim(input_path, percentage_identity = TRUE, ...)

Arguments

input_path

A path to a DNA fasta file for which you want the pairwise sequence identity

percentage_identity

Logical, return the percentage of sequence identity. Default TRUE. If FALSE, the square root of (1 - identity) is returned instead. See Details.

...

Other parameters passed to msa::msaMuscle().

Details

By default this function returns the percentage of sequence identity, from 0 to 100. Setting percentage_identity = FALSE returns sqrt(1 - identity) instead, with identity expressed as a fraction between 0 and 1. For example, if the identity between 2 sequences is 19%, the returned value is the square root of (1.0 - 0.19), i.e. 0.9.
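The relationship between the two output scales can be checked with a few lines of base R. The helper names below are hypothetical and only illustrate the arithmetic; they are not part of the package.

```r
# Hypothetical helpers illustrating the two output scales.

# Percentage identity (0-100) to the sqrt(1 - identity) scale.
identity_to_distance <- function(pct_identity) {
  sqrt(1 - pct_identity / 100)
}

# Inverse conversion: sqrt(1 - identity) back to a percentage.
distance_to_identity <- function(d) {
  100 * (1 - d^2)
}

identity_to_distance(19)   # approximately 0.9
distance_to_identity(0.9)  # approximately 19
```

Note that on the sqrt(1 - identity) scale, larger values mean less similar sequences, so identical sequences give 0 and sequences with no identity give 1.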

Value

A matrix of pairwise sequence identity values for the aligned DNA sequences (or of sqrt(1 - identity) values if percentage_identity = FALSE).

Note

If you want to inspect the multiple sequence alignment used to calculate the percentage of sequence identity, you can write it to a fasta file with align_DNA_fasta(), using the same MUSCLE parameters.

Examples

# dna_fasta_path must hold the path to a DNA fasta file with multiple sequences
pim <- fasta2pim(input_path = dna_fasta_path)

Ni-Ar/niar documentation built on Feb. 3, 2025, 9:25 a.m.