MCIC: Modified Complex Indel Coding as distance matrix



Description

This function computes an indel distance matrix following the rationale of the Modified Complex Indel Coding (Muller, 2006) to estimate transition matrices.
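To make the idea of turning indels into characters more concrete, the following base R sketch codes each distinct gap block (a run of "-" in the alignment) as a present/absent character. It is a deliberately simplified coding, not the MCIC scheme implemented by MCIC(), which uses a more elaborate coding of overlapping indels. The helpers gap_blocks() and code_indels() are hypothetical and not part of sidier.

```r
# Hypothetical helper (not part of sidier): label each run of "-" in one aligned
# sequence as "start-end", using 1-based alignment positions
gap_blocks <- function(s) {
  m <- gregexpr("-+", s)[[1]]
  if (m[1] == -1) return(character(0))  # gregexpr returns -1 when there is no match
  start <- as.integer(m)
  end <- start + attr(m, "match.length") - 1L
  paste(start, end, sep = "-")
}

# Hypothetical helper: simplified presence/absence coding with one binary
# character per distinct gap block (NOT the MCIC scheme)
code_indels <- function(seqs) {
  blocks <- lapply(seqs, gap_blocks)
  chars <- unique(unlist(blocks))
  coded <- matrix(0L, nrow = length(seqs), ncol = length(chars),
                  dimnames = list(names(seqs), chars))
  for (i in seq_along(blocks)) {
    coded[i, ] <- as.integer(chars %in% blocks[[i]])
  }
  coded
}

# Three aligned sequences taken from the Examples section
seqs <- c(s1 = "A-AGGGTC-CT---G", s2 = "TAA---TCGCT---G", s3 = "TAAGGGTCGCT---G")
coded <- code_indels(seqs)
coded
```

For these three sequences the columns are the gap blocks 2-2, 9-9, 12-14 and 4-6; s1 is coded 1 for the first three, s2 for 12-14 and 4-6, and s3 for 12-14 only.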

Usage

MCIC(inputFile = NA, align = NA, saveFile = TRUE,
     outname = paste(inputFile, "IndelDistanceMatrixMullerMod.txt"), silent = FALSE)

Arguments

inputFile

the name of the FASTA file to be analysed. Alternatively, you can provide an alignment of class "DNAbin" stored in memory using the "align" option.

align

an alignment of class "DNAbin" stored in memory. See "read.dna" in the ape package for details about reading alignments. Alternatively, you can provide the name of a file containing the alignment in FASTA format using the "inputFile" option.

saveFile

a logical; if TRUE (default), function output is saved as a text file.

outname

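if "saveFile" is TRUE (default), the name of the output file. The default is the input file name followed by a space and "IndelDistanceMatrixMullerMod.txt", because paste() separates its arguments with a space.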

silent

a logical; if FALSE (default), it prints the number of unique sequences found and the name of the output file.

Details

It is recommended to estimate this distance matrix using only the unique sequences in the alignment (for example, those returned by GetHaplo, as in the Examples). Repeated sequences increase computation time without adding information, because they only produce duplicated rows and columns in the final distance matrix.
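The redundancy described above can be seen with any pairwise distance. The base R sketch below uses a toy mismatch count (not the MCIC distance; mismatches() is a hypothetical helper) to show that two identical sequences yield identical rows, and that keeping only the first copy of each sequence with duplicated() shrinks the matrix without losing information.

```r
# Hypothetical aligned sequences; p1 and p2 are identical
seqs <- c(p1 = "TAA---TCGCT---G", p2 = "TAA---TCGCT---G", p3 = "TTA---TCG---TAG")

# Toy distance (NOT the MCIC distance): number of alignment columns that differ,
# with "-" treated as just another character
mismatches <- function(a, b) sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

# Full pairwise matrix: rows and columns are named after the sequences
d_all <- sapply(seqs, function(a) sapply(seqs, function(b) mismatches(a, b)))
d_all["p1", ] == d_all["p2", ]   # identical rows: p2 adds no information

# Keep only the first copy of each distinct sequence before computing distances
unique_seqs <- seqs[!duplicated(seqs)]
d_unique <- sapply(unique_seqs, function(a) sapply(unique_seqs, function(b) mismatches(a, b)))
d_unique
```

With n sequences the matrix has n × n cells, so removing duplicates saves work that grows quadratically with the number of repeated sequences.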

Value

A matrix containing the genetic distances between sequences, estimated as pairwise indel differences.
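As a rough illustration of "pairwise differences", the sketch below applies base R's dist() to a presence/absence matrix of coded indel characters. The matrix is a hypothetical example, not output of MCIC(), and the exact distance MCIC() derives from its own coding may differ. For 0/1 data, the Manhattan distance between two rows equals the number of indel characters in which the two sequences differ.

```r
# Hypothetical presence/absence indel matrix: rows = sequences, columns = gap blocks
coded <- matrix(c(1, 1, 1, 0,
                  0, 0, 1, 1,
                  0, 0, 1, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("s1", "s2", "s3"), c("2-2", "9-9", "12-14", "4-6")))

# With 0/1 data, Manhattan distance = number of characters that differ
d <- as.matrix(dist(coded, method = "manhattan"))
d
```

Here s1 and s2 differ in three indel characters, s1 and s3 in two, and s2 and s3 in one.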

Author(s)

A. J. Muñoz-Pajares

References

Muller K. (2006). Incorporating information from length-mutational events into phylogenetic analysis. Molecular Phylogenetics and Evolution, 38, 667-676.

Examples

library(sidier)

# This will generate an example file in your working directory:
cat(">Population1_sequence1",
    "A-AGGGTC-CT---G",
    ">Population1_sequence2",
    "TAA---TCGCT---G",
    ">Population1_sequence3",
    "TAAGGGTCGCT---G",
    ">Population1_sequence4",
    "TAA---TCGCT---G",
    ">Population2_sequence1",
    "TTACGGTCG---TTG",
    ">Population2_sequence2",
    "TAA---TCG---TTG",
    ">Population2_sequence3",
    "TAA---TCGCTATTG",
    ">Population2_sequence4",
    "TTACGGTCG---TTG",
    ">Population3_sequence1",
    "TTA---TCG---TAG",
    ">Population3_sequence2",
    "TTA---TCG---TAG",
    ">Population3_sequence3",
    "TTA---TCG---TAG",
    ">Population3_sequence4",
    "TTA---TCG---TAG",
    file = "ex3.fas", sep = "\n")

# Reading the alignment directly from file and saving no output file:
MCIC(inputFile = "ex3.fas", saveFile = FALSE)

# Analysing the same dataset, but using only unique sequences:
uni <- GetHaplo(inputFile = "ex3.fas", saveFile = FALSE)
MCIC(align = uni, saveFile = FALSE)

sidier documentation built on June 25, 2021, 5:10 p.m.