MEDMat: Minimum Entropy Decomposition


Description

Decomposes a set of aligned FASTA sequences until either the minimum entropy threshold or the minimum number of sequences is reached in every subalignment.

Usage

MEDMat(AlignedSequences=Sequences, minseq = 21, entropymin = 0.6, Plot = TRUE)

Arguments

AlignedSequences

matrix. Sequence ID-by-position matrix, as produced by e.g. ImportFastaAlignment(). This is the main difference from MED(), which works on files rather than on an object in the current workspace.

minseq

numeric. Minimum number of sequences in a subalignment; the procedure stops for a specific subalignment once this limit is reached.

entropymin

numeric. Minimum entropy level; the procedure stops for a specific subalignment once this level is reached.

Plot

logical. Plots the entropy profiles and also the base composition for the identified high entropy positions.
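
The interplay of minseq and entropymin can be illustrated with a minimal sketch. The helper should_stop() below is hypothetical and not part of the package, and the use of strict "less than" comparisons is an assumption; the package may handle the boundary values differently.

```r
# Hypothetical helper (not part of otu2ot): TRUE when a subalignment
# should not be split any further.
# Assumption: strict "<" comparisons; the package may use "<=" instead.
should_stop <- function(n_seq, site_entropies, minseq = 21, entropymin = 0.6) {
  n_seq < minseq || max(site_entropies) < entropymin
}

should_stop(15, c(0.1, 0.9))   # TRUE: fewer sequences than minseq
should_stop(100, c(0.1, 0.3))  # TRUE: no site reaches entropymin
should_stop(100, c(0.1, 0.9))  # FALSE: decomposition would continue
```

Each subalignment is checked on its own, so one branch can stop while another keeps being decomposed.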

Value

A matrix of sequence ids (rows) by oligotypes.

Note

The procedure currently only takes one component, which corresponds to the highest entropy. In case of ties, it will take the first site in the list (i.e. smallest site position).
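
The site selection and tie-breaking described in this note can be sketched on a toy alignment. This is an illustration only, not the package's code: the function column_entropy(), the toy matrix aln, and the choice of Shannon entropy in bits (log base 2) are all assumptions.

```r
# Assumed entropy definition: Shannon entropy in bits of one alignment column.
column_entropy <- function(column) {
  freq <- table(column) / length(column)
  -sum(freq * log2(freq))
}

# Toy alignment: rows are sequences, columns are alignment positions.
aln <- rbind(
  s1 = c("A", "C", "G", "T"),
  s2 = c("A", "C", "A", "T"),
  s3 = c("A", "T", "G", "C"),
  s4 = c("A", "T", "A", "C")
)

entropies <- apply(aln, 2, column_entropy)

# which.max() returns the first maximum, so ties go to the smallest position.
top_site <- which.max(entropies)

# Split the sequence ids by the base found at the selected site.
groups <- split(rownames(aln), aln[, top_site])
```

Positions 2, 3 and 4 are tied here, and position 2 is selected because which.max() returns the first maximum. The two resulting groups would then be examined separately in the next round.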

Author(s)

Alban Ramette

References

Inspired by the paper by M. Eren et al.

Examples

## The FASTA file is assumed to be stored in the current working directory.
File <- "HGB_0013_GXJPMPL01A3OQX.fasta"
Aln.list <- ImportFastaAlignment(File)  # path to FASTA file
Names <- Aln.list[[1]]
Sequences <- toupper(Aln.list[[2]])  # do not trim trailing dots at 5' and 3' ends

OT.seq.concat <- MEDMat(
    AlignedSequences = Sequences,
    minseq = 21,
    entropymin = 0.6,
    Plot = TRUE
)

aramette/otu2ot documentation built on May 10, 2019, 12:46 p.m.