utilsSeq: Utils to manipulate sequences or OTUs


Description

Utilities to manipulate sequences or OTUs, such as extracting the OTUs that match a given subset of OTU names. They depend on Bioconductor packages (http://www.bioconductor.org/).

Usage

subsetSequences(in.file, out.file, otus.names = c(),
  regex1 = "(\\|[0-9]+)", regex2 = "", ignore.case = TRUE,
  max.seq = 0)

renameFastaID(fasta, regex1 = NA, regex2 = "", ignore.case = TRUE)

rmDuplicateSeq(in.file, out.file = "unique-alg.fasta")

getTaxaMap(in.file, trait.name = "taxon", trait.value = "Bacteria",
  regex1 = NA, regex2 = "", ignore.case = TRUE)

Arguments

in.file

The fasta file of OTU representative sequences. Read by readFasta.

out.file

The output fasta file containing the extracted sequences.

otus.names

The vector of names to match sequence labels in in.file.

regex1, regex2

Used in gsub(regex1, regex2, id(fasta)) to remove or replace annotations in the original labels. Defaults to regex1="(\|[0-9]+)", regex2="" in subsetSequences, which removes the size annotation separated by "|". In renameFastaID, regex1 defaults to NA, which leaves the labels unchanged.

ignore.case

Refer to gsub.

max.seq

The maximum number of selected sequences. If more than max.seq sequences are extracted, max.seq of them are chosen at random. Defaults to 0, which disables the limit.

fasta

The fasta object returned by readFasta.

trait.name, trait.value

The trait to annotate a tree. Its format in the string will look like '[trait.name=trait.value]'.

Details

subsetSequences returns the subset of OTUs matching the given OTU names. It depends on the ShortRead package; follow the instructions at https://bioconductor.org/packages/release/bioc/html/ShortRead.html to install it.
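The selection logic described by otus.names, regex1, regex2 and max.seq can be sketched in base R on a plain character vector of labels. This is only an illustration under assumptions: the labels and names below are made up, and the real function works on sequences read from in.file with ShortRead, so its internals may differ.

```r
# Hypothetical sequence labels with a size annotation after "|"
labels <- c("OTU_1|120", "OTU_2|45", "OTU_3|7", "OTU_4|3")
otus.names <- c("OTU_1", "OTU_3", "OTU_4")
max.seq <- 2

# Strip the annotation, as gsub(regex1, regex2, ...) does with the defaults
clean <- gsub("(\\|[0-9]+)", "", labels, ignore.case = TRUE)

# Indices of sequences whose cleaned label is among the requested names
keep <- which(clean %in% otus.names)

# If more than max.seq sequences match, pick max.seq of them at random;
# max.seq = 0 disables the limit
if (max.seq > 0 && length(keep) > max.seq) {
  keep <- sort(sample(keep, max.seq))
}

selected <- labels[keep]
```

With max.seq = 0 all three matching labels would be kept; with max.seq = 2 a random pair of them is returned.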

renameFastaID renames the sequence IDs loaded from a fasta file using the ShortRead package.
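The renaming rule can be tried out on plain strings without ShortRead. The helper renameLabels below is hypothetical (it is not part of the package) and only mirrors the documented behaviour of regex1 and regex2: NA leaves the labels untouched, anything else is passed to gsub.

```r
# Hypothetical helper mirroring the documented regex1/regex2 behaviour
renameLabels <- function(ids, regex1 = NA, regex2 = "", ignore.case = TRUE) {
  if (is.na(regex1)) {
    return(ids)  # regex1 = NA: leave labels unchanged
  }
  gsub(regex1, regex2, ids, ignore.case = ignore.case)
}

ids <- c("OTU_1|120", "OTU_2|45")  # made-up labels
unchanged <- renameLabels(ids)
renamed <- renameLabels(ids, regex1 = "(\\|[0-9]+)")
```

Here unchanged is identical to ids, while renamed has the "|size" suffix removed from each label.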

rmDuplicateSeq removes duplicate sequences or alignments.
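As a rough idea of what duplicate removal means here, the base R sketch below drops repeated sequences from a named character vector. It assumes the first occurrence of each sequence is the one kept, which this page does not state, and it uses made-up data rather than a fasta file.

```r
# Made-up sequences; OTU_3 repeats the sequence of OTU_1
seqs <- c(OTU_1 = "ACGT", OTU_2 = "ACGA", OTU_3 = "ACGT")

# duplicated() flags every repeat after the first occurrence
unique.seqs <- seqs[!duplicated(seqs)]
```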

getTaxaMap returns a data frame that serves as a mapping file for annotating a tree, for example with annotateRAXMLTree.
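The '[trait.name=trait.value]' string described under Arguments can be assembled with paste0. The sketch below builds a small mapping table from made-up IDs; the column names id and trait are assumptions for illustration, since the actual layout of the returned data frame is not documented here.

```r
ids <- c("OTU_1", "OTU_2")  # made-up sequence IDs
trait.name <- "taxon"
trait.value <- "Bacteria"

# Annotation string in the documented '[trait.name=trait.value]' format
trait <- paste0("[", trait.name, "=", trait.value, "]")

# Hypothetical column names; the single trait string is recycled per row
taxa.map <- data.frame(id = ids, trait = trait, stringsAsFactors = FALSE)
```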

Examples

subsetSequences(in.file, out.file, otus.names = otus.names)

fasta <- renameFastaID(fasta)

rmDuplicateSeq("alg.fasta", "unique-alg.fasta")

taxa.map <- getTaxaMap("16s-otus-Bacteria.fasta", trait.name="taxon", trait.value="Bacteria")

walterxie/ComMA documentation built on May 3, 2019, 11:51 p.m.