utilsSeq: Utils to manipulate sequences or OTUs
In walterxie/ComMA: Community Matrix Analysis


Utils to manipulate sequences or OTUs, such as extracting OTUs given a subset of OTU names. They depend on Bioconductor packages (http://www.bioconductor.org/).

subsetSequences(in.file, out.file, otus.names = c(),
  regex1 = "(\\|[0-9]+)", regex2 = "", ignore.case = TRUE,
  max.seq = 0)

renameFastaID(fasta, regex1 = NA, regex2 = "", ignore.case = TRUE)

rmDuplicateSeq(in.file, out.file = "unique-alg.fasta")

getTaxaMap(in.file, trait.name = "taxon", trait.value = "Bacteria",
  regex1 = NA, regex2 = "", ignore.case = TRUE)

`in.file`	The fasta file of OTU representative sequences. Read by `readFasta`.
`out.file`	The output fasta file containing the extracted sequences.
`otus.names`	The vector of names to match sequence labels in `in.file`.
`regex1, regex2`	Passed to `gsub(regex1, regex2, id(fasta))` to remove or replace annotations in the original labels. Defaults to `regex1="(\\|[0-9]+)", regex2=""` in `subsetSequences`, which removes the size annotation separated by "|". In `renameFastaID` the default is `regex1=NA`, which leaves the labels unchanged.
`ignore.case`	Refer to `gsub`.
`max.seq`	The maximum number of sequences to select. If more than `max.seq` sequences are extracted, `max.seq` of them are chosen at random. Defaults to 0, which disables the limit.
`fasta`	The fasta object returned by `readFasta`.
`trait.name, trait.value`	The trait to annotate a tree. Its format in the string will look like '[trait.name=trait.value]'.
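
To illustrate what the default `regex1` and `regex2` do, here is a minimal sketch in base R. The labels are hypothetical, and the call mirrors only the documented `gsub` step, not the rest of the package code.

```r
# Hypothetical sequence labels; "|<digits>" is the size annotation
labels <- c("OTU_1|123", "OTU_2|45", "OTU_3")

# Same pattern and replacement as the documented defaults in subsetSequences
clean <- gsub("(\\|[0-9]+)", "", labels, ignore.case = TRUE)

# Labels without a size annotation are left untouched
print(clean)
```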

subsetSequences returns the subset of OTUs matching the given OTU names. It depends on the ShortRead package; follow the instructions at https://bioconductor.org/packages/release/bioc/html/ShortRead.html to install it.
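
The selection logic described for `otus.names` and `max.seq` might look roughly like the sketch below. It uses a named character vector as a stand-in for a fasta file and assumes exact matching of the cleaned labels with `%in%`; the actual matching rule used by subsetSequences is not stated here, so treat this as an assumption.

```r
set.seed(1)  # only to make the random choice repeatable in this sketch

# Hypothetical stand-in for fasta content: names are labels, values are sequences
seqs <- c("OTU_1|10" = "ACGT", "OTU_2|7" = "ACGA",
          "OTU_3|2" = "TTGA", "OTU_4|1" = "GGCA")
otus.names <- c("OTU_1", "OTU_3", "OTU_4")
max.seq <- 2

# Strip the size annotation, then keep sequences whose label is requested
ids <- gsub("(\\|[0-9]+)", "", names(seqs), ignore.case = TRUE)
picked <- seqs[ids %in% otus.names]

# If more than max.seq sequences matched, draw max.seq of them at random;
# max.seq = 0 skips this step
if (max.seq > 0 && length(picked) > max.seq) {
  picked <- picked[sample(length(picked), max.seq)]
}
print(names(picked))
```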

renameFastaID renames the sequence IDs loaded from a fasta file using the ShortRead package.

rmDuplicateSeq removes duplicate sequences or alignments.
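
As a rough idea of what removing duplicates means, the base R sketch below drops repeated sequence strings and keeps the first occurrence. Keeping the first copy is an assumption; which copy rmDuplicateSeq retains is not documented here, and the data are hypothetical.

```r
# Hypothetical aligned sequences; "a" and "b" are identical
alg <- c(a = "ACGT-", b = "ACGT-", c = "AC-TT")

# duplicated() flags every repeat after the first occurrence
unique.alg <- alg[!duplicated(alg)]
print(unique.alg)
```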

getTaxaMap returns a data frame that serves as a mapping file for annotating a tree, for example with annotateRAXMLTree.
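
The documented trait format '[trait.name=trait.value]' can be assembled with `paste0`, as in the sketch below. The column names `id` and `annotated` and the overall layout of the data frame are hypothetical; the structure actually returned by getTaxaMap may differ.

```r
# Hypothetical sequence labels and the documented default trait
ids <- c("OTU_1", "OTU_2")
trait.name <- "taxon"
trait.value <- "Bacteria"

# Build the annotation string in the form [trait.name=trait.value]
trait <- paste0("[", trait.name, "=", trait.value, "]")

# Hypothetical mapping table: original label and label with the trait appended
taxa.map <- data.frame(id = ids,
                       annotated = paste0(ids, trait),
                       stringsAsFactors = FALSE)
print(taxa.map)
```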

# otus.names is the third formal argument, so pass it by name
subsetSequences(in.file, out.file, otus.names = otus.names)

fasta <- renameFastaID(fasta)

rmDuplicateSeq("alg.fasta", "unique-alg.fasta")

taxa.map <- getTaxaMap("16s-otus-Bacteria.fasta", trait.name="taxon", trait.value="Bacteria")