locus: Locus/Phylogenetic Marker Definition

Description

Creates an S4 class defining a phylogenetic marker for a megapteraProj.

Usage

locus(..., not, search.fields = c("gene", "title"),
      use.genomes = TRUE, align.method = "auto", 
      min.identity = 0.75, min.coverage = 0.5, 
      check = FALSE)
   
locusRef(..., not, search.fields = c("gene", "title"),
      use.genomes = TRUE, align.method = "auto",
      min.identity = 0.75, min.coverage = 0.5, 
      reference, adj.gene1 = NULL, adj.gene2 = NULL,
      check = FALSE)

Arguments

...

a vector of mode character giving strings that should be searched for; the first element is taken to name the corresponding PostgreSQL tables.

not

a vector of mode character giving strings that should be excluded from the search results; corresponds to the use of NOT in a GenBank query.

search.fields

a vector of mode character setting the search fields (or attributes) of the Nucleotide database to be searched for the strings specified via the ... argument.

use.genomes

logical: if TRUE, sequences of loci will be extracted from annotated chloroplast and mitochondrial genomes. Because different annotation styles exist, some of which might (still) be incompatible with megaptera, set this option to FALSE to turn off the extraction of sequences from whole genomes.

align.method

a character string giving the alignment method in MAFFT. Available accuracy-oriented methods for fewer than 200 sequences are "localpair", "globalpair", and "genafpair"; "retree 1" and "retree 2" are for speed-oriented alignment. The default is "auto", which lets MAFFT choose an appropriate alignment method.

min.identity

numeric between 0 and 1, giving the minimum proportion of nucleotides that must be identical to the reference sequence for a sequence to be included in an alignment (default: 0.75).

min.coverage

numeric between 0 and 1, giving the minimum proportion of nucleotide positions a sequence must have in common with the reference sequence for it to be included in an alignment (default: 0.5).

reference

an object of class DNAbin containing reference sequences. Alternatively, one or more GI numbers (currently only of chloroplast or mitochondrial genomes) can be given and the appropriate sequences will be extracted automatically.

adj.gene1

a vector of mode character giving strings to identify the upstream coding region of an intergenic spacer (IGS); to be effective the vector of strings given via ... has to identify an IGS.

adj.gene2

a vector of mode character giving strings to identify the downstream coding region of an intergenic spacer (IGS); to be effective the vector of strings given via ... has to identify an IGS.

check

logical: if TRUE, the existence of a locus as specified by ... and not will be checked.
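
To illustrate how the strings given via ..., not, and search.fields could combine into a single GenBank-style query, here is a minimal sketch in base R. The helper build_query is hypothetical and not part of megaptera, and the field tags in square brackets are an assumption about the query syntax; how megaptera itself assembles its queries is not described on this page.

```r
# Hypothetical helper (not a megaptera function): compose an Entrez-style
# query string from search strings, exclusion strings, and search fields.
build_query <- function(include, not = character(0),
                        search.fields = c("gene", "title")) {
  # Tag every string with every field, e.g. "rbcL"[gene], "rbcL"[title],
  # and join the resulting terms with OR.
  tag <- function(strings) {
    terms <- outer(strings, search.fields,
                   function(s, f) sprintf('"%s"[%s]', s, f))
    paste(as.vector(t(terms)), collapse = " OR ")
  }
  query <- sprintf("(%s)", tag(include))
  # Exclusion strings are appended with NOT, as described for 'not'.
  if (length(not) > 0) {
    query <- sprintf("%s NOT (%s)", query, tag(not))
  }
  query
}

q <- build_query("rbcL", not = "pseudogene")
cat(q, "\n")
```

Under these assumptions every search string is tried in every field, and any record matching an excluded string in any of those fields is dropped.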

Details

Complete organelle genomes suitable as references can be found at http://www.ncbi.nlm.nih.gov/genome/browse/?report=5 or with ncbiGenome.
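
The min.identity and min.coverage thresholds compare each sequence with such a reference. The following base R sketch shows one way the two proportions could be computed for a single sequence aligned to a reference. The helpers pair_stats and keep_sequence are hypothetical, not megaptera functions, and the sketch assumes a pairwise alignment stored as character vectors with "-" as the gap character; megaptera's own computation may differ.

```r
# Hypothetical helpers (not megaptera functions). Both sequences are
# assumed to be aligned, equal-length character vectors with "-" as gap.
pair_stats <- function(query, reference) {
  stopifnot(length(query) == length(reference))
  ref_sites <- reference != "-"            # positions present in the reference
  shared <- ref_sites & query != "-"       # positions present in both
  n_shared <- sum(shared)
  identity <- if (n_shared > 0) {
    sum(query[shared] == reference[shared]) / n_shared
  } else {
    0                                      # no overlap: treat as no identity
  }
  c(identity = identity, coverage = n_shared / sum(ref_sites))
}

# Keep a sequence only if it passes both thresholds.
keep_sequence <- function(query, reference,
                          min.identity = 0.75, min.coverage = 0.5) {
  s <- pair_stats(query, reference)
  s[["identity"]] >= min.identity && s[["coverage"]] >= min.coverage
}

ref <- strsplit("ACGTACGTAC", "")[[1]]
qry <- strsplit("ACGAAC----", "")[[1]]
print(pair_stats(qry, ref))
print(keep_sequence(qry, ref))
```

In this toy example the query shares 6 of the 10 reference positions and matches the reference at 5 of them, so it passes the default thresholds but would be rejected with min.coverage = 0.7.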

See also https://www.ncbi.nlm.nih.gov/refseq/rsg/ for the RefSeqGene in the NCBI Reference Sequence collection.
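
The values accepted by align.method mirror MAFFT command-line options (--auto, --localpair, --globalpair, --genafpair, --retree 1, --retree 2). As a rough sketch, a value could be validated and turned into MAFFT arguments as shown here. The helper mafft_args and the file name input.fas are hypothetical; this is not how megaptera is documented to call MAFFT.

```r
# Hypothetical helper (not a megaptera function): translate an align.method
# string into a character vector of MAFFT command-line arguments.
mafft_args <- function(align.method = "auto", infile = "input.fas") {
  allowed <- c("auto", "localpair", "globalpair", "genafpair",
               "retree 1", "retree 2")
  # Stop with an informative error if the method is not one of 'allowed'.
  align.method <- match.arg(align.method, allowed)
  # "retree 1" has to become two arguments: "--retree" and "1".
  flag <- strsplit(paste0("--", align.method), " ", fixed = TRUE)[[1]]
  c(flag, infile)
}

print(mafft_args("retree 2"))
print(mafft_args())

# With MAFFT installed and on the PATH, the arguments could be passed on as:
# system2("mafft", mafft_args("localpair", "input.fas"), stdout = "aligned.fas")
```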

Value

An object of class locus or locusRef.

Author(s)

Christoph Heibl

See Also

ncbiGenome helps to select representative organelle genomes as references.

Use dbPars, taxon, and megapteraPars to define database parameters, taxa, and the pipeline's parameters, respectively, and megapteraProj to bundle the input data.


heibl/megaptera documentation built on Jan. 17, 2021, 3:34 a.m.