search_hmmscan: Perform a hmmscan search of a protein sequence against a...

View source: R/search_hmmscan.R

search_hmmscanR Documentation

Perform a hmmscan search of a protein sequence against a profile-HMM database.

Description

Perform a hmmscan search of a protein sequence against a profile-HMM database.

Usage

search_hmmscan(seqs, dbs = "pfam", verbose = TRUE, timeout = 90)

Arguments

seqs

A vector of characters containing the sequences of the query.

dbs

A character vector containing the target databases. Frequently used databases are pfam, tigrfam gene3d, superfamily, pirsf and treefam, but a complete and updated list is available at https://www.ebi.ac.uk/Tools/hmmer/.

verbose

A logical, if TRUE details of the download process is printed.

timeout

An integer specifying the number of seconds to wait for the reply before a time out occurs.

Value

A nested DataFrame with columns seqs, dbs, url (HMMER temporary url), hits, stats, domains and, if selected, fullseq.fasta.

Stats data structure

  • nhits: The number of hits found above reporting thresholds.

  • Z: The number of sequences or models in the target database.

  • domZ: The number of hits in the target database.

  • nmodels The number of models in this search.

  • nincluded The number of sequences or models scoring above the significance threshold.

  • nreported The number of sequences or models scoring above the reporting threshold

Sequence data structure

The hits array contains one or more sequences. Tthe redundant sequence information will also be included.

  • name: Name of the target (sequence for phmmer/hmmsearch, HMM for hmmscan).

  • acc: Accession of the target.

  • acc2: Secondary accession of the target.

  • id: Identifier of the target.

  • desc: Description of the target.

  • score: Bit score of the sequence (all domains, without correction)

  • pvalue: P-value of the score.

  • evalue: E-value of the score.

  • nregions: Number of regions evaluated.

  • nenvelopes: Number of envelopes handed over for domain definition, null2, alignment, and scoring.

  • ndom: Total number of domains identified in this sequence.

  • nreported: Number of domains satisfying reporting thresholding.

  • nincluded: Number of domains satisfying inclusion thresholding.

  • taxid: The NCBI taxonomy identifier of the target (if applicable).

  • species: The species name of the target (if applicable).

  • kg: The kingdom of life that the target belongs to - based on placing in the NCBI taxonomy tree (if applicable).

Domain data structure

The domain contains the details of the match, in particular the alignment between the query and the target.

  • ienv: Envelope start position.

  • jenv: Envelope end position.

  • iali: Alignment start position.

  • jali: Alignment end position.

  • bias: null2 score contribution.

  • oasc: Optimal alignment accuracy score.

  • bitscore: Overall score in bits, null corrected, if this were the only domain in seq.

  • cevalue: Conditional E-value based on the domain correction.

  • ievalue: Independent E-value based on the domain correction.

  • is_reported: 1 if domain meets reporting thresholds.

  • is_included: 1 if domain meets inclusion thresholds.

  • alimodel: Aligned query consensus sequence phmmer and hmmsearch, target hmm for hmmscan.

  • alimline: Match line indicating identities, conservation +’s, gaps.

  • aliaseq: Aligned target sequence for phmmer and hmmsearch, query for hmmscan.

  • alippline: Posterior probability annotation.

  • alihmmname: Name of HMM (query sequence for phmmer, alignment for hmmsearch and target hmm for hmmscan).

  • alihmmacc: Accession of HMM.

  • alihmmdesc: Description of HMM.

  • alihmmfrom: Start position on HMM.

  • alihmmto: End position on HMM.

  • aliM: Length of model.

  • alisqname: Name of target sequence (phmmer, hmmscan) or query sequence(hmmscan).

  • alisqacc: Accession of sequence.

  • alisqdesc: Description of sequence.

  • alisqfrom: Start position on sequence.

  • alisqto: End position on sequence.

  • aliL: Length of sequence.


currocam/taxa2hmmer documentation built on April 10, 2022, 11:02 a.m.