View source: R/extract_tidy_df_from_hmmer.R
extract_tidy_df_from_hmmer | R Documentation |
Extract a tidy data frame with the hmmer results.
extract_tidy_df_from_hmmer( xml.document, by_column = c(alisqacc = "acc", alisqname = "name") )
xml.document | An xml_document downloaded from HMMER |
by_column | A character vector for joining the domains hash with the sequence hits hash. By default, it is c(alisqacc = "acc", alisqname = "name") |
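To make the role of by_column more concrete, here is a minimal sketch of a join driven by a named character vector. It assumes the names refer to columns on the domain side and the values to columns on the hit side, following the usual named-vector join convention. The toy tables and the use of base merge() are illustrative only and are not taken from the package source.

```r
# Hypothetical toy tables standing in for the domains hash and the hits hash.
domains <- data.frame(
  alisqacc  = c("P12345", "P12345", "Q99999"),
  alisqname = c("SEQ_A", "SEQ_A", "SEQ_B"),
  ienv      = c(5L, 120L, 10L),
  jenv      = c(90L, 200L, 80L),
  stringsAsFactors = FALSE
)
hits <- data.frame(
  acc    = c("P12345", "Q99999"),
  name   = c("SEQ_A", "SEQ_B"),
  evalue = c(1e-30, 2e-5),
  stringsAsFactors = FALSE
)

# Same shape as the documented default of by_column.
by_column <- c(alisqacc = "acc", alisqname = "name")

# Assumption: names(by_column) are the domain-side keys and the values are
# the hit-side keys. Each domain row picks up the columns of its hit.
joined <- merge(
  domains, hits,
  by.x  = names(by_column),
  by.y  = unname(by_column),
  all.x = TRUE
)
```

In this sketch every domain row keeps its own coordinates and gains the sequence-level evalue of the hit it belongs to, which is the kind of one-row-per-domain layout a tidy result implies.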
Below, we list the meaning of the different columns following the HMMER documentation.
ienv: Envelope start position
jenv: Envelope end position
iali: Alignment start position
jali: Alignment end position
bias: null2 score contribution
oasc: Optimal alignment accuracy score
bitscore: Overall score in bits, null corrected, if this were the only domain in seq
cevalue: Conditional E-value based on the domain correction
ievalue: Independent E-value based on the domain correction
is_reported: 1 if domain meets reporting thresholds
is_included: 1 if domain meets inclusion thresholds
alimodel: Aligned query consensus sequence for phmmer and hmmsearch, target HMM for hmmscan
alimline: Match line indicating identities, conservation +’s, gaps
aliaseq: Aligned target sequence for phmmer and hmmsearch, query for hmmscan
alippline: Posterior probability annotation
alihmmname: Name of HMM (query sequence for phmmer, alignment for hmmsearch and target hmm for hmmscan)
alihmmacc: Accession of HMM
alihmmdesc: Description of HMM
alihmmfrom: Start position on HMM
alihmmto: End position on HMM
aliM: Length of model
alisqname: Name of target sequence (phmmer, hmmsearch) or query sequence (hmmscan)
alisqacc: Accession of sequence
alisqdesc: Description of sequence
alisqfrom: Start position on sequence
alisqto: End position on sequence
aliL: Length of sequence
name: Name of the target (sequence for phmmer/hmmsearch, HMM for hmmscan)
acc: Accession of the target
acc2: Secondary accession of the target
id: Identifier of the target
desc: Description of the target
score: Bit score of the sequence (all domains, without correction)
pvalue: P-value of the score
evalue: E-value of the score
nregions: Number of regions evaluated
nenvelopes: Number of envelopes handed over for domain definition, null2, alignment, and scoring.
ndom: Total number of domains identified in this sequence
nreported: Number of domains satisfying reporting thresholding
nincluded: Number of domains satisfying inclusion thresholding
taxid: The NCBI taxonomy identifier of the target (if applicable)
species: The species name of the target (if applicable)
kg: The kingdom of life that the target belongs to - based on placing in the NCBI taxonomy tree (if applicable)
seqs: An array containing information about the 100% redundant sequences
pdbs: Array of PDB identifiers (including chain information)
nhits: The number of hits found above reporting thresholds
Z: The number of sequences or models in the target database
domZ: The number of hits in the target database
nmodels: The number of models in this search
nincluded: The number of sequences or models scoring above the significance threshold
nreported: The number of sequences or models scoring above the reporting threshold
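As an illustration of how the columns listed above might be used together, the following sketch keeps only domains that meet the inclusion threshold and have a small independent E-value, then derives the envelope length. The toy data frame, the 1e-3 cutoff, and the env_length column are hypothetical. It also assumes the columns are numeric and that ienv and jenv are 1-based inclusive coordinates. Columns parsed from XML may arrive as character and need converting with as.numeric() first.

```r
# Hypothetical result table with a few of the documented columns.
res <- data.frame(
  name        = c("SEQ_A", "SEQ_A", "SEQ_B"),
  ienv        = c(5, 120, 10),
  jenv        = c(90, 200, 80),
  ievalue     = c(1e-20, 0.5, 3e-6),
  is_included = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Keep domains flagged as included whose independent E-value is below an
# arbitrary illustrative cutoff.
kept <- res[res$is_included == 1 & res$ievalue < 1e-3, ]

# Envelope length, assuming inclusive start and end positions.
kept$env_length <- kept$jenv - kept$ienv + 1
```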
DataFrame
## Not run:
xml.path %>% read_xml() %>% extract_tidy_df_from_hmmer()
## End(Not run)