extract_tidy_df_from_hmmer: Extract a tidy data frame with the hmmer results.
In currocam/toolkit4pySCA: A set of useful functions for performing a Statistical Coupling Analysis from scratch in R and using pySCA

View source: R/extract_tidy_df_from_hmmer.R

extract_tidy_df_from_hmmer

R Documentation

Extract a tidy data frame with the hmmer results.

Description

Extract a tidy data frame with the hmmer results.

Usage

extract_tidy_df_from_hmmer(
  xml.document,
  by_column = c(alisqacc = "acc", alisqname = "name")
)

Arguments

`xml.document`	An xml_document downloaded from HMMER.
`by_column`	A named character vector used to join the domains hash with the sequence hits hash. The default, `c("alisqacc" = "acc", "alisqname" = "name")`, matches results on both the accession and the name of the sequences. This default is appropriate in most cases.
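To illustrate how a named vector such as `by_column` can pair key columns across two tables, here is a minimal sketch in base R. The two toy data frames and their values are hypothetical stand-ins for the domain and hit tables, and `merge()` is used only for illustration; the package may perform the join differently internally.

```r
# Toy stand-ins for the two tables (hypothetical values, not real HMMER output).
domains <- data.frame(
  alisqacc  = c("P12345", "Q67890"),
  alisqname = c("SEQ_A", "SEQ_B"),
  ienv      = c(3L, 10L),
  stringsAsFactors = FALSE
)
hits <- data.frame(
  acc   = c("P12345", "Q67890"),
  name  = c("SEQ_A", "SEQ_B"),
  score = c(120.5, 88.1),
  stringsAsFactors = FALSE
)

by_column <- c(alisqacc = "acc", alisqname = "name")

# The names of the vector are the key columns on the domain side;
# the values are the matching key columns on the hit side.
joined <- merge(
  domains, hits,
  by.x = names(by_column),
  by.y = unname(by_column)
)
```

In this sketch each row of `joined` carries both the per-domain column (`ienv`) and the per-sequence column (`score`) for the same accession and name pair.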

Details

Below, we list the meaning of the different columns following the HMMER documentation.

ienv: Envelope start position
jenv: Envelope end position
iali: Alignment start position
jali: Alignment end position
bias: null2 score contribution
oasc: Optimal alignment accuracy score
bitscore: Overall score in bits, null corrected, if this were the only domain in seq
cevalue: Conditional E-value based on the domain correction
ievalue: Independent E-value based on the domain correction
is_reported: 1 if domain meets reporting thresholds
is_included: 1 if domain meets inclusion thresholds
alimodel: Aligned query consensus sequence for phmmer and hmmsearch, target hmm for hmmscan
alimline: Match line indicating identities, conservation +’s, gaps
aliaseq: Aligned target sequence for phmmer and hmmsearch, query for hmmscan
alippline: Posterior probability annotation
alihmmname: Name of HMM (query sequence for phmmer, alignment for hmmsearch and target hmm for hmmscan)
alihmmacc: Accession of HMM
alihmmdesc: Description of HMM
alihmmfrom: Start position on HMM
alihmmto: End position on HMM
aliM: Length of model
alisqname: Name of target sequence (phmmer, hmmsearch) or query sequence (hmmscan)
alisqacc: Accession of sequence
alisqdesc: Description of sequence
alisqfrom: Start position on sequence
alisqto: End position on sequence
aliL: Length of sequence
name: Name of the target (sequence for phmmer/hmmsearch, HMM for hmmscan)
acc: Accession of the target
acc2: Secondary accession of the target
id: Identifier of the target
desc: Description of the target
score: Bit score of the sequence (all domains, without correction)
pvalue: P-value of the score
evalue: E-value of the score
nregions: Number of regions evaluated
nenvelopes: Number of envelopes handed over for domain definition, null2, alignment, and scoring.
ndom: Total number of domains identified in this sequence
nreported: Number of domains satisfying reporting thresholding
nincluded: Number of domains satisfying inclusion thresholding
taxid: The NCBI taxonomy identifier of the target (if applicable)
species: The species name of the target (if applicable)
kg: The kingdom of life that the target belongs to - based on placing in the NCBI taxonomy tree (if applicable)
seqs: An array containing information about the 100% redundant sequences
pdbs: Array of pdb identifiers (with chain information)
nhits: The number of hits found above reporting thresholds
Z: The number of sequences or models in the target database
domZ: The number of hits in the target database
nmodels: The number of models in this search
nincluded: The number of sequences or models scoring above the significance threshold
nreported: The number of sequences or models scoring above the reporting threshold
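As an example of how these columns might be used downstream, the following sketch filters a small, made-up data frame on `is_included` and `ievalue` and then computes how much of each target sequence the alignment covers. The values, the E-value cut-off, and the numeric column types are assumptions for illustration; depending on how the XML is parsed, the real columns may need converting from character first.

```r
# Toy result with a few of the columns described above (made-up values).
hmmer_df <- data.frame(
  alisqacc    = c("P12345", "Q67890", "A0A000"),
  ievalue     = c(1e-30, 2e-5, 0.5),
  is_included = c(1L, 1L, 0L),
  alisqfrom   = c(5L, 20L, 1L),
  alisqto     = c(104L, 69L, 30L),
  aliL        = c(200L, 100L, 300L),
  stringsAsFactors = FALSE
)

# Keep domains that meet the inclusion threshold and a stricter,
# arbitrarily chosen independent E-value cut-off.
kept <- hmmer_df[hmmer_df$is_included == 1 & hmmer_df$ievalue < 1e-10, ]

# Fraction of the target sequence covered by the aligned region.
kept$coverage <- (kept$alisqto - kept$alisqfrom + 1) / kept$aliL
```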

Value

A tidy data frame with the HMMER results.

Examples

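## Not run: 
 library(magrittr)  # provides the %>% pipe
 library(xml2)      # provides read_xml() and the xml_document class
 # xml.path: path to an XML results file downloaded from HMMER
 xml.path %>%
   read_xml() %>%
   extract_tidy_df_from_hmmer()

## End(Not run)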

currocam/toolkit4pySCA documentation built on April 7, 2022, 8:17 p.m.
