PhyloProfileData

The PhyloProfileData package provides a collection of datasets to accompany the R package PhyloProfile (Tran et al. 2018), where they are used to illustrate how to run PhyloProfile and analyse its results. Briefly, it contains the phylogenetic profiles, the FASTA sequences and the domain annotations for two experimental data sets: (1) 147 human proteins in the AMPK-TOR pathway across 83 species, and (2) 1011 BUSCO arthropoda ortholog groups across 88 species in the three domains of life.

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("PhyloProfileData")

Usage

The data are stored in the ExperimentHub of Bioconductor and can be accessed using the following R commands:

# Load the data of the PhyloProfileData package
library(ExperimentHub)
eh <- ExperimentHub()
myData <- query(eh, "PhyloProfileData")
# View the metadata of this data package
myData
ExperimentHub with 6 records
# snapshotDate(): 2019-05-29 
# $dataprovider: Applied Bioinformatics Dept., Goethe University Frankfurt
# $species: NA
# $rdataclass: data.frame, AAStringSet
# additional mcols(): taxonomyid, genome, description, coordinate_1_based,
#   maintainer, rdatadateadded, preparerclass, tags, rdatapath, sourceurl,
#   sourcetype 
# retrieve records with, e.g., 'object[["EH2544"]]' 

           title                                                                         
  EH2544 | Phylogenetic profiles of human AMPK-TOR pathway                               
  EH2545 | FASTA sequences for proteins in the phylogenetic profiles of human AMPK-TOR...
  EH2546 | Domain annotations for proteins in the phylogenetic profiles of human AMPK-...
  EH2547 | Phylogenetic profiles of BUSCO arthropoda proteins                            
  EH2548 | FASTA sequences for proteins in the phylogenetic profiles of BUSCO arthropo...
  EH2549 | Domain annotations for proteins in the phylogenetic profiles of BUSCO arthr...
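The printed listing truncates long titles. As a minimal sketch, the hub IDs, full titles and object classes can be collected into a plain data.frame. This assumes that the metadata columns are reachable with `$` on the query result, and the names `overview` and `ampkTorIds` are illustrative only:

```r
library(ExperimentHub)

eh <- ExperimentHub()
myData <- query(eh, "PhyloProfileData")

# One row per record: hub ID, untruncated title and R class of the stored object
overview <- data.frame(
    id = names(myData),
    title = myData$title,
    rdataclass = myData$rdataclass,
    stringsAsFactors = FALSE
)
overview

# IDs of the records whose title mentions the AMPK-TOR pathway
ampkTorIds <- overview$id[grepl("AMPK-TOR", overview$title, fixed = TRUE)]
ampkTorIds
```

Looking records up by title avoids hard-coding IDs, at the cost of depending on the exact wording of the titles.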

Each data set contains three files (objects) corresponding to the phylogenetic profiles, the FASTA sequences and the protein domain annotations. A particular data object can be retrieved using its ID, for example:

# Retrieve FASTA sequences for proteins in the phylogenetic profiles of the 
# human AMPK-TOR pathway
ampkTorFasta <- myData[["EH2545"]]
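Once retrieved, the objects can be inspected with standard accessors. The sketch below assumes that the FASTA record is an AAStringSet and that the profile record is a data.frame, as the `$rdataclass` line of the metadata suggests, and that the Biostrings package is installed. The variable name `ampkTorProfile` is illustrative only:

```r
library(ExperimentHub)
library(Biostrings)  # provides the AAStringSet class and its accessors

eh <- ExperimentHub()
myData <- query(eh, "PhyloProfileData")

# FASTA sequences of the AMPK-TOR data set (assumed to be an AAStringSet)
ampkTorFasta <- myData[["EH2545"]]
class(ampkTorFasta)
length(ampkTorFasta)          # number of sequences
head(names(ampkTorFasta))     # first few sequence identifiers
summary(width(ampkTorFasta))  # distribution of sequence lengths

# Phylogenetic profiles of the same data set (assumed to be a data.frame)
ampkTorProfile <- myData[["EH2544"]]
str(ampkTorProfile)           # column names and types
head(ampkTorProfile)
```

The first access to a record downloads it into the local ExperimentHub cache, so later calls with the same ID should not need to fetch it again.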

For a detailed description of each data set and its data objects, please see the PhyloProfileData vignette:

library(PhyloProfileData)
browseVignettes("PhyloProfileData")

Bugs, Comments and Suggestions

Bug reports, comments and suggestions are highly appreciated. Please open an issue on GitHub or get in touch via email.

Contributors

License

This data package is released under the MIT license.

How-To Cite

Ngoc-Vinh Tran, Bastian Greshake Tzovaras, Ingo Ebersberger; PhyloProfile: Dynamic visualization and exploration of multi-layered phylogenetic profiles, Bioinformatics, bty225, https://doi.org/10.1093/bioinformatics/bty225

Alternatively, use the citation function in R to obtain the reference directly in BibTeX or LaTeX format:

citation("PhyloProfileData")
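As a small sketch of how the citation object might be converted, assuming PhyloProfileData is installed: `toBibtex()` from the utils package returns the BibTeX entry, and the `style` argument of the print method selects a LaTeX rendering. The output file name is an arbitrary example:

```r
# Assumes the PhyloProfileData package is installed
cit <- citation("PhyloProfileData")

# BibTeX entry, printed to the console
bib <- toBibtex(cit)
bib

# LaTeX-formatted entry
print(cit, style = "latex")

# Optionally save the BibTeX entry to a file (file name chosen for illustration)
writeLines(bib, "PhyloProfileData.bib")
```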

Contact

Vinh Tran tran@bio.uni-frankfurt.de



BIONF/PhyloProfileData documentation built on June 30, 2021, 9:10 p.m.