Extract predictive genome-derived and sequence-derived features from genomic regions. WhistleR aims to facilitate feature engineering for predictive modeling of genomic data. The package can enumerate a large number of genome-derived features by combining genomic properties (e.g., length, sequence content, clustering metrics, conservation scores, distance to region ends) with genomic regions (e.g., 5'UTR, CDS, 3'UTR, exons, introns, promoters, transcripts, genes). Compared with using sequence-derived features alone, adding comprehensive region-property features can significantly improve predictive performance in a range of genomics applications. In addition, because region properties are intuitive to interpret, feature importance analyses can readily yield novel biological insights.
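The combinatorial idea described above (properties crossed with regions) can be illustrated with a minimal base R sketch. This is not WhistleR's API: the property names, region names, and the `property_region` naming scheme are hypothetical and chosen only for illustration.

```r
# Hypothetical property and region labels; not taken from WhistleR itself.
properties <- c("length", "GC_content", "conservation",
                "dist_to_5p_end", "dist_to_3p_end")
regions <- c("UTR5", "CDS", "UTR3", "exons", "introns")

# Cross every property with every region to enumerate candidate features.
feature_grid <- expand.grid(property = properties,
                            region = regions,
                            stringsAsFactors = FALSE)

# Build one feature name per (property, region) pair, e.g. "length_UTR5".
feature_names <- paste(feature_grid$property, feature_grid$region, sep = "_")

length(feature_names)  # 5 properties x 5 regions = 25 candidate features
head(feature_names, 3)
```

Even a handful of properties and regions produces a sizable candidate set, which is why enumerating the combinations programmatically is more practical than defining each feature by hand.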
|Bioconductor views|Classification, Clustering, FeatureExtraction, FunctionalGenomics, FunctionalPrediction, GenomeAnnotation, Regression|
|License|LGPL (>= 3)|
Install the latest version of this package by entering the following in R:
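A minimal sketch of a GitHub-based install, assuming the package is installed with `remotes::install_github()`. The GitHub account that hosts WhistleR is not stated here, so `<owner>` is a placeholder that must be replaced with the actual account name before running.

```r
# Install the 'remotes' helper package if it is not already available.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Placeholder (assumption): replace "<owner>" with the GitHub account
# that hosts the WhistleR repository; it is not given in this page.
github_owner <- "<owner>"

# Install WhistleR from its GitHub repository.
remotes::install_github(paste0(github_owner, "/WhistleR"))
```

Because the package lists Bioconductor views, some of its dependencies may come from Bioconductor and might need to be installed separately with `BiocManager::install()`.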