ZW-xjtlu/WhistleR: Comprehensive Annotation of Predictive Features for Interval-based Genomic Data

Extract predictive genome-derived features and sequence-derived features from genomic regions. WhistleR aims to facilitate feature engineering for predictive modeling of genomic data. The package can enumerate a large number of genome-derived features by combining genomic properties (e.g., length, sequence content, clustering metrics, conservation scores, distance to region ends, etc.) with genomic regions (e.g., 5'UTR, CDS, 3'UTR, exons, introns, promoters, transcripts, genes, etc.). Compared with using sequence-derived features alone, adding comprehensive region-property features can significantly improve predictive performance in a variety of genomic applications. In addition, because region properties are intuitive to interpret, feature importance analyses on these features can readily suggest novel biological insights.
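
To make the "properties × regions" idea concrete, the following base R sketch crosses a few property names with a few region types to produce candidate feature names. It is only an illustration of the combinatorial enumeration described above: the vectors `properties` and `regions` and the naming scheme are hypothetical and are not WhistleR's API or its actual feature names.

```r
# Illustrative sketch only: these names are assumptions, not WhistleR's API.
properties <- c("length", "GC_content", "conservation",
                "dist_to_5p_end", "dist_to_3p_end")
regions <- c("UTR5", "CDS", "UTR3", "exons",
             "introns", "promoters", "transcripts", "genes")

# Cross every property with every region type (one row per combination).
grid <- expand.grid(property = properties, region = regions,
                    stringsAsFactors = FALSE)

# Build one hypothetical feature name per combination.
feature_names <- paste(grid$property, grid$region, sep = "_")

length(feature_names)  # 5 properties * 8 regions = 40 candidate features
```

Even this small grid yields 40 candidate features, which shows how quickly the feature set grows as more properties and region types are added.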

Package details

Bioconductor views: Classification, Clustering, FeatureExtraction, FunctionalGenomics, FunctionalPrediction, GenomeAnnotation, Regression
License: LGPL (>= 3)
Package repository: GitHub (ZW-xjtlu/WhistleR)
Installation: Install the latest version of this package by entering the following in R:
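
Since the package is hosted on GitHub as ZW-xjtlu/WhistleR, one common way to install it is through the `remotes` package. The use of `remotes` is an assumption here; any GitHub-capable installer should work equally well.

```r
# Assumption: installing from GitHub via the 'remotes' package.
# install.packages("remotes")  # uncomment if 'remotes' is not yet installed
remotes::install_github("ZW-xjtlu/WhistleR")
```

Because the package declares Bioconductor views, some of its dependencies may need to be installed from Bioconductor first.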
ZW-xjtlu/WhistleR documentation built on March 13, 2021, 10:50 a.m.