histoneSig is currently a proof-of-concept tool. Through the signalSet class and its related methods, it generates and manipulates continuous, per-basepair, signal-like representations from -seq related bed and bigWig files.
Current utilities include signal filtering, basic geometric feature extraction and an easy-to-use implementation for comparing regions of generated signals across different -seq experiments, against a reference genome, or both.
Features calculated from signals can then be integrated alongside readily-provided -seq average values. Emphasizing the use of diverse sources of information, our package interfaces with BSgenome for the integration of sequence information within regions deemed significant by signal-obtained metrics.
While still in a prototype stage, histoneSig will ideally allow for downstream analyses augmented by features obtained directly from our proposed signal representation, wrapped in efficient data objects that will trivialize interoperation with related analysis tools.
Get R 3.5.1.
Consult the sessionInfo.txt file within this repository to see what's being used in a development environment.
devtools::install_github('cmvcordova/histoneSig')
Starting from a narrow peak (or any other bedfile) and its corresponding bigwig file, we're able to:
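## Load histoneSig, plus rtracklayer for import.bw(), BigWigFile() and BigWigSelection()
library(histoneSig)
library(rtracklayer)
## Load our peak or preferred bedfile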
np_file <- import.np('path/to/npfile.bed')
## Use the ranges we just obtained to parse the relevant bigWig fragments
parsing_bw_ranges <- granges_chr_filter(np_file)
## Parse bigWig
bw_file <- import.bw(con = BigWigFile("path/to/bwfile.bigWig"),
                     selection = BigWigSelection(parsing_bw_ranges))
## Obtain signals from both of our files
your_first_signalset <- np_signals_from_bigwig(np_file, bw_file)
Behold, a signalSet observation in all its splendor
The default filtering method is a lowpass filter. This filter can take either a fixed window_size or an equal fraction of each signal in the set as a fractional window. You can also pass your own filter functions to filter_signalSet() (results may vary).
filtered_signalset <- filter_signalSet(your_first_signalset, fractional = 25)
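To make the idea of a windowed lowpass filter concrete, here is a minimal base-R sketch of a centered moving average. It is not histoneSig's implementation: moving_average() and fractional_window() are hypothetical helpers, and reading fractional = 25 as "window = signal length / 25" is an assumption made only for illustration.

```r
# Centered moving average, one simple kind of lowpass filter.
# Illustrative stand-in only; not the code behind filter_signalSet().
moving_average <- function(x, window_size) {
  stopifnot(window_size >= 1, window_size <= length(x))
  kernel <- rep(1 / window_size, window_size)
  # stats::filter() returns NA where the window runs past either end
  as.numeric(stats::filter(x, kernel, sides = 2))
}

# ASSUMPTION: a "fractional" window is the signal length divided by `fractional`.
fractional_window <- function(x, fractional) {
  max(1L, as.integer(round(length(x) / fractional)))
}

# Synthetic noisy signal standing in for one per-basepair signal
set.seed(1)
raw <- sin(seq(0, 4 * pi, length.out = 200)) + rnorm(200, sd = 0.3)
smoothed <- moving_average(raw, fractional_window(raw, 25))
```

A fixed window treats every signal identically, while a length-dependent window scales the amount of smoothing with the size of each region, which may matter when peaks in the bedfile differ widely in width.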
Now, let's use plotSignal() to compare the first signal of the signalSet we've obtained with its filtered counterpart.
rawsignalplot <- plotSignal(your_first_signalset[1])
filteredsignalplot <- plotSignal(filtered_signalset[1])
gridExtra::grid.arrange(rawsignalplot, filteredsignalplot, ncol=2)
We may also illustrate detected peaks (blue) and valleys (red). These will then be used as references to calculate geometric features.
plotSignal(filtered_signalset[1], highlight="both")
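For intuition about what peak and valley detection on a smoothed signal involves, the following base-R sketch marks local maxima and minima from sign changes in the first difference. find_extrema() is a hypothetical helper written for this example; the detection method histoneSig actually uses may differ.

```r
# Local extrema from sign changes in the first difference.
# Illustrative only; flat plateaus may be reported more than once.
find_extrema <- function(x) {
  turn <- diff(sign(diff(x)))
  list(
    peaks   = which(turn < 0) + 1L,  # slope goes from rising to falling
    valleys = which(turn > 0) + 1L   # slope goes from falling to rising
  )
}

x <- c(0, 2, 5, 3, 1, 4, 6, 2)
ext <- find_extrema(x)
ext$peaks    # indices 3 and 7 (values 5 and 6)
ext$valleys  # index 5 (value 1)
```

Running this kind of detection on a filtered signal rather than the raw one avoids flagging every noise wiggle as an extremum.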
Calculating base features from a given signalSet is now possible. If we want to work with GenomicRanges objects afterwards, we can set the wraptoGRanges argument to TRUE; otherwise, we'll obtain a data.frame. Here, we'll extract the notable valleys found in our signal and their associated geometric features: valley width ("extension"), height, area and distances to the next and previous peaks in the provided bedfile.
base_features_from_signalsetlist(filtered_signalset,
                                 section="valley", returns="positions", wraptoGRanges=TRUE)
Finally, for comparative analyses, we may create a feature table from a GRanges or signalSet. Sequence information may be integrated as a one-hot matrix by setting the include_sequence parameter to TRUE. The result is a data.table that may or may not include signal and geometric features from the object's metadata. It can then be easily interfaced with other R libraries, models and packages.
So, from a given GRanges (or vanilla signalSet) of the following kind:
After running the following command,
build_feature_table(generic_GRanges, metadata_as_features = TRUE,
                    include_sequence = TRUE, refgenome = "BSgenome.Hsapiens.UCSC.hg38")
we'd obtain a data.table like this:
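As a rough picture of the one-hot sequence representation mentioned above, this base-R sketch encodes a short DNA string as a position-by-base matrix. one_hot_dna() is a hypothetical helper; the column layout produced by build_feature_table() is not documented here and may well differ.

```r
# One row per position, one column per base; unknown bases (e.g. N) stay all zero.
# Illustrative only; not the encoding code used by build_feature_table().
one_hot_dna <- function(sequence, alphabet = c("A", "C", "G", "T")) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  m <- outer(bases, alphabet, "==") * 1L  # logical comparison converted to 0/1
  colnames(m) <- alphabet
  m
}

encoded <- one_hot_dna("ACGTN")
encoded
```

Flattening such a matrix row by row would give a fixed-length numeric vector per region, as long as all regions are trimmed or padded to the same width.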
Nothing formal here just yet; just drop me a line.
See also the list of contributors who participated in this project. Currently empty; you could be the first one!
Pending - probably an MIT one in time.