The package provides various tools for cross-correlating genome segmentations and annotations.
`segmenTools' was developed specifically for the analysis of genome-wide time-series data, in particular time series with periodic properties such as circadian data sets, but many functionalities are broadly applicable.
Its coordinate indexing and feature annotation utilities are used in various publications (see Capabilities), and by our genome browser.
The git repository also holds the command-line scripts (directory scripts) that were used for running segmenTier, a (genomic) segmentation algorithm working with abstract similarities, e.g., derived from RNA-seq time series, and for analyzing its results (Machne, Murray & Stadler 2017).
Drawing is the most unconstrained method of modeling in biology; many functionalities in `segmenTools' therefore provide exploratory as well as publication-quality plotting utilities.
library(devtools)
install_github("raim/segmenTools")
... or conventionally from the source files, cloned from GitHub.
Via segmenTier: Fourier-based clustering of periodic time series, after Machne & Murray 2012 and as extended in Machne, Murray & Stadler 2017 for similarity-based segmentation of coordinate-based time series (RNA-seq).
TODO: Cluster-wise oscillation parameters
library(segmenTier) # for clustering
library(segmenTools) # for plots
## download & parse data
rawdata.url <- "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5612/matrix/GSE5612_series_matrix.txt.gz"
rawdata <- gsub( ".*/","",rawdata.url)
if ( !file.exists(rawdata) )
    utils::download.file(url=rawdata.url, destfile=rawdata)
dat <- read.delim(gzfile(rawdata),comment.char="!",row.names=1)
## process time-series (Discrete Fourier Transform)
tset <- processTimeseries(dat, use.fft=TRUE, dc.trafo="ash",use.snr=TRUE)
## cluster (by kmeans)
cset <- clusterTimeseries(tset,K=7) # CLUSTERING! takes a while
## and inspect clustered time-series via the versatile
## cluster time series plotter
pdf("edwards06.pdf")
plotClusters(tset, cset, norm="lg2r", each=TRUE, type="all", ylim="all")
plotClusters(tset, cset, norm="lg2r", each=TRUE, q=0.8)
## selected clusters in all-in-one plot
plotClusters(tset, cset, norm="lg2r", each=FALSE, type="rng", cls.srt=c(3,5,7))
dev.off()
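The idea behind Fourier-based clustering can be illustrated without the packages. The following is a minimal base-R sketch on toy data, not the segmenTier or segmenTools implementation: the toy oscillators, the choice of components 1 to 6 and the use of plain `kmeans` on real and imaginary parts are all assumptions made for illustration.

```r
## Base-R sketch of Fourier-based clustering (illustrative only;
## not the segmenTier/segmenTools implementation).
set.seed(1)
n.time <- 24                      # time points per series
time <- seq_len(n.time) - 1
period <- 12                      # assumed period, in sampling steps

## toy data: two groups of noisy oscillators with opposite phase
phases <- rep(c(0, pi), each = 20)
dat <- t(sapply(phases, function(phi)
    cos(2 * pi * time / period - phi) + rnorm(n.time, sd = 0.2)))

## Discrete Fourier Transform of each row (one row per time series)
dft <- t(apply(dat, 1, fft))

## keep frequency components 1..6 (column 1 is the DC component) and
## use their real and imaginary parts as clustering features
cmp <- 2:7
features <- cbind(Re(dft[, cmp]), Im(dft[, cmp]))

## k-means clustering in Fourier space
cl <- kmeans(features, centers = 2, nstart = 10)
table(cluster = cl$cluster, phase = phases)
```

Clustering on Fourier components rather than raw values groups series by period and phase, which is usually what matters for oscillatory data.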
Comparing different gene categories (clusters) by cumulative hypergeometric distribution tests, and plotting overlap enrichments after Machne & Murray 2012.
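A cumulative hypergeometric test for the overlap of two gene categories can be sketched in base R as follows. The helper name `overlapTest` and the toy gene sets are hypothetical; the package functions may differ in interface and in the statistics they report.

```r
## Cumulative hypergeometric test for the overlap of two gene categories
## (illustrative; `overlapTest` is a hypothetical helper name).
overlapTest <- function(a, b, universe) {
    a <- intersect(a, universe)
    b <- intersect(b, universe)
    k <- length(intersect(a, b))   # observed overlap
    m <- length(a)                 # genes in category a
    n <- length(universe) - m      # all other genes
    ## P(X >= k): probability of an overlap at least this large by chance
    p <- phyper(k - 1, m, n, length(b), lower.tail = FALSE)
    c(overlap = k, p.value = p)
}

genes <- paste0("g", 1:1000)
cluster.a <- genes[1:100]
cluster.b <- genes[51:150]         # shares 50 genes with cluster.a
overlapTest(cluster.a, cluster.b, genes)
```

Setting `lower.tail = TRUE` and passing `k` instead of `k - 1` would give the complementary test for depletion, P(X <= k).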
Jaccard index statistics and relative positioning of distinct genome segmentations (interval definitions and annotations); used in Machne, Murray & Stadler 2017 for analysis of segmentations by segmenTier.
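For two sets of intervals, the Jaccard index is the number of positions covered by both sets divided by the number covered by either. A minimal base-R sketch follows, assuming intervals are given as data frames with inclusive, 1-based `start` and `end` columns; the function names are hypothetical and not part of the package.

```r
## Jaccard index between two sets of genomic intervals, counted in
## covered positions (illustrative; function names are hypothetical).
coveredPositions <- function(seg)
    unique(unlist(Map(seq, seg$start, seg$end)))

jaccardIntervals <- function(query, target) {
    qp <- coveredPositions(query)
    tp <- coveredPositions(target)
    length(intersect(qp, tp)) / length(union(qp, tp))
}

query  <- data.frame(start = c(1, 101), end = c(50, 150))
target <- data.frame(start = c(26, 201), end = c(75, 250))
jaccardIntervals(query, target)
```

In this toy case the two sets share 25 positions and jointly cover 175, so the index is 25/175. Expanding intervals into positions is simple but memory-hungry; for whole genomes an approach based on interval arithmetic would be preferable.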
Accessing genomic coordinates efficiently by indexing, used by our genome browser and segmenTier.
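One common way to index genomic coordinates is to map each (chromosome, position) pair to a single continuous index using cumulative chromosome lengths. The sketch below illustrates this in base R; the chromosome lengths and the function names `toIndex` and `fromIndex` are made up for the example and are not the package API.

```r
## Map (chromosome, position) pairs to one continuous index and back
## (illustrative; names and chromosome lengths are hypothetical).
chr.length <- c(chrI = 1000, chrII = 500, chrIII = 2000)
## cumulative offsets: number of positions before each chromosome
chr.offset <- c(0, cumsum(chr.length))[seq_along(chr.length)]
names(chr.offset) <- names(chr.length)

toIndex <- function(chr, pos) unname(chr.offset[chr] + pos)

fromIndex <- function(idx) {
    ## findInterval locates the last chromosome whose offset is < idx
    i <- findInterval(idx - 1, chr.offset)
    data.frame(chr = names(chr.offset)[i],
               pos = idx - unname(chr.offset[i]))
}

idx <- toIndex(c("chrI", "chrII", "chrIII"), c(1000, 1, 10))
idx
fromIndex(idx)
```

With a continuous index, a whole-genome data matrix can be addressed by row number, and interval lookups reduce to plain integer comparisons.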
Align genomic intervals around specific genomic sites, such as transcription start sites, and calculate position-specific statistics, e.g., to generate sequence or DNA motif enrichment profiles, or average DNA binding data profiles.
... coming soon
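The alignment of data around genomic sites can be sketched in base R as follows. `alignAround` is a hypothetical helper, and the signal, sites and strands are toy values; reverse-strand sites are mirrored so that positive relative positions always point downstream.

```r
## Align a genome-wide signal around sites and average per relative
## position (illustrative; `alignAround` is a hypothetical helper).
alignAround <- function(signal, sites, strand, flank = 5) {
    rel <- -flank:flank
    aligned <- t(mapply(function(site, str) {
        pos <- site + str * rel          # reverse-strand sites are mirrored
        pos[pos < 1 | pos > length(signal)] <- NA
        signal[pos]
    }, sites, strand))
    colnames(aligned) <- rel
    aligned
}

signal <- rep(0, 100)
signal[c(21, 49, 81)] <- 1               # a peak 1 bp downstream of each site
sites  <- c(20, 50, 80)                  # e.g. transcription start sites
strand <- c(1, -1, 1)                    # +1: forward, -1: reverse
aligned <- alignAround(signal, sites, strand)
colMeans(aligned, na.rm = TRUE)          # position-specific average profile
```

Each row of `aligned` is one site and each column one relative position, so any column-wise statistic (mean, median, enrichment) yields a position-specific profile.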
Analyzing periodic enrichment of oligomers and DNA structural parameters, after Lehmann, Machne & Herzel 2014.
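As a rough illustration of what periodic oligomer enrichment means, the base-R sketch below plants the dinucleotide "AA" every 10 bp in a random sequence and counts pairwise distances between all of its occurrences. The sequence, the 10 bp spacing and the distance-histogram approach are assumptions for this example only, not the published analysis.

```r
## Periodicity of oligomer occurrences in a sequence
## (illustrative base-R sketch with a toy sequence).
set.seed(1)
n <- 2000
sq <- sample(c("A", "C", "G", "T"), n, replace = TRUE)  # random background
starts <- seq(1, n - 1, by = 10)
sq[starts] <- "A"                        # plant "AA" every 10 bp
sq[starts + 1] <- "A"
sq <- paste(sq, collapse = "")

## start positions of all (overlapping) occurrences of the oligomer
pos <- as.integer(gregexpr("(?=AA)", sq, perl = TRUE)[[1]])

## histogram of pairwise distances up to max.dist
max.dist <- 100
d <- as.vector(dist(pos))
counts <- tabulate(d[d <= max.dist], nbins = max.dist)

## compare distances at multiples of the planted period with all others
on <- seq_len(max.dist) %% 10 == 0
c(periodic = mean(counts[on]), other = mean(counts[!on]))
```

Distances at multiples of the planted period occur much more often than other distances; plotting `counts` against distance shows the corresponding regularly spaced peaks.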
parseGEOSoft: parses GEO Soft family files of microarray data sets into data matrices, with accompanying probe-ID mapping and sample/data annotation
summarizeGEOSoft: offers a light-weight summarization function to average probe data for features with multiple probes
gff2tab: parses a GFF file into tabular format, including collection of attributes into data columns
Vignettes:
clusterOverlaps: sort and plot overlap enrichment profiles, produced by clusterCluster, clusterAnnotation, clusterProfile, segmentOverlaps.
clusterCluster: add fields for statistical corrections,