README.md
In raim/segmenTools: Tools for Functional Genome Exploration

Genomic Feature & Coordinate Utilities

The package provides various tools for cross-correlating genome segmentations and annotations.

`segmenTools` was developed specifically for the analysis of genome-wide time-series data, in particular time series with periodic properties such as circadian data sets, but many of its functions are broadly applicable.

Its coordinate indexing and feature annotation utilities are used in various publications (see Capabilities), and by u'r gene bro.

The git repository also holds the command-line scripts (directory scripts) that were used to run Karl, the segmenTier, a (genomic) segmentation algorithm working with abstract similarities, e.g., derived from RNA-seq time series (Machne, Murray & Stadler 2017), and to analyze its results.

Drawing is the most unconstrained method of modeling in biology; therefore many functions in `segmenTools` provide exploratory as well as publication-quality plotting utilities.

Installation

library(devtools)
install_github("raim/segmenTools")

... or conventionally from the source files, cloned from GitHub.

Capabilities

Gene-Based

Time-Series Analysis

Via Karl: Fourier-based clustering of periodic time-series, after Machne & Murray 2012 and as extended in Machne, Murray & Stadler 2017 for similarity-based segmentation of coordinate-based time-series (RNA-seq).

TODO: Cluster-wise oscillation parameters

library(segmenTier) # for clustering 
library(segmenTools) # for plots

## download & parse data
rawdata.url <- "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE5nnn/GSE5612/matrix/GSE5612_series_matrix.txt.gz"
rawdata <- gsub( ".*/","",rawdata.url)
if ( !file.exists(rawdata) )
  utils::download.file(url=rawdata.url, destfile=rawdata, mode="wb") # binary mode for the gzipped file
dat <- read.delim(gzfile(rawdata),comment.char="!",row.names=1)

## process time-series (Discrete Fourier Transform)
tset <- processTimeseries(dat, use.fft=TRUE, dc.trafo="ash",use.snr=TRUE)
## cluster (by kmeans)
cset <- clusterTimeseries(tset,K=7) # CLUSTERING! takes a while

## and inspect clustered time-series via the versatile
## cluster time series plotter
pdf("edwards06.pdf")
plotClusters(tset, cset, norm="lg2r", each=TRUE, type="all", ylim="all")
plotClusters(tset, cset, norm="lg2r", each=TRUE, q=0.8)
## selected clusters in an all-in-one plot
plotClusters(tset, cset, norm="lg2r", each=FALSE, type="rng", cls.srt=c(3,5,7))
dev.off()
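
The Discrete Fourier Transform step mentioned above can be illustrated on a synthetic series with base R's `fft`. This is only a minimal sketch of the general idea, not the implementation used by processTimeseries; the series and all variable names are made up for illustration.

```r
## synthetic time series: 24 samples covering 2 full cycles, plus an offset
n <- 24
time <- 0:(n - 1)
x <- 5 + 2 * cos(2 * pi * 2 * time / n)

## Discrete Fourier Transform; component k+1 corresponds to k cycles
dft <- fft(x) / n
amplitude <- Mod(dft)

## DC component (mean level) and the dominant oscillating component
dc <- amplitude[1]
cycles <- which.max(amplitude[2:(n / 2 + 1)]) # number of cycles in the series
period <- n / cycles                          # period in sampling intervals
```

Clustering on such Fourier components rather than on the raw values groups time series by their periodic shape; scaling by the DC component is one possible way to make series of different mean expression level comparable.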

Categorical Analysis

Comparing different gene categories (clusters) by cumulative hypergeometric distribution tests, and plotting overlap enrichments after Machne & Murray 2012.
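
The principle of a cumulative hypergeometric test for the overlap of two gene categories can be sketched with base R's `phyper`. The classifications and the helper `overlapPvalue` below are hypothetical and are not part of the package.

```r
## two hypothetical classifications of the same 10 genes
cl1 <- c("A","A","A","A","B","B","B","B","B","B")
cl2 <- c("x","x","x","y","y","y","y","y","y","x")

## p-value for an overlap at least as large as the observed one
overlapPvalue <- function(a, b, cl1, cl2) {
  in.a <- cl1 == a
  in.b <- cl2 == b
  k <- sum(in.a & in.b) # observed overlap
  m <- sum(in.a)        # genes in category a
  n <- sum(!in.a)       # all other genes
  N <- sum(in.b)        # genes in category b (number drawn)
  ## cumulative upper tail: P(X >= k) = P(X > k-1)
  phyper(k - 1, m, n, N, lower.tail = FALSE)
}

p.Ax <- overlapPvalue("A", "x", cl1, cl2)
```

Applying such a test to all pairs of categories from two classifications yields a p-value matrix that can be plotted as an overlap enrichment profile.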

Coordinate-Based

Segment Overlap Analysis

Jaccard index statistics and relative positioning of distinct genome segmentations (interval definitions and annotations); used in Machne, Murray & Stadler 2017 for analysis of segmentations by Karl.
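
For two intervals, the Jaccard index is the length of their intersection divided by the length of their union. A minimal base-R sketch, assuming inclusive integer coordinates on the same chromosome and strand; the function `jaccard` and the example data are hypothetical, not the package's interface.

```r
## Jaccard index of two intervals with inclusive integer coordinates
jaccard <- function(start1, end1, start2, end2) {
  inter <- max(0, min(end1, end2) - max(start1, start2) + 1)
  len1 <- end1 - start1 + 1
  len2 <- end2 - start2 + 1
  inter / (len1 + len2 - inter)
}

## hypothetical query segments and annotated target intervals
query  <- data.frame(start = c(1, 50), end = c(10, 80))
target <- data.frame(start = c(6, 100), end = c(15, 120))

## pairwise Jaccard matrix: rows = query, columns = target
J <- outer(seq_len(nrow(query)), seq_len(nrow(target)),
           Vectorize(function(i, j)
             jaccard(query$start[i], query$end[i],
                     target$start[j], target$end[j])))
```

The all-against-all comparison is quadratic in the number of intervals and only serves to show the statistic; genome-scale comparisons would restrict the calculation to overlapping pairs.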

Genomic Coordinate Indexing

Accessing genomic coordinates efficiently by indexing, used by u'r gene bro and Karl.
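
The idea of coordinate indexing is to map each chromosome and position pair to one continuous index, so that genome-wide data can be stored in and accessed from a single vector. A minimal sketch in base R, assuming a small hypothetical genome; `toIndex` and `toCoor` are illustrative names, not the package's functions.

```r
## hypothetical chromosome lengths of a small genome
chrL <- c(chr1 = 1000, chr2 = 500, chr3 = 2000)
## cumulative offsets: index of the last position before each chromosome
chrS <- c(0, cumsum(unname(chrL)))

## chromosome number + position -> continuous index
toIndex <- function(chr, pos) chrS[chr] + pos

## continuous index -> chromosome number + position
toCoor <- function(idx) {
  chr <- findInterval(idx - 1, chrS)
  data.frame(chr = chr, pos = idx - chrS[chr])
}

idx <- toIndex(2, 10)                 # position 10 on chr2
coor <- toCoor(c(1, 1000, 1001, 3500))
```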

Positional Alignment and DNA Structural Patterns

Align genomic intervals around specific genomic sites, such as transcription start sites, and calculate position-specific statistics, e.g., to generate sequence or DNA motif enrichment profiles, or average DNA binding data profiles.
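
The alignment principle can be sketched in base R: cut a window around each site from a genome-wide signal, reverse windows on the reverse strand, and summarize each relative position. The signal, sites and window size below are made-up example data, and the code does not use the package's own functions.

```r
## hypothetical genome-wide signal (one value per position) and sites
set.seed(1)
signal <- rnorm(1000)
sites  <- c(100, 400, 750) # e.g. transcription start sites
strand <- c(1, -1, 1)      # -1: site on the reverse strand
dst    <- 20               # window half-width

## rows = sites, columns = positions -dst..+dst relative to the site;
## reverse-strand windows are read in the opposite direction
rel <- -dst:dst
aligned <- t(sapply(seq_along(sites), function(i)
  signal[sites[i] + strand[i] * rel]))
colnames(aligned) <- rel

## position-specific statistic: mean profile over all aligned sites
profile <- colMeans(aligned)
```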

Sequence Spectra

... coming soon

Analyzing periodic enrichment of oligomers and DNA structural parameters, after Lehmann, Machne & Herzel 2014.

General Utilities

Parsers

- parseGEOSoft: parses GEO Soft family files of microarray data sets into data matrices, with accompanying probe-ID mapping and sample/data annotation,
- summarizeGEOSoft: offers a light-weight summarization function, to average probe data for features with multiple probes,
- gff2tab: parses a GFF file into tabular format, including collection of attributes into data columns.
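
What collecting GFF attributes into data columns means can be illustrated with base R alone. The following is a sketch on two made-up GFF3 lines, assuming `key=value` attributes separated by semicolons; it is not the implementation or the interface of gff2tab.

```r
## two hypothetical GFF3 lines, as they would be read from a file
gff.lines <- c(
  "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=abc",
  "chr1\tsrc\tgene\t1200\t1500\t.\t-\t.\tID=gene2;Note=hypothetical")

## the nine standard GFF columns
gff <- read.delim(text = gff.lines, header = FALSE, comment.char = "#",
                  stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"))

## split "key=value;key=value" attribute strings into named vectors
att <- lapply(strsplit(gff$attributes, ";"), function(x) {
  kv <- strsplit(x, "=")
  setNames(sapply(kv, `[`, 2), sapply(kv, `[`, 1))
})

## collect all attribute keys into data columns, NA where absent
keys <- unique(unlist(lapply(att, names)))
for (key in keys)
  gff[[key]] <- sapply(att, function(a) unname(a[key]))
```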

TODO

- Vignettes:
  - clusterTools v segmenTools,
  - time series clustering v genomic intervals,
  - used in: genomeBrowser; input from: segmenTier, dpseg,
- class clusterOverlaps: sort and plot overlap enrichment profiles, produced by clusterCluster, clusterAnnotation, clusterProfile, segmentOverlaps,
- clusterCluster: add fields for statistical corrections,
- clusterTimeseries: re-cluster cset by kmeans with centers initialized by flowclust cluster centers,
- segmenTier: change indexing to allow segments of length 1, see dpseg.

raim/segmenTools documentation built on July 5, 2025, 4:34 a.m.


