GenometriCorr: GenometriCorr (Genometric Correlation) package

Description Details Author(s) References See Also Examples

Description

GenometriCorr evaluates the spatial correlation between two types of features (experimental or annotated) in genomic coordinates. Several different approaches are implemented, each intended to evaluate a different biologically relevant spatial relationship. The genomic features are split into intervals and correlations are evaluated as described in the package vignette.

Most standard genomic feature annotation file formats are accepted as input. Each feature type is given in a separate file, one used as a query and the other used as a reference. The statistical comparisons that are made are not symmetric; that is, they are sensitive to which file is the query and which is the reference, so we recommend running the comparisons twice, switching the status of the input files for the second run. This is intentional, as biological relationships can be asymmetric. Once the files are read, the annotations are stored as GRanges objects from GenomicRanges package).

There are two classes that are provided by the package. One of them, GenometriCorrConfig-class provide serialization into configuration file that describe a full run of the package that is intended to explore spatial correlations between two markups. The class also provide run.config method that open the input data files, read them, make all the calculations by applying the central GenometriCorrelation function to mapped or unmapped data (see below). The GenometriCorrelation function returns an instance of GenometriCorrResult-class that is based on a list with the results of the run and also contains the GenometriCorrConfig-class instance that describes the run as a slot. The GenometriCorrResult-class provides show method and two graphical representations: graphical.report and visualize.

Below we describe the statistical comparisons implemented in this package. One set of features is termed the “query” and the other, the “reference”, throughout.

When both query and reference features are restricted to genomic subsets, or when we want to compare their relationship only within smaller genomic intervals (e.g. genes), we can remap the intervals of interest onto pseudochromosomes and use the MapRangesToGenomicIntervals provided by the package. Each mapping result is a GRanges object. The pseudochromosomes can now be treated as usual by GenometriCorrelation to test the correlations of interest.

The package also provides the VisualiseTwoIRanges function that creates a very high-level graphic overview of a pair of annotations on a chromosome (space).

Details

Package: GenometriCorr
Type: Package
Title: Genometric Correlation package
Version: 1.1.23
Date: 2020-02-20
License: Artistic-2.0
LazyLoad: yes
biocViews: Annotation, Genetics, Infrastructure, DataRepresentation, Bioinformatics, StatisticalMethod
URL: http://genometricorr.sourceforge.net/

Author(s)

Alexander Favorov favorov@sensi.org, Loris Mularoni, Yulia Medvedeva, Harris A. Jaffee, Ekaterina V. Zhuravleva, Veronica Busa, Leslie M. Cope, Andrey A. Mironov, Vsevolod J. Makeev, Sarah J. Wheelan

References

http://genometricorr.sourceforge.net/

See Also

See the GenometriCorr package vignette.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
library('rtracklayer')
library('GenometriCorr')

library('rtracklayer')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')

refseq<-transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)

cpgis<-import(system.file("extdata", "UCSCcpgis_hg19.bed", package = "GenometriCorr"))
seqinfo(cpgis)<-seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)[seqnames(seqinfo(cpgis))]

permut.number<-0

#permut.number=0 means all the permutations are off
#permut.number is the common default for ecdf.area.permut.number, mean.distance.permut.number, and jaccard.measure.permut.number
#these three can be set separetely; explicit set overrloads the default


cpgi_to_genes<-GenometriCorrelation(cpgis,refseq,chromosomes.to.proceed=c('chr1','chr2','chr3'),permut.number=permut.number,keep.distributions=FALSE,showProgressBar=FALSE)

print(cpgi_to_genes)

VisualiseTwoIRanges(
	ranges(cpgis[seqnames(cpgis)=='chr1']),
	ranges(refseq[seqnames(refseq)=='chr1']),
	nameA='CpG Islands',nameB='RefSeq Genes',
	chrom_length=seqlengths(TxDb.Hsapiens.UCSC.hg19.knownGene)['chr1'],
	title="CpGIslands and RefGenes on chr1 of Hg19 animal")

#mapping example, same as in the vignette
population<-1000

chromo.length<-c(3000000)

names(chromo.length)<-c('the_chromosome')

rquery<-GRanges(ranges=IRanges(start=runif(population,1000001,2000000-9),width=c(10)),seqnames='the_chromosome')

rref<-GRanges(ranges=IRanges(start=runif(population,1000001,2000000-9),width=c(10)),seqnames='the_chromosome')

#create two features, they are randomly scattered in 1 000 000...2 000 000

unmapped_result<-GenometriCorrelation(rquery,rref,chromosomes.length=chromo.length,permut.number=permut.number,keep.distributions=FALSE,showProgressBar=FALSE)

#correlate them on the whole chromosome: 1...3 000 000

cat('Unmapped result:\n')
print(unmapped_result)

map_space<-GRanges(ranges=IRanges(start=c(1000001),end=c(2000000)),seqname='the_chromosome')

mapped_rquery<-MapRangesToGenomicIntervals(what.to.map=rquery,where.to.map=map_space)

mapped_rref<-MapRangesToGenomicIntervals(what.to.map=rref,where.to.map=map_space)

#map them into 1 000 001...2 000 000

mapped_result<-GenometriCorrelation(mapped_rquery,mapped_rref,permut.number=permut.number,keep.distributions=FALSE,showProgressBar=FALSE)

#then, correlate again

cat('Mapped result:\n')
print(mapped_result)

favorov/GenometriCorr documentation built on March 30, 2021, 5:21 p.m.