GenometriCorrelation: GenometriCorrelation

Description Usage Arguments Value Author(s) References See Also Examples

View source: R/GenometriCorrelation.R

Description

The GenometriCorrelation function compares two interval annotations on a chromosome, set of chromosomes or on an entire genome, and performs various statistical tests to determine whether the two interval sets are independent or are positioned nonrandomly with respect to each other. For a complete description of the tests, refer to the the package vignette.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
GenometriCorrelation(
  query,
  reference,
  chromosomes.to.proceed = c(),
  chromosomes.to.include.in.awhole = c(),
  chromosomes.to.exclude.from.awhole = c(),
  add.chr.as.prefix = FALSE,
  awhole.only = FALSE,
  map.to.half = TRUE,
  showProgressBar = TRUE,
  showTkProgressBar = FALSE,
  chromosomes.length = c(),
  suppress.evaluated.length.warning = FALSE,
  permut.number = 100,
  ecdf.area.permut.number = permut.number,
  mean.distance.permut.number = permut.number,
  jaccard.measure.permut.number = permut.number,
  jaccard.permut.is.rearrangement = FALSE,
  alternative = "two.sided",
  awhole.space.name = "awhole",
  keep.distributions = FALSE,
  representing.point.function = mitl,
  query.representing.point.function = representing.point.function,
  reference.representing.point.function = representing.point.function,
  supress.evaluated.length.warning
)

Arguments

query

GRanges object that contains the query interval set coordinates and spaces (generally, chromosomes). The GenometriCorrelation function tests whether it is positioned independently relative to the reference interval set.

reference

The GRanges object that contains the reference interval set coordinates and spaces. The GenometriCorrelation function tests whether query is positioned independently relative to reference.

chromosomes.to.proceed

This vector of strings contains the names of spaces (chromosomes) to analyze. If both query and reference are GRanges if the parameter is not given (its default is c()), the initial list of spaces to proceed is the intersection of the space lists of query and reference names, and a warning is rasied if the lists are different. if chromosomes.to.proceed is given, it restricts the list so that only those names from the intersection that are in chromosomes.to.proceed are analyzed. If it is a subset of the query and reference chromosone lists intersection, the warning that the lists are different is not raised.

chromosomes.to.include.in.awhole

ia a vector of strings contains the names of spaces to be included in the overall (awhole) statistics. Its default is c(), meaning that all the analysed genes are included.

chromosomes.to.exclude.from.awhole

is a list of chromosomes (spaces) to be excluded from the overall statistics.

add.chr.as.prefix

deals with the chr chromosome name prefix. The correlation is only performed on chromosomes that have exactly the same name, so by default, a chromosome named chr1 will not be considered the same chromosome as one simply labeled 1. This argument is provided so that if the chromosome names in the query, reference and the chromosomes* parameters, or the names of the chromosomes in chromosome.length) have no prefix 'chr' (lower-, upper-, or mixed case), this prefix will be added, and all strings matching 'chr' in uppercase or partly uppercase are changed to lowercase. This way, comparisons can be made even if the chromosome names differ by any variation of the word 'chr'. The default is FALSE.

awhole.only

If FALSE, all the considered chromosomes statistics and the summary (awhole) information are returned (default). If TRUE, only the summary (awhole) information is returned. Useful for pipelines. The TRUE value is default that is posted by run.config if the data is processed through mapping.

map.to.half

Some of the tests we use are besed on distances between a query point and the closest reference point. If map.to.half is TRUE (default) we look for the closest reference point upstream or downstream, if it is FALSE, we look only downstream (left). Useful if you are mapping to intervals and want to preserve directionality.

showProgressBar

Toggle the text progress bar. Default is TRUE.

showTkProgressBar

Toggle the Tk progress bar. If it is TRUE but the Tcl/Tk is not loadable (e.g. require('tcltk') returns FALSE), the parameter is turned to FALSE. Default is FALSE.

chromosomes.length

A vector of lengths of the chromosomes to be tested; each chromosome is given a name and a numerical length. The order of the chromosomes does not matter.

suppress.evaluated.length.warning

If there is a chromosome that is included in the evaluation but its length is not given in chromosome.length, the length of the chromosome is calculated as the maximal position mentioned in the data. If this parameter is FALSE, this will generate a warning, which is suppressed when the parameter is TRUE. The default is FALSE.

permut.number

is the common default for ecdf.area.permut.number, mean.distance.permut.number, and jaccard.measure.permut.number. permut.number=0 defaults all the permutations to off.

ecdf.area.permut.number

The number of permutations performed to get the p-value for the area between the ecdf for relative distance distribution and the straight line representing the uniform relative area for the random case.

mean.distance.permut.number

The number of permutations to ascribe p-value to minimal query-reference distance averaged over all query points.

jaccard.measure.permut.number

The number of permutations for Jaccard measure p-value estimation.

jaccard.permut.is.rearrangement

If TRUE, the permutations of the reference for the Jaccard test retain the lengths of all intervals and gaps in the query. All the permuted queries will mirror the original, so the p-value is overestimated. If FALSE (the default), the permutation is a random resampling of starts of the query intervals.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "attraction" or "repulsion". You can specify just the initial letter.

awhole.space.name

The name of the pseudo-space that describes the overall genome statistics. Default is 'awhole'.

keep.distributions

It this is true, the procedure returns all points in th distributions calculated for comparison. This is useful for making figures. Default is FALSE.

representing.point.function

By default, the midpoint of each interval is used as the surrogate for the position of the interval. To force the program to use something other than the midpoint, define the function to use to return comparison points. The function must take the same parameters as the default mitl that returns the middle points. The function is to be passed as the representing.point.function parameter. The default for the parameter is: mitl<-function(start,end,chromosome.length,space){return ((as.integer(start)+as.integer(end))/2)}

query.representing.point.function

The same thing as the representing.point.function, but the representation point calculation is overloaded only for query intervals.

reference.representing.point.function

The same thing as the representing.point.function, but the representation point calculation is overloaded only for query intervals.

supress.evaluated.length.warning

It was a typo for suppress.evaluated.length.warning. Now obsoleted, use suppress.evaluated.length.warning

Value

The result is an instance of the GenometriCorrResult-class that describes the run parameters in its GenometriCorrResult-class @config slot and that extends a namedList list (created originally with a list() call with the results of the run. Each element of the list is also a list that describes results for a space (chromosome); one of them is 'awhole' (or other awhole.space.name if given) that describes the genome awhole, all others are named the same as the chromosomes and describe the chromosomewise statistics. The elements of the 'awhole' and chromosomewise lists are statistical measures and some datasets. The statistical measures are described in the GenometriCorr package help. For further explanation, see the the package vignette.

Below is the description of the values of the list returned for each chromosome.

query.population

Query points used in the comparisons.

reference.population

Reference points used in the comparisons.

alternative

Shows the value of the alternative parameter passed.

relative.distances.ks.p.value

p-value for local independence obtained by the Kolmogorov-Smirnov test for relative distances.

relative.distances.ecdf.deviation.area.p.value

p-value for local independence obtained by the permutation test for relative distances.

relative.distances.ecdf.deviation.area.test.direction

"attraction" or "repulsion". If the alternative parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown

relative.distances.ecdf.area.correlation

Has the same sign with the relative distance-based local correlation.

projection.test.p.value

p-value for chromosome-scale independence obtained by the projection test.

projection.test.direction

"attraction" or "repulsion". If the alternative parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown

projection.test.obs.to.exp

To measure the effect size, the observed to expected ratio for the projection test statistics that is the number of query characteristic points (by default, midpoints) that fell into a reference features.

scaled.absolute.min.distance.sum.p.value

p-value for chromosome-scale null hypothesis as obtained by the permutations of the query points and the mean of the distances to the two closest reference points.

scaled.absolute.min.distance.sum.test.direction

"attraction" or "repulsion". If the alternative parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown

query.reference.intersection

Intersection of reference and query, in bases.

query.reference.union

Union of reference and query, in bases.

jaccard.measure

Jaccard measure of query and reference overlap.

jaccard.measure.p.value

The permutation-based evaluation of the p-value for the obtained Jaccard measure, given the null hypothesis of independence.

jaccard.measure.test.direction

"attraction" or "repulsion". If the alternative parameter is "two.sided" (default), if shows the direction of the effect; if the parameter is "attraction" or "repulsion" it is not shown

The additional values that are returned if keep.distributions=TRUE

relative.distances.data

The original relative distances

relative.distances.ecdf.deviation.area

The real value of the ECDF deviation area to be compared with the permutation to obtain the p-value

relative.distances.ecdf.deviation.area.null.list

The null distribution

projection.test

List of three values: space.length is length of a chromosome; reference.coverage is length of that chromosome covered by reference intervale, and query.hits is the number of query points that fall into the reference intervals.

absolute.min.distance.data

The distribution of query-reference distances

absolute.inter.reference.distance.data

The distribution of reference-reference distances

scaled.absolute.min.distance.sum

The value of the sum (i.e. mean) of scaled absolute distances

scaled.absolute.min.distance.sum.null.list

The null distribution for the scaled absolute distances

jaccard.measure.null.list

The null distribution of Jaccard measures in permutations

Author(s)

Alexander Favorov favorov@sensi.org, Loris Mularoni, Yulia Medvedeva, Harris A. Jaffee, Ekaterina V. Zhuravleva, Veronica Busa, Leslie M. Cope, Andrey A. Mironov, Vsevolod J. Makeev, Sarah J. Wheelan.

References

GenometriCorr home

See Also

The GenometriCorr documentation and vignette.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
library('rtracklayer')
library('GenometriCorr')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')

refseq<-transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
cpgis<-import(system.file("extdata", "UCSCcpgis_hg19.bed", package = "GenometriCorr"))
seqinfo(cpgis)<-seqinfo(TxDb.Hsapiens.UCSC.hg19.knownGene)[seqnames(seqinfo(cpgis))]

pn.area<-10
pn.dist<-10
pn.jacc<-10

cpgi_to_genes<-GenometriCorrelation(cpgis,refseq,chromosomes.to.proceed=c('chr1','chr2','chr3'),ecdf.area.permut.number=pn.area,mean.distance.permut.number=pn.dist,jaccard.measure.permut.number=pn.jacc,keep.distributions=FALSE,showProgressBar=FALSE)

print(cpgi_to_genes)

favorov/GenometriCorr documentation built on March 30, 2021, 5:21 p.m.