genomic_regions_correlation: Correlation between two sets of genomic regions


View source: R/genomic_region_correlation.R

Description

Correlation between two sets of genomic regions

Usage

genomic_regions_correlation(gr_list_1, gr_list_2, background = NULL,
    chromosome = paste0("chr", c(1:22, "X", "Y")), species = "hg19",
    nperm = 0, mc.cores = 1, stat_fun = genomic_corr_jaccard, ...,
    bedtools_binary = Sys.which("bedtools"), tmpdir = tempdir())

Arguments

gr_list_1

A list of GRanges objects; the list should be named. For example, low-methylated regions in different samples.

gr_list_2

A list of GRanges objects; the list should be named. For example, a list of genomic features.

background

A GRanges object. The correlation is calculated only within the background regions.

chromosome

a vector of chromosome names

species

Species (e.g. "hg19"), used when randomly shuffling genomic regions.

nperm

number of random shufflings. If it is set to 0 or 1, no random shuffling will be performed.

mc.cores

number of cores for parallel calculation

stat_fun

Method used to calculate the correlation. Several functions are pre-defined: genomic_corr_reldist and genomic_corr_absdist measure how close two sets of genomic regions are; genomic_corr_jaccard and genomic_corr_intersect measure how much two sets of genomic regions overlap. A user-defined function should accept at least two arguments, both GRanges objects. The third argument is ..., which is passed on from the main function. The function should return a single numeric value.

...

Additional arguments passed to stat_fun.

bedtools_binary

Random shuffling is performed by bedtools. If bedtools is not in PATH, the path to the bedtools binary can be set here.

tmpdir

Directory for temporary files.
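As an illustration of the stat_fun contract described above (two GRanges arguments, then ..., returning one numeric value), here is a minimal sketch of a user-defined statistic. The name frac_covered and the statistic itself are hypothetical and are not part of the package; the sketch only assumes that the GenomicRanges package is installed.

```r
library(GenomicRanges)

# Hypothetical user-defined statistic (not part of the package):
# fraction of the bases in gr1 that are covered by gr2.
frac_covered = function(gr1, gr2, ...) {
    gr1 = reduce(gr1)   # merge overlapping regions so bases are counted once
    gr2 = reduce(gr2)
    ov = intersect(gr1, gr2, ignore.strand = TRUE)
    # Return a single numeric value, as stat_fun requires
    sum(as.numeric(width(ov))) / sum(as.numeric(width(gr1)))
}

gr1 = GRanges(seqnames = "chr1", ranges = IRanges(start = c(4, 10), end = c(6, 16)))
gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = c(7, 13), end = c(8, 20)))
frac_covered(gr1, gr2)
```

Such a function would then be supplied through the stat_fun argument, e.g. stat_fun = frac_covered.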

Details

The correlation between two sets of genomic regions measures how much the first set of regions overlaps with, or lies close to, the second set of regions.

The significance of the correlation is calculated by randomly shuffling the regions. Only the regions in gr_list_1 are shuffled, so to shuffle gr_list_2 instead, swap the first two arguments.

Please note that random shuffling is done by "bedtools", so "bedtools" must be installed, must be available in PATH, and must support the -i, -g and -incl options.
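Before running with nperm > 1, it may help to confirm that bedtools can be found and that its shuffle subcommand lists the required options. The following is a minimal sketch using base R only; the helper name has_bedtools is hypothetical, and the check on the help text is an assumption about how one might verify the options, not something the package itself does.

```r
# Locate bedtools the same way the default of bedtools_binary does
bedtools = Sys.which("bedtools")

# Hypothetical helper: TRUE if the path is non-empty and the file exists
has_bedtools = function(path) {
    nzchar(path) && file.exists(path)
}

if (has_bedtools(bedtools)) {
    # Capture the help text of the shuffle subcommand (stdout and stderr)
    help_text = suppressWarnings(
        system2(bedtools, c("shuffle", "-h"), stdout = TRUE, stderr = TRUE)
    )
    # Report whether the -incl option appears in the help text
    message("-incl listed: ", any(grepl("-incl", help_text, fixed = TRUE)))
} else {
    message("bedtools not found in PATH; set bedtools_binary explicitly")
}
```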

This function can be very time-consuming, particularly when nperm is large.

Value

A list containing the following elements:

stat

statistic value

fold_change

stat/E(stat): the statistic divided by its expected value, which is estimated from random shuffling

p.value

p-value for being over-correlated; 1 - p.value is therefore the significance for being less correlated

stat_random_mean

mean value of stat in random shuffling

stat_random_sd

standard deviation in random shuffling

If nperm is set to 0 or 1, fold_change, p.value, stat_random_mean and stat_random_sd are all NULL.
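To make the relationship between these elements concrete, here is a small base R sketch using made-up numbers. The values of stat and stat_random are purely illustrative, and the one-sided empirical p-value formula is an assumption for illustration; the package may compute p.value differently.

```r
# Made-up observed statistic and statistics from 10 random shufflings
stat = 0.30
stat_random = c(0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.31, 0.13, 0.10, 0.11)

stat_random_mean = mean(stat_random)   # mean of stat in random shuffling
stat_random_sd = sd(stat_random)       # standard deviation in random shuffling

# fold_change is stat divided by its expected value under shuffling
fold_change = stat / stat_random_mean

# Assumption: one-sided empirical p-value, i.e. the proportion of
# shuffled statistics that are at least as large as the observed one
p.value = sum(stat_random >= stat) / length(stat_random)

list(stat = stat, fold_change = fold_change, p.value = p.value,
     stat_random_mean = stat_random_mean, stat_random_sd = stat_random_sd)
```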

Author(s)

Zuguang Gu <z.gu@dkfz.de>

See Also

genomic_corr_reldist, genomic_corr_jaccard, genomic_corr_absdist, genomic_corr_intersect

Examples

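library(GenomicRanges)  # provides GRanges() and IRanges()
# Two small sets of regions on chr1
gr1 = GRanges(seqnames = "chr1", ranges = IRanges(start = c(4, 10), end = c(6, 16)))
gr2 = GRanges(seqnames = "chr1", ranges = IRanges(start = c(7, 13), end = c(8, 20)))
# nperm = 0: only the statistic is computed, no random shuffling
genomic_regions_correlation(gr1, gr2, nperm = 0)
genomic_regions_correlation(list(gr1 = gr1), list(gr2 = gr2), nperm = 0)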

jokergoo/epik documentation built on Sept. 28, 2019, 9:20 a.m.