createRandomRegions: Create Random Regions

createRandomRegions    R Documentation

Create Random Regions

Description

Creates a set of random regions with a given mean size and standard deviation.

Usage

createRandomRegions(nregions=100, length.mean=250, length.sd=20, genome="hg19", mask=NULL, non.overlapping=TRUE)

Arguments

nregions

The number of regions to be created.

length.mean

The mean size of the regions created. This is not guaranteed to be the mean of the final region set. See note.

length.sd

The standard deviation of the region size. This is not guaranteed to be the standard deviation of the final region set. See note.

genome

The reference genome to use. It must be a valid genome object: a GenomicRanges object or a data.frame containing one region per whole chromosome, or a character string uniquely identifying a genome in BSgenome (e.g. "hg19" or "mm10", but not "hg"). Internally it uses getGenomeAndMask.

mask

The set of regions where a random region cannot be placed (centromeres, repetitive regions, unmappable regions...). A region set in any of the accepted formats (GenomicRanges, data.frame, ...). NULL will try to derive a mask from the genome (currently this only works if the genome is a character string) and NA explicitly gives an empty mask.

non.overlapping

A boolean stating whether the random regions can overlap (FALSE) or not (TRUE).

Details

A set of nregions regions will be created and randomly placed over the genome. The lengths of the regions will follow a normal distribution with mean length.mean and standard deviation length.sd. The new regions can be made explicitly non-overlapping by setting non.overlapping to TRUE. A mask can be provided so that no region falls in a forbidden part of the genome.
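The procedure described above can be illustrated with a small base R sketch. This is a conceptual toy only and is not the regioneR implementation: it works on a single hypothetical chromosome, uses plain rejection sampling, and the helper place_regions and all its parameters are names invented for this illustration.

```r
# Conceptual sketch only: NOT the regioneR implementation.
# Places regions on one hypothetical chromosome by rejection sampling.
place_regions <- function(region.lengths, chr.length, mask = NULL,
                          non.overlapping = TRUE, max.tries = 1000) {
  # TRUE if the interval [s, e] overlaps any row of the interval table iv
  overlaps_any <- function(s, e, iv) {
    nrow(iv) > 0 && any(s <= iv$end & e >= iv$start)
  }
  placed <- data.frame(start = integer(0), end = integer(0))
  for (len in region.lengths) {
    ok <- FALSE
    for (i in seq_len(max.tries)) {
      # Uniform start position that keeps the region inside the chromosome
      s <- sample.int(chr.length - len + 1, 1)
      e <- s + len - 1
      # Reject candidates that touch the mask
      if (!is.null(mask) && overlaps_any(s, e, mask)) next
      # Reject candidates that overlap an already placed region
      if (non.overlapping && overlaps_any(s, e, placed)) next
      ok <- TRUE
      break
    }
    if (!ok) stop("could not place a region of length ", len)
    placed <- rbind(placed, data.frame(start = s, end = e))
  }
  placed
}

set.seed(1)  # fixed seed so the run is reproducible
# Normally distributed lengths; values below 1 are set to 1 (assumed handling)
region.lengths <- pmax(round(rnorm(20, mean = 250, sd = 20)), 1)
# Hypothetical forbidden intervals on a chromosome of 100000 bases
mask <- data.frame(start = c(20000, 60000), end = c(30000, 80000))
regions <- place_regions(region.lengths, chr.length = 100000, mask = mask)
head(regions)
```

With non.overlapping = FALSE the second rejection test is skipped, so regions may overlap each other while still avoiding the mask.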

Value

It returns a GenomicRanges object with the regions resulting from the randomization process.

Note

If the standard deviation of the length is large with respect to the mean, negative lengths might be created. These region lengths will be transformed into 1, so for large standard deviations the mean and sd of the lengths are not guaranteed to match the ones given in the parameters.
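The effect can be seen with a few lines of base R. This is a minimal sketch that assumes lengths are drawn with rnorm, rounded, and that any value below 1 is replaced by 1; the rounding step and the variable names are assumptions made for illustration, not taken from the package source.

```r
set.seed(42)  # fixed seed so the run is reproducible

length.mean <- 100
length.sd <- 200   # deliberately large relative to the mean
n <- 10000

# Assumed model of the behaviour described in the note:
# draw normal lengths, round them, and replace anything below 1 with 1.
raw.lengths <- round(rnorm(n, mean = length.mean, sd = length.sd))
final.lengths <- pmax(raw.lengths, 1)

mean(raw.lengths < 1)   # fraction of draws that had to be replaced
mean(final.lengths)     # ends up larger than length.mean
sd(final.lengths)       # ends up smaller than length.sd
```

Replacing the lower tail with 1 pulls the mean upwards and shrinks the spread, which is why the final region set does not reproduce length.mean and length.sd when length.sd is large.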

See Also

getGenome, getMask, getGenomeAndMask, characterToBSGenome, maskFromBSGenome, randomizeRegions, resampleRegions

Examples

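library(regioneR)

# Custom genome: one region per whole chromosome (chromosome, start, end)
genome <- data.frame(c("chr1", "chr2"), c(1, 1), c(180000000, 20000000))
# Custom mask: two forbidden intervals on chr1
mask <- data.frame("chr1", c(20000000, 100000000), c(22000000, 130000000))

# Default genome ("hg19") with the mask derived from it
createRandomRegions(nregions=10, length.mean=1000, length.sd=500)

# Custom genome and mask, default region lengths
createRandomRegions(nregions=10, genome=genome, mask=mask, non.overlapping=TRUE)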


bernatgel/regioneR documentation built on Sept. 10, 2023, 12:03 a.m.