drawGenomePool: Draw a length-matched pool of sequences from the genome

View source: R/enrichment.r

Description

Given a query set of ranges, draw a length-matched pool of background ranges from the genome. Returned ranges must (1) not overlap each other or the query, (2) not extend past chromosome ends, and (3) not span assembly gaps as defined in the UCSC "gap" table for the given genome assembly.
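The three constraints above can be illustrated with a small base-R sketch. This is not goldmine's implementation, which is not shown on this page. It assumes simple rejection sampling, and the helper name `draw_pool_sketch` and the arguments `chr_len`, `gaps`, and `max_tries` are hypothetical.

```r
# Hypothetical helper, not part of goldmine: rejection-sample a length-matched pool.
# query, gaps: data.frames with columns chr, start, end (1-based, closed intervals).
# chr_len: named vector of chromosome lengths. n: pool size as a multiple of the query.
draw_pool_sketch <- function(query, n, chr_len, gaps = NULL, max_tries = 10000) {
  # TRUE if [s, e] on chromosome `chr` overlaps any row of `df`
  hits <- function(df, chr, s, e) {
    !is.null(df) && any(df$chr == chr & df$start <= e & df$end >= s)
  }
  # One target width per pool member: every query width is reused n times
  widths <- rep(as.integer(query$end - query$start + 1), times = n)
  pool <- data.frame(chr = character(0), start = integer(0), end = integer(0),
                     stringsAsFactors = FALSE)
  for (w in widths) {
    placed <- FALSE
    for (i in seq_len(max_tries)) {
      chr <- sample(names(chr_len), 1)
      if (w > chr_len[[chr]]) next
      # Choosing the start in 1..(length - w + 1) keeps the range on the chromosome
      s <- sample.int(chr_len[[chr]] - w + 1L, 1)
      e <- s + w - 1L
      # Reject candidates that touch the query, the pool so far, or a gap
      if (hits(query, chr, s, e) || hits(pool, chr, s, e) || hits(gaps, chr, s, e)) next
      pool <- rbind(pool, data.frame(chr = chr, start = s, end = e,
                                     stringsAsFactors = FALSE))
      placed <- TRUE
      break
    }
    if (!placed) stop("could not place a range of width ", w)
  }
  pool
}

# Toy data: two query ranges on a made-up 10 kb chromosome with one gap
set.seed(1)
query <- data.frame(chr = "chr1", start = c(101L, 501L), end = c(150L, 600L),
                    stringsAsFactors = FALSE)
gaps <- data.frame(chr = "chr1", start = 5001L, end = 6000L,
                   stringsAsFactors = FALSE)
pool <- draw_pool_sketch(query, n = 3, chr_len = c(chr1 = 10000L), gaps = gaps)
```

Rejection sampling is easy to read but slows down as the genome fills up. An implementation meant for large pools would more likely sample from precomputed free intervals.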

Usage

drawGenomePool(query, n, chrs = NULL, genome, cachedir, sync = TRUE)

Arguments

query

A data.frame or data.table containing the columns "chr", "start", and "end"; any other columns are allowed. The "start" coordinates are 1-based.

n

Size of the returned background pool, expressed as a multiple of the size of the query set.

chrs

Vector of chromosome names to draw from. Useful for restricting the pool to canonical chromosomes only, for example chr1 to chr22, chrX, and chrY for hg19. If NULL (the default), draws are restricted to the chromosome names present in query.

genome

The UCSC name of the genome assembly that the query coordinates refer to (e.g. "hg19", "hg18", "mm10").

cachedir

A path to a directory where a local cache of UCSC tables is stored. If NULL (default), the data are downloaded to temporary files and loaded on the fly. Caching is highly recommended to save time and bandwidth.

Value

A GRanges of the background sequences.
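A minimal usage sketch follows, assuming the goldmine package is installed and exports drawGenomePool with the signature shown under Usage. The query coordinates are made up, the cache location is a hypothetical choice, and `sync` is left at its default. The call is guarded by a flag because the first run downloads UCSC tables and so needs network access.

```r
# Toy query with the required columns "chr", "start", "end" (1-based starts)
query <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1000001L, 2000001L, 5000001L),
  end   = c(1000500L, 2000800L, 5001000L),
  stringsAsFactors = FALSE
)

# Canonical hg19 chromosomes: chr1 to chr22, chrX, chrY
canonical <- paste0("chr", c(1:22, "X", "Y"))

# Set to TRUE to run; requires goldmine and network access to UCSC
run_example <- FALSE
if (run_example && requireNamespace("goldmine", quietly = TRUE)) {
  cachedir <- file.path(tempdir(), "ucsc_cache")  # hypothetical cache location
  dir.create(cachedir, showWarnings = FALSE)
  # Background pool ten times the size of the query, drawn from canonical chromosomes
  pool <- goldmine::drawGenomePool(query, n = 10, chrs = canonical,
                                   genome = "hg19", cachedir = cachedir)
}
```

A persistent directory is a better choice for `cachedir` than a temporary one, since the point of the cache is to reuse the downloaded tables across sessions.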


jeffbhasin/goldmine documentation built on Nov. 13, 2019, 9:11 a.m.