getFeatureCounts: Get counts of annotation within a defined window around each...
In malnirav/hiAnnotator: Functions for annotating GRanges objects


Given a query object and one or more window sizes, the function finds all rows in the subject that lie within half a window size (<= window size/2) of each query position. If weights are assigned to the positions in the subject, the tallied counts are multiplied accordingly. For large annotations, use getFeatureCountsBig.
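The counting rule described above can be sketched in base R. This is only an illustration under simplifying assumptions (a single chromosome, plain numeric coordinates, strand ignored); it is not the package's implementation, and `count_in_window` and its arguments are hypothetical names introduced here.

```r
# Hypothetical helper (not part of hiAnnotator): for each site, tally the
# features that fall within width/2 of the site position.
count_in_window <- function(site_pos, feat_start, feat_end, width,
                            weights = NULL) {
  # Without weights, every feature counts as 1.
  if (is.null(weights)) weights <- rep(1, length(feat_start))
  half <- width / 2
  vapply(site_pos, function(p) {
    # A feature is a hit if it overlaps the window [p - half, p + half].
    hit <- feat_start <= p + half & feat_end >= p - half
    sum(weights[hit])
  }, numeric(1))
}

site_pos   <- c(1000, 50000)
feat_start <- c(1200, 1600, 49000)
feat_end   <- c(1300, 1700, 49500)

count_in_window(site_pos, feat_start, feat_end, width = 1000)
# 1 1
count_in_window(site_pos, feat_start, feat_end, width = 10000)
# 2 1
count_in_window(site_pos, feat_start, feat_end, width = 1000,
                weights = c(2, 1, 3))
# 2 3
```

Widening the window from 1000 to 10000 picks up the second feature near the first site, and supplying weights replaces each hit's contribution of 1 with its weight.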

getFeatureCounts(
  sites.rd,
  features.rd,
  colnam = NULL,
  chromSizes = NULL,
  widths = c(1000, 10000, 1e+06),
  weightsColname = NULL,
  doInChunks = FALSE,
  chunkSize = 10000,
  parallel = FALSE
)

`sites.rd`	GRanges object to be used as the query.
`features.rd`	GRanges object to be used as the subject or the annotation table.
`colnam`	column name to be added to sites.rd for the newly calculated annotation; serves as a prefix to the window sizes.
`chromSizes`	named vector of chromosome/seqnames sizes to be used for testing if a position is off the mappable region. DEPRECATED and will be removed in a future release.
`widths`	a named/numeric vector of window sizes to be used for casting a net around each position. Default: `c(1000,10000,1000000)`.
`weightsColname`	if defined, weigh each row from features.rd when tallying up the counts.
`doInChunks`	break up sites.rd into small pieces of chunkSize to perform the calculations. Default is FALSE. Useful if you are expecting to find a great deal of overlap between sites.rd and features.rd.
`chunkSize`	number of rows to use per chunk of sites.rd. Defaults to 10000. Only used if doInChunks=TRUE.
`parallel`	use parallel backend to perform calculation with `foreach`. Defaults to FALSE. If no parallel backend is registered, then a serial version of foreach is run using `registerDoSEQ`.
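The idea behind `doInChunks` and `chunkSize` (process sites.rd a fixed number of rows at a time, then combine the pieces) can be illustrated with a small base R sketch. The names `process_in_chunks` and `per_site_work` are hypothetical stand-ins, and the function's internal chunking may well differ from this.

```r
# Hypothetical stand-in for any per-site computation (not a hiAnnotator call).
per_site_work <- function(pos) pos %% 7

# Hypothetical helper: apply `fun` to `x` in pieces of at most `chunk_size`.
process_in_chunks <- function(x, chunk_size, fun) {
  # Assign each element to a chunk: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_along(x) / chunk_size)
  pieces <- split(x, chunk_id)
  # Process each chunk separately, then stitch the results back in order.
  unlist(lapply(pieces, fun), use.names = FALSE)
}

site_pos <- seq(100, 2500, by = 100)  # 25 toy positions, so 3 chunks of <= 10
chunked  <- process_in_chunks(site_pos, chunk_size = 10, fun = per_site_work)

# Chunked processing should give the same answer as one pass over all rows.
identical(chunked, per_site_work(site_pos))
```

Chunking changes only how much is held in memory at once, not the result, which is why it helps when many overlaps are expected.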

A GRanges object with new annotation columns appended at the end of sites.rd. There is one column for each width defined in the widths parameter. If widths is a named vector, e.g. c("100bp"=100,"1K"=1000), then the colnam parameter is pasted together with each width name; otherwise a default name is generated by the function.

If parallel=TRUE, then be sure to have a parallel backend registered before running the function. One can use any of the following libraries compatible with foreach: doMC, doSMP, doSNOW, doMPI. For example: library(doMC); registerDoMC(2)

makeGRanges, getNearestFeature, getSitesInFeature, getFeatureCountsBig.

# Convert a dataframe to GRanges object
data(sites)
alldata.rd <- makeGRanges(sites, soloStart = TRUE)

data(genes)
genes.rd <- makeGRanges(genes)

geneCounts <- getFeatureCounts(alldata.rd, genes.rd, "NumOfGene")
## Not run: 
geneCounts <- getFeatureCounts(alldata.rd, genes.rd, "NumOfGene",
                               doInChunks = TRUE, chunkSize = 200)
geneCounts
## Parallel version of getFeatureCounts
# geneCounts <- getFeatureCounts(alldata.rd, genes.rd, "NumOfGene",
#                                parallel = TRUE)
# geneCounts

## End(Not run)