regionFinder: Break up nucleotide level signal into candidate regions and...


View source: R/DRfinder.R

Description

This is an internal workhorse function for bumphunt. It takes the nucleotide-level signal, parses it into contiguous regions that pass the threshold (the candidate regions), and then scores each candidate with a test statistic for the condition difference.

Usage

regionFinder(x, chr, pos, cluster = NULL, ind = seq(along = x),
  order = TRUE, minNumRegion = 5, maxGap, cutoff = quantile(abs(x), 0.99),
  assumeSorted = FALSE, oligo.mat = oligo.mat, verbose = TRUE,
  design = design, workers = workers, logT = TRUE, naive = FALSE,
  beta = NULL)

Arguments

x

a vector of condition coefficients (for the covariate of interest), one per nucleotide

chr

a character vector of labels for region-level characteristics, with length equal to the number of rows in oligo.mat (and in the same order). This can indicate the chromosome, gene, lncRNA, etc.

pos

a numeric vector of basepair positions for each nucleotide in oligo.mat (and in the same order).

cluster

a vector of cluster membership values for each nucleotide determined by the clusterMaker function in the bumphunter package

ind

a vector of indices of x which are non-NULL. Defaults to all indices of x.

order

logical that indicates whether or not to order the candidate regions by the test statistic magnitude (largest to smallest). Defaults to TRUE.

minNumRegion

positive integer that represents the minimum number of nucleotides to consider for a candidate region. Default value is 5.

maxGap

positive integer that indicates the maximum number of basepairs that can separate two nucleotides before they will be divided into two separate candidate regions. Defaults to 50.
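The gap rule described above can be illustrated with a small base-R sketch. This is not the package's code (the cluster argument indicates that the actual grouping comes from clusterMaker in the bumphunter package); the positions and the variable names below are made up for illustration.

```r
# Hypothetical basepair positions on a single label, already sorted.
pos <- c(100, 120, 150, 260, 280, 400)
maxGap <- 50

# A new group starts wherever the distance to the previous
# position exceeds maxGap.
new_group <- c(TRUE, diff(pos) > maxGap)
cluster <- cumsum(new_group)

# One group id per nucleotide; positions 150 and 260 are 110 bp apart,
# so they fall into different groups.
print(cluster)
```

Nucleotides that share a group id are eligible to be merged into the same candidate region; those in different groups are not.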

cutoff

scalar value that represents the absolute value (or a vector of two numbers representing a lower and upper bound) for the cutoff of the single nucleotide condition coefficient that is used to discover candidate regions.
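As a rough illustration of the two forms of cutoff, the sketch below flags nucleotides with a scalar cutoff (using the default from the usage, quantile(abs(x), 0.99)) and with a lower/upper pair. The simulated x, the bounds, and the use of non-strict comparisons are assumptions made for the example, not details confirmed by this page.

```r
set.seed(1)
x <- rnorm(1000)  # simulated coefficients, for illustration only

# Scalar cutoff: applied to the absolute value of the coefficient.
cutoff <- quantile(abs(x), 0.99)
candidate_scalar <- abs(x) >= cutoff

# Two-number cutoff: separate lower and upper bounds (hypothetical values).
bounds <- c(-2, 2.5)
candidate_bounds <- x <= bounds[1] | x >= bounds[2]

# Number of nucleotides flagged under each rule.
print(sum(candidate_scalar))
print(sum(candidate_bounds))
```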

assumeSorted

logical that indicates whether the nucleotides are sorted in ascending order. Defaults to FALSE.

oligo.mat

a matrix of nucleotide-level counts with one row per nucleotide and one column per sample.

verbose

logical value that indicates whether additional progress messages within each iteration should be printed to stdout. Default value is TRUE, as shown in the usage above.

design

a model matrix with one row per sample and one column per independent covariate.

workers

positive integer that represents the number of cores to use if parallelization of the smoothing step is desired.

logT

logical value that indicates whether to model the log2 transformed signal (plus a pseudocount of 1). Default is TRUE. Only set to FALSE if the transformation has been done prior to running this function, or if the distribution of raw values looks relatively symmetric.
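The transformation described for logT can be written in one line of base R. The toy count matrix below is hypothetical and stands in for oligo.mat; the sketch only shows the arithmetic, not how the function applies it internally.

```r
# Toy count matrix: 3 nucleotides (rows) by 2 samples (columns).
counts <- matrix(c(0, 1, 3, 7, 15, 31), nrow = 3, ncol = 2,
                 dimnames = list(NULL, c("sample1", "sample2")))

# log2 transform with a pseudocount of 1, so zero counts map to 0
# instead of -Inf.
log_counts <- log2(counts + 1)
print(log_counts)
```

If the matrix has already been transformed this way, passing logT = FALSE avoids applying the transformation twice.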

naive

a logical value indicating whether to use the naive region-level statistic in step 2, which simply averages the step 1 statistic across the region, instead of the default, which calculates a new statistic that jointly considers all loci in the region. In addition, with the naive option the standard deviation among replicates is not considered in step 1.
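A minimal sketch of the naive statistic as described, assuming the per-locus statistics and the region boundaries are already known. The values of beta and the two index vectors are invented for illustration; the default (non-naive) statistic is not shown because this page does not specify it.

```r
# Hypothetical per-locus statistics from step 1.
beta <- c(0.1, 2.0, 2.4, 2.2, 0.3, -1.9, -2.1)

# Hypothetical candidate regions given as index ranges into beta.
indexStart <- c(2, 6)
indexEnd   <- c(4, 7)

# Naive region-level statistic: the mean of beta over each region.
naive_stat <- mapply(function(s, e) mean(beta[s:e]), indexStart, indexEnd)
print(naive_stat)
```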

beta

vector of locus-specific statistics from step 1 (only needed if naive is TRUE)

Value

a data.frame that contains the results of region detection. The data.frame contains one row for each candidate region, and 7 columns, in the following order:

1. chr = region level labels such as chromosome, gene, or lncRNA
2. start = start basepair position of the region
3. end = end basepair position of the region
4. indexStart = the index of the region's starting nucleotide
5. indexEnd = the index of the region's ending nucleotide
6. length = the number of nucleotides contained in the region
7. stat = the test statistic for the condition difference
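To make the shape of the returned data.frame concrete, here is a toy base-R sketch that builds a table with the same seven columns from made-up inputs. It is not the function's implementation: the inputs are hypothetical, the minNumRegion and maxGap filters are omitted, and the stat column uses a simple mean as a placeholder for the real test statistic.

```r
# Toy inputs (all hypothetical): one coefficient per nucleotide.
x   <- c(0.1, 2.0, 2.4, 2.2, 0.3, -1.9, -2.1, 0.2)
pos <- c(10, 11, 12, 13, 14, 15, 16, 17)
chr <- rep("geneA", length(x))
cutoff <- 1.5

# Direction per nucleotide: 1 above the cutoff, -1 below, 0 otherwise.
direction <- ifelse(x >= cutoff, 1, ifelse(x <= -cutoff, -1, 0))

# Consecutive nucleotides with the same nonzero direction form one run.
runs      <- rle(direction)
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1
keep      <- runs$values != 0

regions <- data.frame(
  chr        = chr[run_start[keep]],
  start      = pos[run_start[keep]],
  end        = pos[run_end[keep]],
  indexStart = run_start[keep],
  indexEnd   = run_end[keep],
  length     = runs$lengths[keep],
  stringsAsFactors = FALSE
)

# Placeholder statistic: mean coefficient within each region.
regions$stat <- mapply(function(s, e) mean(x[s:e]),
                       regions$indexStart, regions$indexEnd)

# Mirror order = TRUE: sort by statistic magnitude, largest first.
regions <- regions[order(-abs(regions$stat)), ]
print(regions)
```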


cshukla/oligoGames documentation built on May 27, 2019, 8:44 a.m.