bumphunt: Detect and score candidate regions

View source: R/DRfinder.R

Description

This is an internal workhorse function called by DRfinder that calculates the nucleotide-level signal, and calls the regionFinder function to determine candidate regions and score them.

Usage

bumphunt(oligo.mat, design, chr = NULL, pos, coef = 2, minInSpan = 10,
  minNum = 10, minNumRegion = 5, cutoff = NULL, maxGap = 50,
  maxGapSmooth = 50, smooth = FALSE, bpSpan = 100, verbose = TRUE,
  workers = NULL, logT = TRUE, altStat = 0, sampleSize, naive = FALSE,
  ...)

Arguments

oligo.mat

a matrix of nucleotide-level counts, with one row per nucleotide and one column per sample.

design

a model matrix with one row per sample and one column per independent covariate.
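A design matrix of this shape can be built with base R's model.matrix. The sketch below assumes a hypothetical two-condition experiment with three samples per condition; the condition factor and its level names are illustrative and are not part of the package.

```r
# Hypothetical layout: 3 control samples followed by 3 treated samples.
condition <- factor(rep(c("control", "treated"), each = 3))

# Intercept column plus one indicator column for the "treated" level.
design <- model.matrix(~ condition)

dim(design)       # one row per sample, one column per coefficient
colnames(design)  # the second column is the condition covariate
```

With this layout the condition covariate sits in the second column, which matches the coef = 2 default shown in Usage.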

chr

a character vector of labels for region-level characteristics, with length equal to the number of rows in oligo.mat (and in the same order). This can indicate the chromosome, gene, lncRNA, etc.

pos

a numeric vector of basepair positions for each nucleotide in oligo.mat (and in the same order).
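As a minimal sketch of how oligo.mat, chr, and pos line up, the following builds simulated inputs in base R. The region labels "geneA" and "geneB", the dimensions, and the Poisson counts are all assumptions made up for illustration.

```r
set.seed(1)
n_nuc  <- 200  # total nucleotides (rows), hypothetical
n_samp <- 6    # samples (columns), hypothetical

# Simulated counts: one row per nucleotide, one column per sample.
oligo.mat <- matrix(rpois(n_nuc * n_samp, lambda = 20),
                    nrow = n_nuc, ncol = n_samp)

# One region label and one basepair position per row, in row order.
chr <- rep(c("geneA", "geneB"), each = n_nuc / 2)
pos <- rep(seq_len(n_nuc / 2), times = 2)

# Both vectors must be as long as oligo.mat has rows.
stopifnot(length(chr) == nrow(oligo.mat),
          length(pos) == nrow(oligo.mat))
```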

coef

positive integer that indicates which column of the design matrix in design contains the condition covariate of interest. Default value is 2.

minInSpan

positive integer that represents the minimum number of nucleotides in a smoothing span window if smooth is TRUE. Default value is 10.

minNum

positive integer that represents the minimum number of nucleotides overall in a region to be smoothed (if smooth is TRUE). Default value is 10.

minNumRegion

positive integer that represents the minimum number of nucleotides to consider for a candidate region. Default value is 5.

cutoff

scalar value that represents the absolute value (or a vector of two numbers representing a lower and upper bound) for the cutoff of the single nucleotide condition coefficient that is used to discover candidate regions.
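The two forms of cutoff can be illustrated with a small sketch. The helper flag_loci and the coefficient values are hypothetical, and the exact comparison the package uses (strict or non-strict inequality) is an assumption here.

```r
# Hypothetical per-nucleotide condition coefficients.
beta <- c(-1.2, -0.3, 0.1, 0.8, 1.5)

# Illustrative helper (not a package function): flag loci beyond the cutoff.
flag_loci <- function(beta, cutoff) {
  if (length(cutoff) == 1) {
    # Scalar: compare the absolute value of the coefficient.
    abs(beta) >= cutoff
  } else {
    # Two numbers: separate lower and upper bounds.
    beta <= cutoff[1] | beta >= cutoff[2]
  }
}

flag_loci(beta, 1)            # symmetric cutoff
flag_loci(beta, c(-1, 0.5))   # asymmetric lower/upper bounds
```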

maxGap

positive integer that indicates the maximum number of basepairs that can separate two nucleotides before they will be divided into two separate candidate regions. Defaults to 50.
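The effect of maxGap can be sketched in a few lines of base R. This is only an illustration of gap-based splitting for positions that share one chr label, not the package's own implementation; the positions are made up.

```r
# Hypothetical sorted basepair positions within a single chr label.
pos    <- c(1, 10, 40, 120, 130, 300)
maxGap <- 50

# Start a new cluster whenever the gap to the previous position exceeds maxGap.
cluster <- cumsum(c(TRUE, diff(pos) > maxGap))

split(pos, cluster)
```

Here the gaps of 80 and 170 basepairs exceed maxGap, so the six positions fall into three separate groups.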

maxGapSmooth

positive integer that indicates the maximum number of basepairs that can separate two nucleotides before they will be divided into two separate smoothing regions. Defaults to 50.

smooth

logical value that indicates whether or not to smooth the nucleotide level signal when discovering candidate regions. Defaults to FALSE.

bpSpan

a positive integer that represents the length in basepairs of the smoothing span window if smooth is TRUE. Default value is 100.

verbose

logical value that indicates whether additional progress messages within each iteration should be printed to stdout. Default value is TRUE, as shown in Usage.

workers

positive integer that represents the number of cores to use if parallelization of the smoothing step is desired.

logT

logical value that indicates whether to model the log2 transformed signal (plus a pseudocount of 1). Default is TRUE. Only set to false if transformation has been done prior to running this function, or if distribution of raw values looks relatively symmetric.
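The transformation described for logT = TRUE amounts to the following; the count values are hypothetical and chosen so the result is easy to read.

```r
# Hypothetical raw counts: 3 nucleotides by 2 samples.
counts <- matrix(c(0, 1, 3, 7, 15, 31), nrow = 3)

# log2 transform with a pseudocount of 1, so zero counts map to 0.
log_signal <- log2(counts + 1)

log_signal
```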

altStat

numeric value indicating whether to use alternate statistic for single loci in constructing candidate regions that incorporates the standard deviation among replicates. If 0 (default), differences in means are used as the statistic. If 1, modified t-statistics (instead of effect size estimates) will be used (t-stat = median difference / sd). Since estimates of standard deviations are noisy for small numbers of replicates, the estimates are smoothed across neighboring loci (though the effect size estimates themselves are not smoothed; that can be accomplished by setting smooth=TRUE). If 2, Wilcoxon rank sum statistics are used. If 3, then the same stat as in 1, but using median absolute deviation (MAD) instead of SD.
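To make the four options concrete, here is a rough single-locus sketch in base R. It is not the package's code: the values are invented, the pooled SD and MAD formulas are assumptions, and the smoothing of standard deviations across neighboring loci is deliberately left out.

```r
# Hypothetical (already transformed) signal at one locus, 3 replicates each.
control <- c(5.1, 4.8, 5.4)
treated <- c(6.9, 7.2, 7.5)

# altStat = 0: plain difference in means.
diff_means <- mean(treated) - mean(control)

# altStat = 1 (assumed form): difference scaled by a pooled SD.
pooled_sd <- sqrt((var(control) + var(treated)) / 2)
t_like <- diff_means / pooled_sd

# altStat = 2: Wilcoxon rank sum statistic.
w_stat <- unname(wilcox.test(treated, control)$statistic)

# altStat = 3 (assumed form): median difference scaled by the MAD of
# within-group residuals instead of the SD.
resid <- c(control - median(control), treated - median(treated))
mad_like <- (median(treated) - median(control)) / mad(resid)

c(diff_means = diff_means, t_like = t_like, w_stat = w_stat, mad_like = mad_like)
```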

sampleSize

positive integer that represents the number of samples in each condition. Defaults to (ncol(OligoSignal)-1)/2.

naive

a logical value indicating whether to use a naive region-level statistic in step 2 that simply averages the step 1 statistic across the region, instead of the default, which calculates a new statistic that jointly considers all loci in the region. With the naive statistic, the standard deviation among replicates is also not considered in step 1. Default value is FALSE.

Value

a data.frame that contains the results of region detection. The data.frame contains one row for each candidate region, and 7 columns, in the following order:

1. chr = region-level labels such as chromosome, gene, or lncRNA
2. start = start basepair position of the region
3. end = end basepair position of the region
4. indexStart = the index of the region's starting nucleotide
5. indexEnd = the index of the region's ending nucleotide
6. length = the number of nucleotides contained in the region
7. stat = the test statistic for the condition difference
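As a sketch of working with a result of this shape, the following builds a mock data.frame with the seven documented columns and ranks regions by the magnitude of stat. Every value is invented for illustration and is not output from the function.

```r
# Mock result with the seven documented columns (all values hypothetical).
regions <- data.frame(
  chr        = c("geneA", "geneA", "geneB"),
  start      = c(10, 200, 35),
  end        = c(24, 210, 60),
  indexStart = c(10, 200, 435),
  indexEnd   = c(24, 210, 460),
  length     = c(15, 11, 26),
  stat       = c(2.4, -0.7, -3.1),
  stringsAsFactors = FALSE
)

# Rank candidate regions by absolute test statistic, largest first.
ranked <- regions[order(abs(regions$stat), decreasing = TRUE), ]

ranked
```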


cshukla/oligoGames documentation built on May 27, 2019, 8:44 a.m.