bumphunt: Detect and score candidate regions
In cshukla/oligoGames: Analyze data from massively parallel reporter assays


View source: R/DRfinder.R

This is an internal workhorse function called by `DRfinder`. It calculates the nucleotide-level signal and then calls the `regionFinder` function to determine candidate regions and score them.
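To make "nucleotide-level signal" concrete, here is a minimal base R sketch, assuming the signal is the least-squares coefficient of the condition covariate fitted to log2-transformed counts. The toy counts and the object names (`counts`, `log_signal`, `effect`) are made up for illustration, and this is not taken from the package source.

```r
# Hypothetical counts: 4 nucleotides (rows) x 4 samples (columns)
counts <- matrix(c( 7, 15, 31, 63,
                    3,  3,  3,  3,
                    1,  1,  7,  7,
                   15, 15,  3,  3),
                 nrow = 4, byrow = TRUE)

# Two control samples followed by two treated samples
condition <- factor(c("control", "control", "treated", "treated"))
design <- model.matrix(~ condition)   # column 2 holds the condition covariate

log_signal <- log2(counts + 1)        # log2 with a pseudocount of 1

# Least-squares fit for all nucleotides at once; each row of `beta` is one coefficient
beta <- solve(crossprod(design), t(design) %*% t(log_signal))
effect <- beta[2, ]                   # per-nucleotide condition coefficient

# For this two-group design the coefficient equals the difference in group means
diff_means <- rowMeans(log_signal[, 3:4]) - rowMeans(log_signal[, 1:2])
stopifnot(isTRUE(all.equal(unname(effect), diff_means)))
```

With an intercept plus one two-level factor, the second coefficient is simply the treated-minus-control mean, which is why the check at the end holds.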

bumphunt(oligo.mat, design, chr = NULL, pos, coef = 2, minInSpan = 10,
  minNum = 10, minNumRegion = 5, cutoff = NULL, maxGap = 50,
  maxGapSmooth = 50, smooth = FALSE, bpSpan = 100, verbose = TRUE,
  workers = NULL, logT = TRUE, altStat = 0, sampleSize, naive = FALSE,
  ...)
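As a rough illustration of the input shapes this signature expects, the sketch below simulates a small data set in base R. All object names and values are made up, and the commented call at the end is only an assumption about how the internal function might be reached (via `:::`), not a documented entry point.

```r
set.seed(1)

# Hypothetical toy data: 2 labelled regions, 60 nucleotides each
n_per_region <- 60
chr <- rep(c("geneA", "geneB"), each = n_per_region)  # region-level labels
pos <- rep(seq_len(n_per_region), times = 2)          # basepair positions
n_samples <- 6                                        # 3 samples per condition

# One row per nucleotide, one column per sample
oligo.mat <- matrix(rpois(length(pos) * n_samples, lambda = 50),
                    nrow = length(pos), ncol = n_samples)

# Model matrix: intercept plus the condition covariate in column 2 (coef = 2)
condition <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ condition)

# Sanity checks on the documented shapes
stopifnot(nrow(oligo.mat) == length(chr),
          nrow(oligo.mat) == length(pos),
          nrow(design) == ncol(oligo.mat))

# Assumed invocation (not run here; bumphunt is internal to oligoGames):
# res <- oligoGames:::bumphunt(oligo.mat, design, chr = chr, pos = pos,
#                              coef = 2, sampleSize = 3)
```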

`oligo.mat`	a matrix of nucleotide-level counts, with one row per nucleotide and one column per sample.
`design`	a model matrix with one row per sample and one column per independent covariate.
`chr`	a character vector of labels for region-level characteristics, with length equal to the number of rows in `oligo.mat` (and in the same order). This can indicate the chromosome, gene, lncRNA, etc.
`pos`	a numeric vector of basepair positions for each nucleotide in `oligo.mat` (and in the same order).
`coef`	positive integer that indicates which column of the design matrix in `design` contains the condition covariate of interest. Default value is 2.
`minInSpan`	positive integer that represents the minimum number of nucleotides in a smoothing span window if `smooth` is TRUE. Default value is 10.
`minNum`	positive integer that represents the minimum number of nucleotides overall in a region to be smoothed (if `smooth` is TRUE). Default value is 10.
`minNumRegion`	positive integer that represents the minimum number of nucleotides to consider for a candidate region. Default value is 5.
`cutoff`	scalar value that represents the absolute value (or a vector of two numbers representing a lower and upper bound) for the cutoff of the single nucleotide condition coefficient that is used to discover candidate regions.
`maxGap`	positive integer that indicates the maximum number of basepairs that can separate two nucleotides before they will be divided into two separate candidate regions. Defaults to 50.
`maxGapSmooth`	positive integer that indicates the maximum number of basepairs that can separate two nucleotides before they will be divided into two separate smoothing regions. Defaults to 50.
`smooth`	logical value that indicates whether or not to smooth the nucleotide level signal when discovering candidate regions. Defaults to FALSE.
`bpSpan`	a positive integer that represents the length in basepairs of the smoothing span window if `smooth` is TRUE. Default value is 100.
`verbose`	logical value that indicates whether additional progress messages within each iteration should be printed to stdout. Default value is TRUE.
`workers`	positive integer that represents the number of cores to use if parallelization of the smoothing step is desired.
`logT`	logical value that indicates whether to model the log2 transformed signal (plus a pseudocount of 1). Default is TRUE. Set to FALSE only if the transformation was done before running this function, or if the distribution of raw values looks relatively symmetric.
`altStat`	numeric value indicating which statistic to use for single loci when constructing candidate regions; the alternatives incorporate the variability among replicates. If 0 (default), differences in means are used as the statistic. If 1, modified t-statistics (instead of effect size estimates) are used (t-stat = median difference / sd). Since estimates of standard deviations are noisy for small numbers of replicates, the estimates are smoothed across neighboring loci (the effect size estimates themselves are not smoothed; that can be accomplished by setting `smooth=TRUE`). If 2, Wilcoxon rank sum statistics are used. If 3, the same statistic as in 1 is used, but with the median absolute deviation (MAD) instead of the SD.
`sampleSize`	positive integer that represents the number of samples in each condition. Defaults to `(ncol(OligoSignal)-1)/2`.
`naive`	a logical value indicating whether to use a naive region-level statistic in step 2 that simply averages the step 1 statistic across the region, instead of the default, which calculates a new statistic that jointly considers all loci in the region. When TRUE, the standard deviation among replicates is also not considered in step 1. Defaults to FALSE.
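The interplay of `cutoff`, `maxGap`, and `minNumRegion` can be sketched with a simplified region finder in base R. This is a hypothetical illustration, not the package's `regionFinder`: it assumes a single `chr` label, requires the nucleotides in a region to be consecutive, and scores each region with a plain average of the per-nucleotide statistic (closer in spirit to `naive = TRUE`). The function name `find_regions_sketch` and the toy data are made up.

```r
# Simplified, hypothetical region finder; not the oligoGames implementation
find_regions_sketch <- function(stat, pos, cutoff, maxGap = 50, minNumRegion = 5) {
  keep <- which(abs(stat) > cutoff)     # nucleotides passing the cutoff
  if (length(keep) == 0) return(data.frame())
  # Start a new region when the basepair gap exceeds maxGap, when the passing
  # nucleotides are not consecutive, or when the sign of the statistic flips
  new_region <- c(TRUE,
                  (diff(pos[keep]) > maxGap) |
                  (diff(keep) != 1) |
                  (diff(sign(stat[keep])) != 0))
  id <- cumsum(new_region)
  pieces <- lapply(split(keep, id), function(ix) {
    data.frame(start = pos[ix[1]], end = pos[ix[length(ix)]],
               indexStart = ix[1], indexEnd = ix[length(ix)],
               length = length(ix), stat = mean(stat[ix]))
  })
  out <- do.call(rbind, pieces)
  out[out$length >= minNumRegion, , drop = FALSE]  # drop regions that are too short
}

# Toy per-nucleotide statistics on two stretches separated by a large gap
pos  <- c(1:20, 101:120)
stat <- c(rep(0, 5), rep(1.2, 8), rep(0, 7),
          rep(-0.9, 6), rep(0.1, 3), rep(1.5, 3), rep(0, 8))

regions <- find_regions_sketch(stat, pos, cutoff = 0.5)
```

In this toy input three runs pass the cutoff, but the run of three nucleotides with statistic 1.5 is shorter than `minNumRegion = 5` and is discarded, leaving two candidate regions.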

A data.frame that contains the results of region detection. The data.frame contains one row for each candidate region, and 7 columns, in the following order:
1. chr = region-level labels such as chromosome, gene, or lncRNA
2. start = start basepair position of the region
3. end = end basepair position of the region
4. indexStart = the index of the region's starting nucleotide
5. indexEnd = the index of the region's ending nucleotide
6. length = the number of nucleotides contained in the region
7. stat = the test statistic for the condition difference
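A returned data.frame with these columns could be post-processed along the following lines. The `res` object below is a made-up stand-in with illustrative values, not real output from `bumphunt`, and `width_bp` is a hypothetical helper column.

```r
# Made-up result with the seven documented columns (values are illustrative only)
res <- data.frame(
  chr        = c("geneA", "geneA", "geneB"),
  start      = c(6, 40, 101),
  end        = c(13, 47, 106),
  indexStart = c(6, 40, 81),
  indexEnd   = c(13, 47, 86),
  length     = c(8, 8, 6),
  stat       = c(1.2, -2.1, 0.4),
  stringsAsFactors = FALSE
)

# Rank candidate regions by the magnitude of the test statistic
ranked <- res[order(abs(res$stat), decreasing = TRUE), ]

# Basepair width spanned by each region (inclusive of both ends)
ranked$width_bp <- ranked$end - ranked$start + 1
```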

cshukla/oligoGames documentation built on May 27, 2019, 8:44 a.m.