DRfinder: Perform inference to detect differential regions in oligo...
In cshukla/oligoGames: Analyze data from massively parallel reporter assays



This is the main inference function that aims to find regions with differential signal between two conditions. The two main steps of the procedure are (1) detect candidate regions from the nucleotide level signal, optionally smoothing to combat loss of power due to low coverage, and (2) evaluate the test statistic of condition difference across each candidate region by comparing to a global null distribution generated by permuting sample labels.
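Step (1) can be pictured with a small base R sketch. This is only an illustration of thresholding a per-nucleotide statistic and grouping consecutive nucleotides into candidate regions, not the package's actual implementation; the toy values and the names `nucDiff`, `direction`, and `candidates` are assumptions made for this example.

```r
# Toy per-nucleotide condition differences (hypothetical values, not package output).
pos     <- 1:30
nucDiff <- c(rep(0, 8), rep(0.6, 6), rep(0, 6), rep(-0.5, 4), rep(0, 6))

cutoff       <- 0.25  # threshold on the absolute per-nucleotide difference
minNumRegion <- 5     # minimum number of nucleotides per candidate region

# Label each nucleotide as up (+1), down (-1), or below the cutoff (0),
# then collapse consecutive identical labels into runs.
direction <- ifelse(nucDiff > cutoff, 1, ifelse(nucDiff < -cutoff, -1, 0))
runs      <- rle(direction)
runEnd    <- cumsum(runs$lengths)
runStart  <- runEnd - runs$lengths + 1

# Keep runs that exceed the cutoff and are long enough.
keep <- runs$values != 0 & runs$lengths >= minNumRegion
candidates <- data.frame(
  start    = pos[runStart[keep]],
  end      = pos[runEnd[keep]],
  length   = runs$lengths[keep],
  meanDiff = mapply(function(s, e) mean(nucDiff[s:e]),
                    runStart[keep], runEnd[keep])
)
print(candidates)
```

In this toy input the run of six nucleotides at positions 9 to 14 is retained, while the run of four negative nucleotides is dropped because it is shorter than `minNumRegion`.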

DRfinder(OligoSignal, conditionLabels = c("condition1", "condition2"),
  minInSpan = 10, bpSpan = 100, minNumRegion = 5, cutoff = NULL,
  smooth = FALSE, verbose = FALSE, quiet = FALSE, workers = NULL,
  sampleSize = (ncol(OligoSignal) - 1)/2, maxPerms = 50, logT = TRUE,
  coef = 2, onlyUp = FALSE, altStat = 0, naive = FALSE)

`OligoSignal`	a data frame in the format returned by the function `modelNucCounts`. Contains one row per nucleotide count. The first column contains the basepair positions of the nucleotides and the remaining columns hold the counts themselves (one column per sample).
`conditionLabels`	character vector of length two which contains the condition labels for the two conditions that are being compared.
`minInSpan`	positive integer that represents the minimum number of nucleotides in a smoothing span window if `smooth` is TRUE. Default value is 10.
`bpSpan`	a positive integer that represents the length in basepairs of the smoothing span window if `smooth` is TRUE. Default value is 100.
`minNumRegion`	positive integer that represents the minimum number of nucleotides to consider for a candidate region. Default value is 5.
`cutoff`	scalar value that represents the absolute value (or a vector of two numbers representing a lower and upper bound) for the cutoff of the single nucleotide condition coefficient that is used to discover candidate regions.
`smooth`	logical value that indicates whether or not to smooth the nucleotide level signal when discovering candidate regions. Defaults to FALSE.
`verbose`	logical value that indicates whether additional progress messages within each iteration should be printed to stdout. Default value is FALSE.
`quiet`	logical value that indicates whether a message is printed to stdout on completion of each permutation iteration. If FALSE (default value), messages will be printed. If TRUE, messages will not be printed (but additional messages will still be printed within each iteration if `verbose` is set to TRUE).
`workers`	positive integer that represents the number of cores to use if parallelization of the smoothing step is desired.
`sampleSize`	positive integer that represents the number of samples in each condition. Defaults to `(ncol(OligoSignal)-1)/2`.
`maxPerms`	a positive integer that represents the maximum number of permutations that will be used to generate the global null distribution of test statistics. Default value is 50.
`logT`	logical value that indicates whether to model the log2 transformed signal (plus a pseudocount of 1). Default is TRUE. Only set to false if transformation has been done prior to running this function, or if distribution of raw values looks relatively symmetric.
`coef`	positive integer that indicates which column of the design matrix contains the condition covariate of interest. Default value is 2.
`onlyUp`	a logical value indicating whether to only consider differences in the positive direction (signal in condition 1 - condition 2 > 0). Default value is FALSE.
`altStat`	numeric value indicating whether to use alternate statistic for single loci in constructing candidate regions that incorporates the standard deviation among replicates. If 0 (default), differences in means are used as the statistic. If 1, modified t-statistics (instead of effect size estimates) will be used (t-stat = median difference / sd). Since estimates of standard deviations are noisy for small numbers of replicates, the estimates are smoothed across neighboring loci (though the effect size estimates themselves are not smoothed; that can be accomplished by setting smooth=TRUE). If 2, Wilcoxon rank sum statistics are used. If 3, then the same stat as in 1, but using median absolute deviation (MAD) instead of SD.
`naive`	a logical value indicating whether to use naive region-level statistic in step 2 that simply takes average of statistic in step 1 across the region, instead of the default, which calculates a new statistic that jointly considers all loci in the region. Also, in step 1 the standard deviation among replicates is not considered.
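To make the `altStat` options more concrete, the sketch below computes four single-locus statistics in base R for one hypothetical nucleotide with four replicates per condition. It is a rough illustration under stated assumptions: the replicate values are made up, the pooled-residual estimates of SD and MAD are a choice made for this example, and the smoothing of variance estimates across neighboring loci described above is not reproduced. The package's internal formulas may differ.

```r
# Hypothetical (log-scale) signal at one nucleotide, four replicates per condition.
cond1 <- c(5.1, 4.8, 5.6, 5.3)
cond2 <- c(3.9, 4.2, 4.0, 4.4)

# altStat = 0 analogue: plain difference in condition means.
statMeanDiff <- mean(cond1) - mean(cond2)

# Within-condition residuals, used here as an assumed pooled spread estimate.
resid <- c(cond1 - mean(cond1), cond2 - mean(cond2))

# altStat = 1 analogue: effect size scaled by the standard deviation.
statSD <- statMeanDiff / sd(resid)

# altStat = 2 analogue: Wilcoxon rank sum statistic.
statWilcox <- unname(wilcox.test(cond1, cond2)$statistic)

# altStat = 3 analogue: effect size scaled by the median absolute deviation.
statMAD <- statMeanDiff / mad(resid)

print(c(meanDiff = statMeanDiff, sdScaled = statSD,
        wilcoxon = statWilcox, madScaled = statMAD))
```

With so few replicates the SD and MAD estimates are noisy, which is the motivation given above for smoothing them across neighboring loci.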

a data.frame that contains the results of the inference. The data.frame contains one row for each candidate region, and 9 columns, in the following order: 1. chr = region level labels such as chromosome, gene, or lncRNA, 2. start = start basepair position of the region, 3. end = end basepair position of the region, 4. indexStart = the index of the region's starting nucleotide, 5. indexEnd = the index of the region's ending nucleotide, 6. length = the number of nucleotides contained in the region, 7. stat = the test statistic for the condition difference, 8. pval = the permutation p-value for the significance of the test statistic, and 9. qval = the q-value for the test statistic (adjustment for multiple comparisons to control false discovery rate).
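The relationship between `stat`, `pval`, and `qval` can be sketched in base R as follows. This is a generic two-sided permutation p-value against a pooled null distribution, followed by a Benjamini-Hochberg adjustment via `p.adjust`; the toy numbers are invented, and both the add-one p-value formula and the use of Benjamini-Hochberg are assumptions for illustration rather than a statement of what `DRfinder` does internally.

```r
# Hypothetical region-level statistics from the observed sample labels.
obsStat <- c(4.2, 1.1, 2.9)

# Hypothetical region-level statistics pooled over all label permutations
# (the global null distribution).
nullStat <- c(0.3, -1.5, 2.0, 0.8, -3.1, 1.2, -0.4, 2.5, -0.9)

# Two-sided permutation p-value: the fraction of null statistics at least as
# extreme as the observed one, with an add-one correction so p is never zero.
pval <- sapply(obsStat, function(s) {
  (1 + sum(abs(nullStat) >= abs(s))) / (1 + length(nullStat))
})

# Multiple-testing adjustment to control the false discovery rate
# (Benjamini-Hochberg is assumed here).
qval <- p.adjust(pval, method = "BH")

print(data.frame(stat = obsStat, pval = pval, qval = qval))
```

Because the null distribution is pooled across all candidate regions, the resolution of the p-values depends on the total number of null statistics generated, which grows with `maxPerms`.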

## Not run: 
# Normalize the raw oligo counts shipped with the package
normalizedCounts <- normCounts(rawCounts = system.file("extdata",
    "allTranscriptsCounts_Raw.tsv", package = "oligoGames"))
metaData <- system.file("extdata", "oligoMeta.tsv", package = "oligoGames")
oligoLen <- 110
conditionLabels <- c("Nuclei", "Total")
# Model nucleotide-level signal from the oligo-level counts
modeledNucs <- modelNucCounts(normalizedCounts, metaData,
    conditionLabels, modelMethod = "median", oligoLen = oligoLen)
# Detect differential regions between the two conditions
DRregions <- DRfinder(modeledNucs, conditionLabels,
    minNumRegion = 3, cutoff = 0.25, smooth = FALSE,
    workers = 1, sampleSize = 4, maxPerms = 50, altStat = 1)

## End(Not run)