focr: False overlapped-cluster rate (FOCR) control procedures
In dipterix/focr: A False Overlapped-Cluster Rate Control ('FOCR') Framework

View source: R/focr_grid.R

focr_initial(
  data,
  data_corr,
  scale,
  blocks,
  nblocks = ncol(data),
  mu = 0,
  alpha = 0.05,
  verbose = FALSE,
  side = c("two", "left", "right"),
  ...
)

focr(
  data,
  block_size,
  alpha = 0.05,
  fdr_method = c("BH", "LAWS", "SABHA", "BY"),
  bandwidth = if (missing(block_size)) NA else block_size / 2,
  initial_filter = 0.9,
  dimension = NULL,
  distance_measure = c("euclidean", "lmax", "manhattan"),
  side = c("two", "left", "right"),
  verbose = FALSE,
  blocks,
  ...
)

`data`	an n-by-p numeric matrix (no missing values), where `n` is the total number of observations and `p` is the total number of hypotheses
`data_corr`	the correlation matrix of `data`. If missing, then the correlation will be calculated empirically
`scale`	numerical vector of standard deviations by column; default is missing (use empirical standard deviation)
`blocks`	a list of indices or a function that returns indices
`nblocks`	the total number of blocks, used when `blocks` is a function
`mu`	the mean function value to compare with; see 'Details'
`alpha`	FOCR level for stage-I, and FDR level for stage-II
`verbose`	whether to print out information; default is false
`side`	test type, `'two'` if alternative hypotheses are two-sided, and `'left'` or `'right'` if one-sided.
`...`	passed to `focr_initial` and `fdr_method`
`block_size`	block size of sliding window; used by `focr`.
`fdr_method`	a character string or a function specifying the post-selection FDR control procedure. Built-in choices are `"BH"`, `"BY"`, `"SABHA"`, and `"LAWS"`. See the vignette for details; see also `fdr-controls`.
`bandwidth`	used by `LAWS` and `SABHA` as the smoothing parameter to estimate the underlying sparsity level. Default is half of `block_size`. If `block_size` is missing, `bandwidth` must be specified.
`initial_filter`	used by `LAWS` and `SABHA` as the initial filter (purity) to remove large p-values
`dimension`	the dimension information of input hypotheses. For `LAWS` and `SABHA`, the current implementation only supports 1-3 dimensions.
`distance_measure`	distance measure used to form blocks; see 'Details'.

The functions focr and focr_initial control the type-I error for multiple testing problems with topological constraints:

H_0(s): f(s) = μ(s),   H_1(s): f(s) ≠ μ(s)

The type-I error control procedure has two stages. In the first stage, the FOCR is controlled at the block (overlapped-cluster) level. This step finds regions of interest that respect the topological constraints. The second stage further inspects the hypotheses rejected in the first stage: conditional p-values are calculated in a post-selection fashion, and FDR control methods are then applied to these conditional p-values to select significant hypotheses at the individual level.
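
As a rough illustration of the second stage only, the sketch below applies the Benjamini-Hochberg adjustment from base R (stats::p.adjust) to conditional p-values restricted to hypotheses that survived the first stage. The names cond_pvals and stage1_rejected and all numbers are made up for illustration; this is not the package's internal code.

```r
# Hypothetical conditional p-values for 10 hypotheses (illustrative values only)
cond_pvals <- c(0.001, 0.004, 0.012, 0.030, 0.200,
                0.450, 0.600, 0.750, 0.900, 0.950)

# Hypothetical indices of hypotheses inside blocks rejected in stage I
stage1_rejected <- 1:6

alpha <- 0.05

# Stage II (sketch): BH adjustment on the selected conditional p-values only
adj <- p.adjust(cond_pvals[stage1_rejected], method = "BH")

# Final rejections: selected hypotheses whose adjusted p-value is <= alpha
final_rejections <- stage1_rejected[adj <= alpha]
final_rejections
```

In focr itself, this step is chosen through fdr_method, and the result is returned in the post_selection item.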

The function μ(s) is specified by mu. By default the alternative hypothesis is two-sided. For one-sided tests, set the parameter side to either "left" or "right".
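
To make the role of side concrete, here is a minimal base-R sketch of how a p-value for a standardized statistic differs across the three choices. It assumes that "left" corresponds to the alternative f(s) < μ(s) and "right" to f(s) > μ(s), and it uses a normal reference distribution; both are assumptions for illustration, and the helper z_pvalue is hypothetical rather than part of the package.

```r
# Sketch: p-value of a z-statistic under each `side` convention.
# Assumption: "left" means H1: f(s) < mu(s), "right" means H1: f(s) > mu(s).
z_pvalue <- function(z, side = c("two", "left", "right")) {
  side <- match.arg(side)
  switch(
    side,
    two   = 2 * pnorm(-abs(z)),            # both tails
    left  = pnorm(z),                      # lower tail
    right = pnorm(z, lower.tail = FALSE)   # upper tail
  )
}

z <- c(-2.5, 0, 1.96)
z_pvalue(z, "two")
z_pvalue(z, "left")
z_pvalue(z, "right")
```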

The function focr_initial controls the FOCR on the block level (stage-I), and calculates the conditional p-values. The function focr uses focr_initial, providing default block settings and built-in post-selection inference on conditional p-values.

By default, focr uses sliding windows as blocks. Each block is a ball whose radius (the distance from the center point to the boundary) is block_size/2. The distance measure is specified by distance_measure; the choices are "euclidean", "lmax", and "manhattan". These default settings should work in many spatial or temporal situations. To customize the blocks, specify blocks manually. The argument blocks can be either a list of hypothesis indices, or a function that returns such indices given the locations of hypotheses. See vignette('false-overlapped-cluster-rate', package='focr').
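
The sliding-window idea can be sketched for the one-dimensional case as follows. The helper make_sliding_blocks is hypothetical and is not the package's internal implementation; it only illustrates a function that maps a center index to the indices within distance block_size/2. In one dimension the three distance measures coincide, so none needs to be chosen here.

```r
# Sketch of a 1-D sliding-window block function (hypothetical helper,
# not the package's internal implementation).
make_sliding_blocks <- function(p, block_size) {
  half <- block_size / 2
  function(index) {
    idx <- seq_len(p)
    # Keep hypotheses whose distance to the center is at most block_size / 2
    idx[abs(idx - index) <= half]
  }
}

blocks <- make_sliding_blocks(p = 1000, block_size = 41)
blocks(1)     # window truncated at the left boundary
blocks(500)   # full window centered at 500
```

A function of this shape could be passed as blocks; in that case block_size is missing, so bandwidth has to be given explicitly for the methods that need it.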

A list of results

method: method name
alpha: significance level: FOCR in stage-I and FDR in stage-II
side: passed from input
blocks: function that returns indices of blocks
nblocks: number of total blocks
rej_blocks: blocks being rejected
rej_hypotheses: individual hypotheses rejected in the first stage
tau: p-value cutoff in the first stage
cond_pvals: conditional p-values used in stage-II
uncond_pvals: unconditional p-values
details: details of initial rejections
stats: block-level test statistics and p-values

The following additional items are returned by focr only.

post_selection: a list returned by FDR controlling methods, see also fdr-controls
fdr_method: function used to control the FDR in stage-II
block_size: block size if specified, passed from input

library(focr)
set.seed(100)
generator <- simulation_data_1D(n_points = 1000, mu_type = 'step',
                             cov_type = 'AR')
data <- generator$gen_data(snr = 0.34)
plot(generator, data = data, snr = 0.34)

# -------------------- Basic usage -------------------------
# FOCR-BH procedure
res <- focr(data = data, block_size = 41,
            alpha = 0.05, fdr_method = 'BH')

# False discovery proportion
fdp <- fdp(res$post_selection$rejs, generator$support)
fdp

# Statistical power
power <- pwr(res$post_selection$rejs, generator$support)
power

# Visualize
plot(generator$mu, type = 'l', col = 'red', ylim = c(-.5,1.5),
     main = sprintf('FOCR-BH, FDP=%.1f%%, Power=%.1f%%',
                    fdp*100, power * 100))
lines(res$cond_pvals, col = 'gray')
abseg(res$rej_hypotheses, y = -0.3, col = 'orange3', lwd = 2)
abseg(res$post_selection$rejs, y = -0.5, col = 'blue', lwd = 2)
# One color per legend entry, matching the lines drawn above
legend('topleft', c("Underlying signal", "Conditional p-values",
                    "FOCR initial clusters", "FOCR-BH final rejections"),
       col = c('red', 'gray', 'orange3', 'blue'), lty = 1, cex = 0.7)

# ------------------------- Change FDR methods --------------------
# FOCR-LAWS
res <- focr(data = data, block_size = 41,
            alpha = 0.05, fdr_method = 'LAWS',
            initial_filter = 0.5)
fdp <- fdp(res$post_selection$rejs, generator$support)
fdp
power <- pwr(res$post_selection$rejs, generator$support)
power

# Visualize
plot(generator$mu, type = 'l', col = 'red', ylim = c(-.5,1.5),
     main = sprintf('FOCR-LAWS, FDP=%.1f%%, Power=%.1f%%',
                    fdp*100, power * 100))
lines(res$cond_pvals, col = 'gray')
abseg(res$rej_hypotheses, y = -0.3, col = 'orange3', lwd = 2)
abseg(res$post_selection$rejs, y = -0.5, col = 'blue', lwd = 2)
# One color per legend entry, matching the lines drawn above
legend('topleft', c("Underlying signal", "Conditional p-values",
                    "FOCR initial clusters", "FOCR-LAWS final rejections"),
       col = c('red', 'gray', 'orange3', 'blue'), lty = 1, cex = 0.7)

# ------------------------- Customized blocks --------------------

# The following example uses disjoint blocks; each block has length of 40
res <- focr(data = data, alpha = 0.05, fdr_method = 'LAWS',
            initial_filter = 0.5, blocks = function(index){
              # Disjoint blocks with size 40
              floor((index -1)/40) * 40 + seq_len(40)
            }, bandwidth = 20)


# Compared to overlapped blocks, disjoint blocks are less powerful.
# However, they might be useful when the underlying topological
# structure is itself disjoint.
fdp <- fdp(res$post_selection$rejs, generator$support)
fdp
power <- pwr(res$post_selection$rejs, generator$support)
power
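
For readers who want to see what the two summary numbers above measure, the following base-R sketch computes a false discovery proportion and a power from index vectors using the usual definitions. The helpers fdp_sketch and power_sketch are hypothetical stand-ins, and the assumption that rejections and support are both given as index vectors is ours; the package's own fdp and pwr may differ in details.

```r
# Sketch of the usual definitions (hypothetical helpers, not the package's
# fdp()/pwr()). Assumption: `rejs` and `support` are vectors of indices.
fdp_sketch <- function(rejs, support) {
  if (length(rejs) == 0) return(0)        # no rejections, no false discoveries
  sum(!rejs %in% support) / length(rejs)  # share of rejections outside support
}

power_sketch <- function(rejs, support) {
  if (length(support) == 0) return(NA_real_)  # power undefined without signal
  sum(support %in% rejs) / length(support)    # share of support recovered
}

rejs <- c(3, 4, 5, 6, 20)   # illustrative rejected indices
support <- 4:10             # illustrative true signal indices

fdp_sketch(rejs, support)
power_sketch(rejs, support)
```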