focr: False overlapped-cluster rate (FOCR) control procedures

View source: R/focr_grid.R

Description

False overlapped-cluster rate (FOCR) control procedures

Usage

focr_initial(
  data,
  data_corr,
  scale,
  blocks,
  nblocks = ncol(data),
  mu = 0,
  alpha = 0.05,
  verbose = FALSE,
  side = c("two", "left", "right"),
  ...
)

focr(
  data,
  block_size,
  alpha = 0.05,
  fdr_method = c("BH", "LAWS", "SABHA", "BY"),
  bandwidth = if (missing(block_size)) {     NA } else {     block_size/2 },
  initial_filter = 0.9,
  dimension = NULL,
  distance_measure = c("euclidean", "lmax", "manhattan"),
  side = c("two", "left", "right"),
  verbose = FALSE,
  blocks,
  ...
)

Arguments

data

an n-by-p numerical matrix (no missing values), where n is the total number of observations and p is the total number of hypotheses

data_corr

the correlation matrix of data. If missing, then the correlation will be calculated empirically

scale

numerical vector of standard deviations by column; default is missing (use empirical standard deviation)

blocks

a list of indices or a function that returns indices

nblocks

the total number of blocks, used when blocks is a function

mu

the mean function value to compare with; see 'Details'

alpha

FOCR level for stage-I, and FDR level for stage-II

verbose

whether to print out information; default is FALSE

side

test type, 'two' if alternative hypotheses are two-sided, and 'left' or 'right' if one-sided.

...

passed to focr_initial and fdr_method

block_size

block size of sliding window; used by focr.

fdr_method

a character string or a function specifying the post-selection FDR control procedure. Built-in choices are "BH", "BY", "SABHA", and "LAWS". See the vignette for details; see also fdr-controls.

bandwidth

used by LAWS and SABHA as the smoothing parameter to estimate the underlying sparsity level. Default is half of block_size. If block_size is missing, bandwidth must be specified.

initial_filter

used by LAWS and SABHA as the initial filter (purity) to remove large p-values

dimension

the dimension information of the input hypotheses. For LAWS and SABHA, the current implementation only supports 1-3 dimensions.

distance_measure

distance measure used to form blocks; see 'Details'.

Details

The functions focr and focr_initial control the type-I error for multiple testing problems with topological constraints:

H_{0}(s): f(s) = \mu(s), \quad H_{1}(s): f(s) \neq \mu(s)

The type-I error control procedure has two stages. In the first stage, the FOCR is controlled at the block (overlapped-cluster) level. This step finds regions of interest that respect the topological constraints. The second stage further inspects the hypotheses rejected by the first stage. During this stage, conditional p-values are calculated in a post-selection fashion. FDR control methods are then applied to these conditional p-values to select significant hypotheses at the individual level.
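The individual-level step of the second stage can be illustrated with base R alone. The sketch below applies the Benjamini-Hochberg adjustment from stats::p.adjust to a made-up vector of p-values standing in for conditional p-values. The vector pvals and the use of p.adjust are assumptions for illustration only; they do not reproduce how focr computes or adjusts conditional p-values.

```r
# Illustrative only: `pvals` is a made-up stand-in for stage-II conditional p-values
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)
alpha <- 0.05

# Benjamini-Hochberg adjusted p-values (base R, stats package)
adj <- p.adjust(pvals, method = "BH")

# Hypotheses whose adjusted p-value is at most alpha are rejected
rejs <- which(adj <= alpha)
rejs
```

Only the hypotheses that survived the first stage would enter such a step, which is why the p-values must be conditional on that selection.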

Function μ(s) is specified in mu. By default the alternative hypothesis is two-sided. For one-sided tests, please change the parameter side to either "left" or "right".
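As a rough illustration of how the side setting changes a p-value, the sketch below converts z-statistics into two-sided and one-sided p-values with base R. It assumes that "left" corresponds to the alternative f(s) < μ(s) and "right" to f(s) > μ(s); that mapping, and the vector z, are assumptions made for this example rather than statements about the package internals.

```r
# Illustrative only: made-up z-statistics for testing f(s) = mu(s)
z <- c(-2.5, 0.3, 1.96)

# Two-sided: evidence in either tail counts against the null
p_two <- 2 * pnorm(-abs(z))

# One-sided tests
# ASSUMPTION: "left" means H1: f(s) < mu(s); "right" means H1: f(s) > mu(s)
p_left <- pnorm(z)
p_right <- pnorm(z, lower.tail = FALSE)

round(cbind(z, p_two, p_left, p_right), 4)
```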

The function focr_initial controls the FOCR on the block level (stage-I), and calculates the conditional p-values. The function focr uses focr_initial, providing default block settings and built-in post-selection inference on conditional p-values.

By default, focr uses sliding windows as blocks. Each block is a ball whose radius (the distance from the center point to the boundary) is block_size/2. The distance measure is specified by distance_measure; the choices are "euclidean", "lmax", and "manhattan". These default settings should work in many spatial or temporal situations. To customize the blocks, specify blocks manually. The argument blocks can be either a list of hypothesis indices, or a function that returns the indices of a block given the location of a hypothesis. See vignette('false-overlapped-cluster-rate', package='focr').
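The block construction described above can be sketched in base R. Both helpers below, sliding_block and block_distance, are hypothetical names introduced for illustration and are not part of the focr API. The sketch assumes a 1-D index set 1..p and assumes that "lmax" denotes the maximum absolute coordinate difference.

```r
# Illustrative only: hypothetical helpers, not part of the focr API

# 1-D sliding window: indices within block_size / 2 of `index`, clipped to 1..p
sliding_block <- function(index, block_size, p) {
  radius <- block_size / 2
  lo <- max(1, ceiling(index - radius))
  hi <- min(p, floor(index + radius))
  seq.int(lo, hi)
}

# Distance between two locations under the three named measures
# ASSUMPTION: "lmax" is the maximum absolute coordinate difference
block_distance <- function(a, b,
                           measure = c("euclidean", "lmax", "manhattan")) {
  measure <- match.arg(measure)
  d <- abs(a - b)
  switch(measure,
    euclidean = sqrt(sum(d^2)),
    lmax = max(d),
    manhattan = sum(d)
  )
}

blk <- sliding_block(index = 100, block_size = 41, p = 1000)
range(blk)
block_distance(c(0, 0), c(3, 4), "euclidean")
```

A function with the same shape as sliding_block (taking an index and returning the indices of its block) is the kind of object that could be supplied as blocks when the default windows do not match the topology of the problem.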

Value

A list of results

method

method name

alpha

level of significance: FOCR in stage-I and FDR in stage-II

side

passed from input

blocks

function that returns indices of blocks

nblocks

number of total blocks

rej_blocks

blocks being rejected

rej_hypotheses

individual hypotheses rejected in the first stage

tau

p-value cutoff in the first stage

cond_pvals

conditional p-values in stage-II

uncond_pvals

unconditional p-values

details

details of initial rejections

stats

block-level test statistics and p-values

The following additional items are returned by focr only.

post_selection

a list returned by FDR controlling methods, see also fdr-controls

fdr_method

function used to control the FDR in stage-II

block_size

block size if specified, passed from input

Examples

library(focr)
set.seed(100)
generator <- simulation_data_1D(n_points = 1000, mu_type = 'step',
                             cov_type = 'AR')
data <- generator$gen_data(snr = 0.34)
plot(generator, data = data, snr = 0.34)

# -------------------- Basic usage -------------------------
# FOCR-BH procedure
res <- focr(data = data, block_size = 41,
            alpha = 0.05, fdr_method = 'BH')

# False discovery proportion
fdp <- fdp(res$post_selection$rejs, generator$support)
fdp

# Statistical power
power <- pwr(res$post_selection$rejs, generator$support)
power

# Visualize
plot(generator$mu, type = 'l', col = 'red', ylim = c(-.5,1.5),
     main = sprintf('FOCR-BH, FDP=%.1f%%, Power=%.1f%%',
                    fdp*100, power * 100))
lines(res$cond_pvals, col = 'gray')
abseg(res$rej_hypotheses, y = -0.3, col = 'orange3', lwd = 2)
abseg(res$post_selection$rejs, y = -0.5, col = 'blue', lwd = 2)
legend('topleft', c("Underlying signal", "Conditional p-values",
                    "FOCR initial clusters", "FOCR-BH final rejections"),
       col = c('red', 'gray', 'orange3', 'blue'), lty = 1, cex = 0.7)

# ------------------------- Change FDR methods --------------------
# FOCR-LAWS
res <- focr(data = data, block_size = 41,
            alpha = 0.05, fdr_method = 'LAWS',
            initial_filter = 0.5)
fdp <- fdp(res$post_selection$rejs, generator$support)
fdp
power <- pwr(res$post_selection$rejs, generator$support)
power

# Visualize
plot(generator$mu, type = 'l', col = 'red', ylim = c(-.5,1.5),
     main = sprintf('FOCR-LAWS, FDP=%.1f%%, Power=%.1f%%',
                    fdp*100, power * 100))
lines(res$cond_pvals, col = 'gray')
abseg(res$rej_hypotheses, y = -0.3, col = 'orange3', lwd = 2)
abseg(res$post_selection$rejs, y = -0.5, col = 'blue', lwd = 2)
legend('topleft', c("Underlying signal", "Conditional p-values",
                    "FOCR initial clusters", "FOCR-LAWS final rejections"),
       col = c('red', 'gray', 'orange3', 'blue'), lty = 1, cex = 0.7)

# ------------------------- Customized blocks --------------------

# The following example uses disjoint blocks; each block has length of 40
res <- focr(data = data, alpha = 0.05, fdr_method = 'LAWS',
            initial_filter = 0.5, blocks = function(index){
              # Disjoint blocks with size 40
              floor((index -1)/40) * 40 + seq_len(40)
            }, bandwidth = 20)


# Compared to overlapped blocks, disjoint blocks are less powerful.
# However, they might be useful when the underlying topological
# structure is disjoint
fdp <- fdp(res$post_selection$rejs, generator$support)
fdp
power <- pwr(res$post_selection$rejs, generator$support)
power
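The two summaries reported in the examples above follow standard definitions, which can be sketched in base R. The helpers fdp_sketch and pwr_sketch are hypothetical stand-ins written for this illustration, not the fdp and pwr functions shipped with the package, and the sketch assumes that both the rejections and the support are given as index vectors.

```r
# Illustrative only: hypothetical stand-ins, not the focr implementations
# ASSUMPTION: `rejs` and `support` are both vectors of hypothesis indices

# False discovery proportion: share of rejections that are not true signals
fdp_sketch <- function(rejs, support) {
  if (length(rejs) == 0) return(0)
  sum(!rejs %in% support) / length(rejs)
}

# Power: share of true signals that were rejected
pwr_sketch <- function(rejs, support) {
  if (length(support) == 0) return(NA_real_)
  sum(rejs %in% support) / length(support)
}

rejs <- c(3, 4, 5, 9)
support <- 3:7
fdp_sketch(rejs, support)  # 1 false rejection out of 4
pwr_sketch(rejs, support)  # 3 of 5 true signals recovered
```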

dipterix/focr documentation built on Dec. 20, 2021, 12:03 a.m.