knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Data arriving in “streams” from a large number of sources is ubiquitous, where the underlying distribution of each data stream may change over the course of data acquisition due to external stimuli or internal evolution.
Functional magnetic resonance imaging (fMRI) is a good example to illustrate. The blood oxygen level-dependent (BOLD) response, a surrogate measure of brain activity, is activated when suffering from an externally controlled stimulus. The data comprises a series of magnetic resonance brain images, that is the BOLD responses over time from a large number of uniformly spaced volume elements (or voxels). A data stream can be referred to as a voxel. Our primary goal is to discover those activated data streams.
It is worth noting two obvious characteristics existing in large-scale data streams. One is that not all data streams are activated in a specific scene. The other is that the data streams in activation may react at different times during the experiment.
A natural statistical approach is to use change-point or process control theory. Some works (see, for example, @lindquist2007) focus on the modeling of the fMRI data, but litter has been discussed on the inferential side, that is the uncertainty of detected voxels in activation. This is amount to perform streamwise hypothesis testing of whether a change in the BOLD response occurs during the course of an fMRI experiment, and to threshold the resulting activation map of test statistics to meet certain error rate control. An appealing statistical notion of the error rate is the false discovery rate (@Benjamini1995, FDR), that is, the expected proportion of falsely rejected hypotheses.
In this package, we implement the proposed simple yet effective method (see @WenWangZouWang2021) for discovering activated data streams with the FDR being controlled at a prescribed level. The spatial dependence and unknown asynchronous change patterns are taken into consideration. The proposed method is called SLIP, which comprises a sequence of steps by, as its acronym suggests, $\underline{\rm S}$pliting the data into two parts, $\underline{\rm L}$ocating streamwise activation times based on one sample, $\underline{\rm I}$ncorporating spatial dependence among data streams and $\underline{\rm P}$ooling summary statistics on both separated samples, respectively. This package has the same name as the proposed method.
In this ${\tt SLIP}$ package, six procedures are provided including SLIP.lasso()
, SLIP.thresh.d()
, SLIP.thresh.c()
, SLIP.indep()
from the SLIP method and BH.asymp()
, BH.simul()
from the BH method.
We also provide a sample of fMRI data in the ${\tt SLIP}$ package and we illustrate the usage below.
# load package: SLIP library(SLIP) set.seed(1234)
Two data generators named by SLIP.scp.generator()
and SLIP.mcp.generator()
, are provied for generating data streams with possible single change-point (SCP) and multiple change-points (MCP), respectively. More usages of the two functions can be found by ?SLIP.scp.generator
and ?SLIP.mcp.generator
.
First, the data with at most one change-point in each data stream is generated.
N = 90 p = 200 data = SLIP.scp.generator(N, p, dist = "t", param = 5)
Then, the six procedures are applied to the data, with FDR nominal level at 0.2:
alpha = 0.2 # SLIP-thresh-C sig.thrsh.c = SLIP.thresh.c(data$dat, alpha)$sig # SLIP-thresh-D sig.thrsh.d = SLIP.thresh.d(data$dat, alpha)$sig # SLIP-lasso sig.lasso = SLIP.lasso(data$dat, alpha)$sig # SLIP-indep sig.indep = SLIP.indep(data$dat, alpha)$sig # BH-simul ECDF = bootstrap.cusum(N) sig.simul = BH.simul(data$dat, alpha, ECDF)$sig # BH-asymp sig.asymp = BH.asymp(data$dat, alpha)$sig
The indices of discoveries are returned by these procedures. We calculate the false discovery proportion (FDP) and the true discovery proportion (TDP) defined below: $$ {\rm FDP} = |\mathcal{S}\setminus\mathcal{A}|/|\mathcal{S}|\hspace{1cm} {\rm TDP} = |\mathcal{S}\cap\mathcal{A}|/|\mathcal{A}|, $$ where $\mathcal{S}$ and $\mathcal{A}$ are the discovery set and the true set containing data streams with changes, respectively. We only present the FDP and TDP for fast compilation. The FDR (the expectation of FDP) and power (the expectation of TDP) can be approximated by large repetitions.
# false discovery proportion (FDP) and true discovery proportion (TDP) sigList = list(sig.thrsh.c, sig.thrsh.d, sig.lasso, sig.indep, sig.simul, sig.asymp) FDP = sapply(sigList, function(sig){ length(setdiff(sig, data$index))/max(1, length(sig)) }) TDP = sapply(sigList, function(sig){ length(intersect(sig, data$index))/length(data$index) }) Proceudre = c("SLIP.thresh.c", "SLIP.thresh.d", "SLIP.lasso", "SLIP.indep", "BH.simul", "BH.asymp") res = data.frame(Proceudre, FDP = round(FDP, 4), TDP = round(TDP, 4)) knitr::kable(t(res))
Here we only apply the SLIP.thresh.d()
to the fMRI data for illustrating the usage of the embedded data set.
At first, we read the data:
library(SLIP) data = fmri.data (dimA = c(data$dimx, data$dimy, data$dimz)) # load the ROI data used in the paper (dim(data$dat)) dat = apply(data$dat[-c(1:5, 356:360), ], 2, function(X){ colMeans(matrix(X, nrow = 10)) }) (dim(dat))
We define the FDR nominal level at 0.2:
alpha = 0.2 sigInfo = SLIP.thresh.d(dat, alpha, estMthd = "POET", outputW = TRUE, outputCP = TRUE) # The threshold L (sigInfo$L) # The discovery set (names(sigInfo$sig)) # The estimated FDP (sigInfo$estFDP) # The estimated change-point location (ratio) (sigInfo$cps) # The W-statistics (sigInfo$W)
references: - type: article-journal id: lindquist2007 author: - family: Lindquist given: Martin A. - family: Waugh given: Christian - family: Wager given: Tor D. title: 'Modeling state-related fMRI activity using change-point theory' container-title: NeuroImage issued: date-parts: - - 2007 volume: 35 issue: 3 page: 1125-1141 DOI: 10.1016/j.neuroimage.2007.01.004
family: Hochberg given: Yosef title: 'Controlling the false discovery rate: A practical and powerful approach to multiple testing' container-title: 'Journal of the Royal Statistical Society: Series B (Methodological)' issued: date-parts:
type: article-journal id: WenWangZouWang2021 author:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.