robdist: Robust distance scrubbing

Source: R/robdist.R

robdist {fMRIscrub}    R Documentation

Robust distance scrubbing

Description

Scrubbing with robust distance: measures the outlyingness of each observation (row of X) using a robust distance.
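The core idea of a robust distance can be illustrated with a minimal base-R sketch. It assumes the rows have already been reduced to a small number of component scores, and it uses the minimum covariance determinant (MCD) estimator from MASS::cov.rob as the robust location/scatter estimate. robdist's own estimator, transformations, and thresholds may differ, and the helper name robust_distance_sketch is hypothetical.

```r
library(MASS)  # cov.rob(); MASS ships with R as a recommended package

# Hypothetical helper: squared robust (MCD-based) Mahalanobis distance of each row
robust_distance_sketch <- function(scores) {
  # Minimum covariance determinant estimate of location and scatter
  fit <- MASS::cov.rob(scores, method = "mcd")
  # Squared distance of each row from the robust center
  stats::mahalanobis(scores, center = fit$center, cov = fit$cov)
}

set.seed(1)
scores <- matrix(rnorm(100 * 3), nrow = 100)  # 100 observations, 3 components
scores[5, ] <- c(8, -8, 8)                     # plant one gross outlier
rd <- robust_distance_sketch(scores)
which.max(rd)
```

Because the center and covariance are estimated robustly, the planted outlier cannot pull them toward itself, so its distance stays large.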

Usage

robdist(
  X,
  RD_cutoff = 4,
  RD_quantile = 0.99,
  trans = c("none", "robust-YJ", "SHASH"),
  bootstrap_n = 1000,
  bootstrap_alpha = 0.01,
  projection = c("ICA", "PCA"),
  nuisance = "DCT4",
  center = TRUE,
  scale = TRUE,
  comps_mean_dt = FALSE,
  comps_var_dt = FALSE,
  PESEL = TRUE,
  kurt_quantile = 0.99,
  get_dirs = FALSE,
  full_PCA = FALSE,
  get_outliers = TRUE,
  cutoff = 4,
  seed = 0,
  ICA_method = c("C", "R"),
  skip_dimred = FALSE,
  verbose = FALSE
)

Arguments

X

Wide numeric data matrix (T observations by V variables, T << V). If X represents an fMRI run, T should be the number of timepoints and V should be the number of vertices/voxels. Projection scrubbing will measure the outlyingness of each row in X.

RD_cutoff

Default: 4.

RD_quantile

Quantile cutoff...?

trans

Apply a transformation prior to univariate outlier detection? Three options: "none" (default), "robust-YJ", and "SHASH".

bootstrap_n

Use bootstrapping to estimate the robust distance null distribution? If so, set this to the number of bootstrap samples. Default: 1000. Use 0 (or FALSE) to use an empirical quantile instead.

bootstrap_alpha

If using the bootstrap (bootstrap_n > 0), this is the significance level of the bootstrap CI. Default: 0.01 (a 99% CI).
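The following generic sketch may help with intuition about bootstrap_n and bootstrap_alpha. It bootstraps an upper quantile of a vector of distances and returns a two-sided percentile interval with coverage 1 - alpha. This is an illustration under assumptions (a percentile bootstrap, the hypothetical helper boot_quantile_sketch, chi-squared stand-in data), not robdist's internal procedure. The non-bootstrap alternative corresponds to the plain empirical quantile.

```r
# Hypothetical helper: percentile-bootstrap interval for an upper quantile of d
boot_quantile_sketch <- function(d, prob = 0.99, n_boot = 1000, alpha = 0.01) {
  # Quantile of each bootstrap resample of d
  boot_q <- replicate(n_boot,
                      stats::quantile(sample(d, replace = TRUE), prob, names = FALSE))
  # Two-sided percentile interval with coverage 1 - alpha
  stats::quantile(boot_q, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(0)
d <- stats::rchisq(500, df = 3)                 # stand-in for squared robust distances
ci <- boot_quantile_sketch(d)                   # interval for the 0.99 quantile
emp <- stats::quantile(d, 0.99, names = FALSE)  # empirical-quantile alternative
```

The bootstrap adds resampling cost in exchange for an interval that reflects the uncertainty in an extreme quantile estimated from a limited number of observations.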

projection

One of the following: "ICA" (default) or "PCA".

nuisance

Nuisance signals to regress from each column of X. Should be specified as a design matrix: a T by N numeric matrix where N represents the number of nuisance signals. Or can be "DCT4" (default), which will create a matrix with a constant column (the intercept term) and four DCT bases. This default nuisance regression will have the effect of demeaning and detrending the data by removing low-frequency components. To not perform any nuisance regression set this argument to NULL, 0, or FALSE.

Detrending is highly recommended for time-series data, especially if there are many time points or evolving circumstances affecting the data. Additionally, if kurtosis is being used to select the projection directions, trends can induce positive or negative kurtosis, contaminating the connection between high kurtosis and outlier presence. Detrending should not be used with non-time-series data because the observations are not temporally related.

Additional nuisance regressors can be specified like so: cbind(1, fMRItools::dct_bases(nrow(X), 4), more_nuisance).
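Here is a minimal sketch of the default-style nuisance regression. It assumes DCT-II cosine bases. fMRItools::dct_bases may scale or define its bases differently, so dct_bases_sketch and nuisance_regress_sketch are hypothetical stand-ins. The residuals from an ordinary least-squares fit are orthogonal to the intercept and the DCT bases, which is what demeaning and low-frequency detrending amount to here.

```r
# Hypothetical helper (not fMRItools::dct_bases): DCT-II cosine bases, one column per frequency
dct_bases_sketch <- function(n_time, K) {
  t_idx <- seq(0, n_time - 1)
  sapply(seq_len(K), function(k) cos(pi * k * (2 * t_idx + 1) / (2 * n_time)))
}

# Hypothetical helper: regress the design from every column of X and keep the residuals
nuisance_regress_sketch <- function(X, design) {
  qr.resid(qr(design), X)
}

set.seed(2)
n_time <- 50; V <- 20
X <- matrix(rnorm(n_time * V), n_time, V) + outer(seq_len(n_time), rep(0.1, V))  # add a linear trend
design <- cbind(1, dct_bases_sketch(n_time, 4))  # intercept + 4 DCT bases
X_clean <- nuisance_regress_sketch(X, design)
```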

center, scale

Center the columns of the data by their medians, and scale the columns of the data by their median absolute deviations (MADs)? Default: TRUE. Centering is necessary for computing the projections, so if center is FALSE, the data must already be centered.

Note that centering and scaling occur after nuisance regression, so even if center is FALSE, the data will be centered on the means if the nuisance regression included an intercept term, as it does by default.
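Robust centering and scaling of the columns can be sketched in a few lines of base R. This assumes that the MAD is computed with the default consistency constant of stats::mad (1.4826). Whether robdist uses that constant is an assumption, and robust_center_scale_sketch is a hypothetical helper.

```r
# Hypothetical helper: center columns by their medians and scale them by their MADs
robust_center_scale_sketch <- function(X, center = TRUE, scale = TRUE) {
  if (center) X <- sweep(X, 2, apply(X, 2, stats::median), "-")
  # stats::mad() multiplies by 1.4826 by default, so the MAD matches the SD for Normal data;
  # a column with zero MAD would become Inf/NaN here
  if (scale) X <- sweep(X, 2, apply(X, 2, stats::mad), "/")
  X
}

set.seed(3)
X <- matrix(rexp(40 * 10), 40, 10)  # skewed columns, where median/MAD differ from mean/SD
Xs <- robust_center_scale_sketch(X)
```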

comps_mean_dt, comps_var_dt

Stabilize the mean and variance of each projection component's timecourse prior to computing kurtosis and leverage? These arguments should be TRUE, FALSE (default), or the number of DCT bases to use for detrending (TRUE will use 4). Note that these arguments affect the projection components, not the data itself. Also, if variance-stabilizing but not mean-stabilizing, the components must already be mean-stable, for example because the data were rigorously detrended; otherwise, the results will be invalid.

Slow-moving mean and variance patterns in the components will interfere with the roles of kurtosis and leverage in identifying outliers. While nuisance can be used to detrend the data, this nuisance regression is estimated non-robustly, since a robust model takes too long to estimate at each data location. In contrast, comps_mean_dt and comps_var_dt can be used to apply a robust nuisance regression to each component, which is feasible because there are far fewer components than original data locations. Thus, even if the data have been detrended with nuisance, it may be helpful to detrend the components with comps_mean_dt; furthermore, the data nuisance regression does not address potential variance patterns in the components.

Overall, for fMRI we recommend enabling comps_mean_dt and comps_var_dt unless the data has been cleaned not only with a low-pass filter like DCT nuisance regression, but also with anatomical CompCor, ICA-FIX, or a similar data-driven strategy that takes into account common sources of artifactual mean and variance trends such as motion and physiological cycles.
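Robust mean detrending of a single component can be sketched with MASS::rlm, a Huber M-estimator. The sketch regresses the component on an intercept plus DCT bases. It covers only the mean part (not variance stabilization), and the helper names are hypothetical. robdist's own robust model may differ.

```r
library(MASS)  # rlm(): robust linear regression by M-estimation

# Hypothetical helper: DCT-II cosine bases, one column per frequency
dct_bases_sketch <- function(n_time, K) {
  t_idx <- seq(0, n_time - 1)
  sapply(seq_len(K), function(k) cos(pi * k * (2 * t_idx + 1) / (2 * n_time)))
}

# Hypothetical helper: robustly remove a slow mean trend from one component timecourse
robust_mean_detrend_sketch <- function(comp, n_dct = 4) {
  design <- cbind(1, dct_bases_sketch(length(comp), n_dct))  # intercept + DCT bases
  fit <- MASS::rlm(x = design, y = comp)  # Huber psi function by default
  as.vector(fit$residuals)
}

set.seed(5)
n_time <- 100
comp <- rnorm(n_time) + seq(0, 3, length.out = n_time)  # slow upward drift
comp[10] <- 15                                           # one spike
detr <- robust_mean_detrend_sketch(comp)
```

The robust fit downweights the spike, so the drift is removed without the spike bending the fitted trend. A spike in a least-squares fit would pull the trend toward itself.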

PESEL

Use pesel to select the number of components? Default: TRUE. Otherwise, use the number of principal components with above-average variance.
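The non-PESEL rule (keep principal components with above-average variance) can be sketched with base R's svd. The helper name n_high_var_pcs_sketch is hypothetical, and the column-centering step is an assumption about how the PCA is set up.

```r
# Hypothetical helper: count PCs whose variance exceeds the average PC variance
n_high_var_pcs_sketch <- function(X) {
  X <- scale(X, center = TRUE, scale = FALSE)  # column-center before the SVD
  d2 <- svd(X, nu = 0, nv = 0)$d^2             # proportional to the PC variances
  sum(d2 > mean(d2))
}

set.seed(4)
n_time <- 60; V <- 200
signal <- matrix(rnorm(n_time * 3), n_time, 3) %*% matrix(rnorm(3 * V, sd = 5), 3, V)
X <- signal + matrix(rnorm(n_time * V), n_time, V)  # rank-3 signal plus noise
k <- n_high_var_pcs_sketch(X)
```

With three strong planted signals, the three signal components are the only ones above the average variance.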

kurt_quantile

What quantile cutoff should be used to select the components? Default: 0.99. Use 0 to select all high-variance components regardless of kurtosis value.

We model each component as a length T vector of Normal iid random variables, for which the distribution of kurtosis values can be approximated. The quantile is estimated based on this distribution.
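The kurtosis cutoff under the Normal iid model can be approximated by simulation. This sketch draws Normal iid vectors of length T, computes the moment-based excess kurtosis of each, and takes the kurt_quantile-th quantile. It also computes the large-sample Normal approximation (excess kurtosis approximately N(0, 24/T)) for comparison. Monte Carlo simulation is a swapped-in technique chosen for transparency. The package's own approximation may differ, and the helper names are hypothetical.

```r
# Hypothetical helper: moment-based excess kurtosis of a vector
kurtosis_sketch <- function(x) {
  x <- x - mean(x)
  mean(x^4) / mean(x^2)^2 - 3
}

# Hypothetical helper: Monte Carlo quantile of kurtosis for length-n_time Normal iid vectors
kurt_cutoff_sketch <- function(n_time, prob = 0.99, n_sim = 5000) {
  sims <- replicate(n_sim, kurtosis_sketch(stats::rnorm(n_time)))
  stats::quantile(sims, prob, names = FALSE)
}

set.seed(5)
cutoff <- kurt_cutoff_sketch(200)
# Large-sample Normal approximation with variance 24 / T
approx_cutoff <- stats::qnorm(0.99, sd = sqrt(24 / 200))
```

Components whose kurtosis exceeds such a cutoff have heavier tails than Normal noise of the same length would typically produce.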

get_dirs

Should the projection directions be returned? This is the V matrix in PCA and S matrix in ICA. The default is FALSE to save memory. However, get_dirs==TRUE is required for artifact_images.

full_PCA

Only applies to the PCA projection. Return the full SVD? Default: FALSE (return only the high-variance components).

get_outliers

Should outliers be flagged based on cutoff? Default: TRUE.

cutoff

Median leverage cutoff value. Default: 4.
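The sketch below shows one reading of a median leverage cutoff: flag observations whose leverage exceeds cutoff times the median leverage. Here leverage is the diagonal of the hat matrix of the retained components. This interpretation, the use of 10 PCA components, and the helper name flag_leverage_sketch are assumptions.

```r
# Hypothetical helper: leverage is the diagonal of U %*% t(U),
# which equals rowSums(U^2) when U has orthonormal columns
flag_leverage_sketch <- function(U, cutoff = 4) {
  lev <- rowSums(U^2)
  lev > cutoff * stats::median(lev)  # flag rows far above the median leverage
}

set.seed(6)
X <- matrix(rnorm(80 * 300), 80, 300)
X[40, ] <- 10 * X[40, ]                      # one high-variance observation
X <- scale(X, center = TRUE, scale = FALSE)
U <- svd(X, nu = 10, nv = 0)$u               # first 10 left singular vectors
flags <- flag_leverage_sketch(U, cutoff = 4)
which(flags)
```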

seed

Set a seed right before the call to pesel::pesel or fastICA::fastICA? If NULL, do not set a seed. If numeric (default: 0), it will be used as the seed.

ICA_method

The method argument to fastICA: "C" to use C code with BLAS (default), or "R" to use R code.

skip_dimred

Skip dimension reduction? Default: FALSE.

verbose

Should occasional updates be printed? Default: FALSE.

Value

A "robdist" object, i.e. a list with components

lwr_50

...

lwr_80

...

B_quant

...

Examples

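library(fastICA)
library(fMRIscrub)  # provides robdist() and the example data matrix Dat1
rdx <- robdist(Dat1[seq(70), seq(800, 950)])  # first 70 rows (timepoints), columns 800 to 950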

fMRIscrub documentation built on Oct. 25, 2023, 9:07 a.m.