FMradio-package: Factor modeling for radiomic data


Description

The FMradio package provides a workflow that uses factor modeling to project the high-dimensional and collinear radiomic feature-space onto a lower-dimensional orthogonal meta-feature space that retains most of the information contained in the full data set. These projected meta-features can be directly used as robust and stable covariates in any downstream prediction or classification model.

Details

Radiomics refers to the mining of large numbers of quantitative features from standard-of-care clinical images. FMradio aims to provide support for stable prediction and classification modeling with radiomics data, irrespective of imaging modality (such as MRI, PET, or CT). The workflow has 3 main steps that ultimately enable stable prediction and classification.

Step 1: Regularized correlation matrix estimation. Radiomic data are often high-dimensional, in the sense that there are more features than observations. Moreover, radiomic data are often highly collinear, in the sense that collections of features may be highly correlated (in the absolute sense). As a result, the correlation matrix of the radiomic features is ill-conditioned or even singular (with n observations, the sample correlation matrix has rank at most n - 1, so it is singular whenever the number of features exceeds n - 1). It is precisely this combination of characteristics that makes predictive modeling difficult. Because the factor-analytic procedure is based on modeling moment structures such as the correlation matrix, the first step is to obtain a regularized, well-conditioned estimate of the correlation matrix. The following functions are then of use:

The radioHeat function can be used to visualize a (possibly regularized) correlation matrix as a heatmap. It can also be used to visually assess feature redundancy. The RF function filters out features that are so collinear that they are deemed redundant. The subSet function subsets data objects to the features retained after (possible) filtering. The regcor function then provides a regularized estimate of the correlation matrix on the (possibly filtered) feature set.
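The sketch below illustrates the Step 1 problem and one simple remedy in Python with NumPy rather than R, and it does not call FMradio. The greedy rule in the hypothetical `redundancy_filter` (keep a feature only if its absolute correlation with every already retained feature stays below a threshold `t`) and the convex shrinkage toward the identity in the hypothetical `shrink_corr` are stand-ins chosen for illustration. They are not the rules implemented by RF or regcor, and the penalty `lam` is fixed by hand here instead of being chosen from the data.

```python
import numpy as np

def redundancy_filter(R, t=0.95):
    """Hypothetical greedy filter: keep feature j only if its absolute
    correlation with every feature kept so far is below t.
    Returns the indices of the retained features."""
    keep = []
    for j in range(R.shape[0]):
        if all(abs(R[j, k]) < t for k in keep):
            keep.append(j)
    return keep

def shrink_corr(R, lam):
    """Convex shrinkage toward the identity: (1 - lam) * R + lam * I.
    The diagonal stays 1 and, because R is positive semi-definite, every
    eigenvalue of the result is at least lam, so it is invertible for lam > 0."""
    return (1.0 - lam) * R + lam * np.eye(R.shape[0])

rng = np.random.default_rng(1)
n, p = 30, 60                                        # more features than observations
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)    # a near-duplicate of feature 0

R = np.corrcoef(X, rowvar=False)
print(np.linalg.matrix_rank(R))                      # at most n - 1, far below p

keep = redundancy_filter(R, t=0.95)                  # feature 1 is dropped as redundant
Xf = X[:, keep]                                      # subset the data to retained features
Rf = np.corrcoef(Xf, rowvar=False)
Rreg = shrink_corr(Rf, lam=0.3)                      # lam chosen by hand for the sketch
print(len(keep), np.linalg.eigvalsh(Rreg).min())     # smallest eigenvalue is about lam
```

Shrinking toward the identity keeps the unit diagonal of a correlation matrix while raising every eigenvalue to at least `lam`. This is why the regularized matrix can be inverted even though the raw sample correlation matrix cannot.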

Step 2: Factor-analytic data compression. The next step is to project the collinear, high-dimensional radiomic feature space onto a lower-dimensional orthogonal meta-feature space. Factor analysis can be used for this purpose. The following functions are then of use:

The SA function assesses whether performing a factor analysis on the (possibly regularized) correlation matrix is appropriate. The dimGB function can be used to determine the number of latent factors (i.e., the intrinsic dimensionality of the meta-feature space). The dimVAR and dimSMC functions provide additional decision support with respect to the output of dimGB. The mlFA function then performs a maximum likelihood factor analysis, taking the (possibly regularized) correlation matrix and the chosen intrinsic dimensionality as inputs.
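Under the orthogonal common factor model, the correlation matrix is modeled as S = L L' + Psi, where L is the p x m loading matrix and Psi is the diagonal matrix of uniquenesses. Maximum likelihood factor analysis chooses L and Psi to minimize the discrepancy F = ln|S| + tr(R S^-1) - ln|R| - p. The Python/NumPy sketch below computes three related quantities for a given correlation matrix R:

- the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, a standard index of whether factor analysis is appropriate (values near 1 are favorable);
- the number of eigenvalues of R exceeding 1, the weakest of Guttman's classical lower bounds on the number of factors (often called the Kaiser-Guttman rule);
- the ML discrepancy.

It is an illustration only. Whether SA reports exactly KMO, which bounds dimGB computes, and how mlFA optimizes are not reproduced here. No optimizer is shown: the objective is only evaluated at given L and Psi.

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy of a correlation matrix."""
    Q = np.linalg.inv(R)
    d = np.sqrt(np.diag(Q))
    P = -Q / np.outer(d, d)                  # partial correlations (off-diagonal entries)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2, p2 = np.sum(R[off] ** 2), np.sum(P[off] ** 2)
    return r2 / (r2 + p2)

def kaiser_guttman(R):
    """Number of eigenvalues of R larger than 1."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

def ml_discrepancy(R, L, psi):
    """ML fit function ln|S| + tr(R S^-1) - ln|R| - p, with S = L L' + diag(psi)."""
    S = L @ L.T + np.diag(psi)
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_R = np.linalg.slogdet(R)
    return logdet_S + np.trace(np.linalg.solve(S, R)) - logdet_R - R.shape[0]

# Population correlation matrix of a two-factor model with 10 features.
L = np.zeros((10, 2))
L[:5, 0] = 0.8                               # features 0-4 load on factor 1
L[5:, 1] = 0.8                               # features 5-9 load on factor 2
psi = 1.0 - np.sum(L ** 2, axis=1)           # uniquenesses (0.36 each)
R = L @ L.T + np.diag(psi)

print(round(kmo(R), 3))                      # → 0.895
print(kaiser_guttman(R))                     # → 2
L1 = L[:, :1]                                # misspecified one-factor model
psi1 = 1.0 - np.sum(L1 ** 2, axis=1)
print(abs(ml_discrepancy(R, L, psi)) < 1e-10)    # → True (true model fits exactly)
print(ml_discrepancy(R, L1, psi1) > 0)           # → True (one factor misfits)
```

The discrepancy is zero only when the model reproduces R exactly and is positive otherwise. This is why, for a fixed number of factors, minimizing it yields the ML estimates.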

Step 3: Obtaining factor scores. The third step is to use the factor-analytic solution to obtain factor scores: the score each object/individual obtains on each of the latent factors. The following functions are then of use:

The facScore function provides several options for computing factor scores. The determinacy of these scores can be assessed with the facSMC function.
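As a rough illustration of Step 3 (Python/NumPy, not facScore or facSMC), the block below computes two classical types of factor scores, regression (Thomson) and Bartlett scores, from known loadings `L` and uniquenesses `psi`. It also computes the squared multiple correlation between each factor and the observed features, a common measure of score determinacy. In practice `L` and `psi` would come from the fitted factor solution rather than being known. The score types facScore offers, and the exact quantity facSMC reports, may differ from this sketch.

```python
import numpy as np

def regression_scores(Z, R, L):
    """Regression (Thomson) factor scores: Z R^-1 L."""
    return Z @ np.linalg.solve(R, L)

def bartlett_scores(Z, L, psi):
    """Bartlett factor scores: Z Psi^-1 L (L' Psi^-1 L)^-1, with Psi diagonal."""
    W = L / psi[:, None]                     # Psi^-1 L
    return Z @ W @ np.linalg.inv(L.T @ W)

def score_determinacy(R, L):
    """Squared multiple correlation of each factor with the observed features,
    diag(L' R^-1 L), for orthogonal factors with unit variance."""
    return np.diag(L.T @ np.linalg.solve(R, L))

# Two-factor model with known loadings and uniquenesses.
L = np.zeros((10, 2))
L[:5, 0] = 0.8
L[5:, 1] = 0.8
psi = 1.0 - np.sum(L ** 2, axis=1)
R = L @ L.T + np.diag(psi)

rng = np.random.default_rng(2)
n = 1000
F = rng.standard_normal((n, 2))                            # true factor values
Z = F @ L.T + rng.standard_normal((n, 10)) * np.sqrt(psi)  # unit-variance data

F_reg = regression_scores(Z, R, L)
F_bart = bartlett_scores(Z, L, psi)
print(np.round(score_determinacy(R, L), 3))                # → [0.899 0.899]
```

A determinacy close to 1 means the scores track the latent factors closely. Here the regression scores for the first factor should correlate with the true factor at roughly the square root of 0.899, about 0.95.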

Step 4: Prediction and classification. The factor scores obtained in Step 3 can be used directly as (low-dimensional and orthogonal) covariates in any prediction, classification, or learning procedure. Any suitable method available from the CRAN repository may be used for this step.
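A minimal sketch of Step 4, again in Python/NumPy: factor scores serve as the two covariates in an ordinary least-squares fit of a hypothetical continuous outcome `y`. OLS is only a stand-in for whichever prediction or classification method is used. The simulated data and the generating coefficients (1.5 and -0.5) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-factor model: 10 features, each loading 0.8 on one orthogonal factor.
L = np.zeros((10, 2))
L[:5, 0] = 0.8
L[5:, 1] = 0.8
psi = 1.0 - np.sum(L ** 2, axis=1)
R = L @ L.T + np.diag(psi)

n = 2000
F = rng.standard_normal((n, 2))
Z = F @ L.T + rng.standard_normal((n, 10)) * np.sqrt(psi)
# Hypothetical outcome driven by the latent factors plus noise.
y = 1.5 * F[:, 0] - 0.5 * F[:, 1] + 0.5 * rng.standard_normal(n)

S = Z @ np.linalg.solve(R, L)                # regression factor scores (n x 2)
D = np.column_stack([np.ones(n), S])         # intercept + two meta-feature covariates
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
print(np.round(beta, 2))                     # intercept near 0, slopes near 1.5 and -0.5
```

The design has only an intercept and two score columns that are uncorrelated in the population here, so the fit is well conditioned. That stability is the point of compressing the features before modeling.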

Additional functionality. The package also provides additional functionality, contained in the following (convenience) functions:

The dimLRT and dimIC functions provide alternative options for assessing the number of latent factors, using likelihood ratio testing and information criteria, respectively. These are recommended only when the sample size is large relative to the number of features. The FAsim function provides flexible generation of data according to the orthogonal common factor-analytic model. All of these functions may be useful in comparative exercises. The package also provides a wrapper function that automates the 3 main steps of the workflow.
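In the spirit of FAsim (whose R interface is not reproduced here), the hypothetical Python function below draws data from the orthogonal common factor model x = L f + e. Here f ~ N(0, I) and e ~ N(0, Psi) are independent. Choosing Psi = I - diag(L L') gives every feature unit variance, so the model-implied correlation matrix is L L' + Psi. Data generated this way have a known number of factors, which makes them useful for comparing dimension-selection rules such as dimGB, dimLRT, or dimIC.

```python
import numpy as np

def simulate_factor_data(L, n, rng):
    """Hypothetical simulator: draw n observations from x = L f + e with
    f ~ N(0, I_m) and e ~ N(0, Psi), where Psi = I - diag(L L') so that
    every feature has unit variance."""
    p, m = L.shape
    psi = 1.0 - np.sum(L ** 2, axis=1)
    if np.any(psi <= 0):
        raise ValueError("each row sum of squared loadings must be below 1")
    F = rng.standard_normal((n, m))               # latent factor values
    E = rng.standard_normal((n, p)) * np.sqrt(psi)  # unique parts
    return F @ L.T + E

rng = np.random.default_rng(4)
L = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
              [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
X = simulate_factor_data(L, n=5000, rng=rng)

# Model-implied correlation matrix versus the sample correlation matrix.
implied = L @ L.T + np.diag(1.0 - np.sum(L ** 2, axis=1))
max_dev = np.max(np.abs(np.corrcoef(X, rowvar=False) - implied))
print(max_dev)                                    # small for large n
```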

Author(s)

Carel F.W. Peeters [cre, aut]
Caroline Ubelhor [ctb]
Kevin Kunzmann [ctb]

Maintainer: Carel F.W. Peeters <cf.peeters@vumc.nl>

References

Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].


FMradio documentation built on Dec. 16, 2019, 5:43 p.m.