autoFMradio: Wrapper for automated workflow
In FMradio: Factor Modeling for Radiomics Data


autoFMradio is a wrapper function that automates the three main steps of the FMradio workflow.

autoFMradio(X, t = .95, fold = 5, GB = 1, type = "thomson",
            verbose = TRUE, printInfo = TRUE, seed = NULL)

`X`	A data `matrix` or an `ExpressionSet` object.
`t`	A scalar `numeric` giving the absolute correlation value used as the threshold in redundancy filtering.
`fold`	A `numeric` integer or `integer` indicating the number of folds to use in cross-validation.
`GB`	A `numeric` integer or `integer` indicating which Guttman bound to use for determining the number of latent features to retain. Must be either 1, 2, or 3.
`type`	A `character` indicating the type of factor score to calculate. Must be one of: "thomson", "bartlett", "anderson".
`verbose`	A `logical` indicating whether the function should report on its progress. The function runs silently when `verbose = FALSE`.
`printInfo`	A `logical` indicating if additional information should be printed on-screen. Suppresses printing when `verbose = FALSE`.
`seed`	A `numeric` integer or `integer` indicating the seed for the random number generator.

The autoFMradio function automates the three main steps of the workflow by providing a wrapper around all core functions.

Step 1 (regularized correlation matrix estimation) is performed using the X, t, and fold arguments. The raw correlation matrix based on data X is redundancy-filtered using the threshold provided in t. Subsequently, a regularized estimate of the correlation matrix (on the possibly filtered feature set) is computed with the optimal penalty value determined by cross-validation. The number of folds is set by the fold argument. For more information on Step 1 see RF, subSet, and regcor.
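The idea behind redundancy filtering can be sketched in base R. This is a simplified greedy illustration and not the algorithm implemented in RF: the toy data, the name `thr` (standing in for the t argument), and the rule for choosing which feature of a collinear pair to drop are all assumptions made for the example.

```r
## Toy data: features "a" and "b" are near-duplicates, "c" and "d" are independent
set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(a = z, b = z + rnorm(n, sd = 0.01), c = rnorm(n), d = rnorm(n))

R   <- cor(X)
thr <- 0.95            # plays the role of the 't' argument (assumed name)
keep <- colnames(R)

## Illustrative greedy filter (an assumption, not RF's actual rule):
## while any pair has |correlation| >= thr, drop one member of the worst pair
repeat {
  A <- abs(R[keep, keep, drop = FALSE])
  diag(A) <- 0
  if (max(A) < thr) break
  idx  <- which(A == max(A), arr.ind = TRUE)[1, ]
  pair <- keep[idx]
  ## drop the member that is, on average, more correlated with the rest
  drop_one <- pair[which.max(colMeans(A[, pair, drop = FALSE]))]
  keep <- setdiff(keep, drop_one)
}

## Filtered data and its raw correlation matrix
Xf <- X[, keep, drop = FALSE]
Rf <- cor(Xf)
```

In the package itself, the filtered data would then be passed on to regcor, which selects the penalty value by cross-validation.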

Step 2 (factor analytic data compression) is performed using the GB argument. With this argument one can use either the first, second, or third Guttman bound to select the intrinsic dimensionality of the latent vector. This bound, together with the regularized correlation matrix, is used in a maximum likelihood factor analysis with simple-structure rotation. For more information on Step 2, see dimGB and mlFA.
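As a rough illustration of how a Guttman bound suggests a latent dimension, the first bound is commonly taken to be the number of eigenvalues of the correlation matrix that exceed 1. The base-R sketch below uses simulated data with two latent factors; the data-generating code and all variable names are assumptions for the example, dimGB may differ in its details, and the second and third bounds are not shown.

```r
## Simulate 8 features driven by 2 latent factors (loadings of 0.9)
set.seed(2)
n <- 200
Lambda <- rbind(cbind(rep(0.9, 4), 0), cbind(0, rep(0.9, 4)))
Fmat <- matrix(rnorm(n * 2), n, 2)
E    <- matrix(rnorm(n * 8, sd = sqrt(1 - 0.81)), n, 8)
X    <- Fmat %*% t(Lambda) + E

## First Guttman bound (sketch): count eigenvalues of R larger than 1
R   <- cor(X)
ev  <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
gb1 <- sum(ev > 1)
gb1
```

In autoFMradio the bound is evaluated on the regularized correlation matrix rather than on the raw one used here.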

Step 3 (obtaining factor scores) is performed using the type argument. It determines factor scores: the score each object/individual would obtain on each of the latent factors. The type argument determines the type of factor score that is calculated. For more information on Step 3, see facScore.
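For intuition, the textbook Thomson (regression) and Bartlett scores for an orthogonal factor model can be computed directly from a loadings matrix and a diagonal uniqueness matrix. The sketch below uses base R only; the matrices `Lambda` and `Psi` and the data `Z` are made up for the example, and facScore may implement these scores differently (the Anderson type is not shown).

```r
## Assumed toy parameters: 8 features, 2 orthogonal factors
Lambda <- rbind(cbind(rep(0.9, 4), 0), cbind(0, rep(0.9, 4)))
Psi    <- diag(1 - rowSums(Lambda^2))   # diagonal matrix of unique variances

## Standardized toy data: 10 observations in rows
set.seed(3)
Z <- scale(matrix(rnorm(10 * 8), 10, 8))

PsiInv <- diag(1 / diag(Psi))
M      <- t(Lambda) %*% PsiInv %*% Lambda

## Thomson scores: Z Psi^{-1} Lambda (I + Lambda' Psi^{-1} Lambda)^{-1}
thomson  <- Z %*% PsiInv %*% Lambda %*% solve(diag(ncol(Lambda)) + M)

## Bartlett scores: Z Psi^{-1} Lambda (Lambda' Psi^{-1} Lambda)^{-1}
bartlett <- Z %*% PsiInv %*% Lambda %*% solve(M)
```

Both results have one row per observation and one column per latent factor, which matches the layout described for the $Scores slot.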

When printInfo = TRUE additional information is printed on-screen after the full procedure has run its course. This additional information pertains to each of the steps mentioned above. For Step 1 it reiterates the thresholding value for redundancy filtering and gives the number of features retained after this filtering. It also reiterates the number of folds used in determining the optimal penalty value as well as this value itself. Moreover, it provides the value of the Kaiser-Meyer-Olkin index on the optimal regularized correlation matrix estimate (see SA). For Step 2 it reiterates which Guttman bound was used in determining the number of latent factors as well as the number of latent factors retained. It also gives the proportion of explained variance under the factor solution of the chosen latent dimension (see dimVAR). For Step 3 it reiterates the type of factor score that was calculated. It also prints the lowest ‘determinacy score’ among the latent factors (see facSMC).

The factor scores in the $Scores slot of the output (see below) can be directly used as input features in any prediction or classification procedure. In case of external (rather than internal) validation one can use the parameter matrices in the $Loadings and $Uniqueness slots in combination with fresh data to provide a validation factor projection based on the training solution. See Peeters et al. (2019).

The function returns an object of class list:

`$Scores`	An object of class `data.frame` containing the factor scores. Observations are represented in the rows. Each column represents a latent factor.
`$FilteredData`	Subsetted data `matrix` containing only those features retained after redundancy filtering.
`$FilteredCor`	A correlation `matrix` based on the data in the `$FilteredData` slot.
`$optPen`	A `numeric` scalar representing the optimal value for the penalty parameter.
`$optCor`	A `matrix` representing the regularized correlation matrix under the optimal penalty-value.
`$m`	An `integer` corresponding to the number of latent factors retained under the chosen Guttman bound.
`$Loadings`	A matrix of class `loadings` representing the loadings matrix in which each element λ_{jk} is the loading of the jth feature on the kth latent factor.
`$Uniqueness`	A `matrix` representing the diagonal matrix carrying the unique variances.
`$Exvariance`	A `numeric` vector representing the cumulative variance for each respective latent feature.
`$determinacy`	A `numeric` vector indicating, for each factor, the squared multiple correlation between the observed features and the common latent factor.
`$used.seed`	A `numeric` or `integer` used as the starting seed in random number generation.

When seed = NULL the starting seed is determined by drawing a single integer from the integers 1:9e5. This non-user-supplied seed is also found in the $used.seed slot of the output.
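The seed behaviour described in this note could look roughly as follows in base R. This is a sketch of the description only, not code taken from the package; the variable names are assumptions.

```r
seed <- NULL                      # what the user passed (assumed name)

## No seed supplied: draw one integer from 1:9e5 to use as the starting seed
if (is.null(seed)) {
  seed <- sample(1:9e5, 1)
}
set.seed(seed)

## Storing the value allows the run to be reproduced later
used.seed <- seed
```

Passing the stored value back through the seed argument should then reproduce the cross-validation folds of the original run.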

Carel F.W. Peeters <cf.peeters@vumc.nl>

Peeters, C.F.W. et al. (2019). Stable prediction with radiomics data. arXiv:1903.11696 [stat.ML].

RF, subSet, regcor, dimGB, mlFA, facScore

## Simulate some data according to a factor model with 3 latent factors
simDAT <- FAsim(p = 24, m = 3, n = 40, loadingvalue = .9)
X <- simDAT$data

## Perform the lot
FullMonty <- autoFMradio(X, GB = 1, seed = 303)