R/crossValidate.R

#' Cross-validation to evaluate classification performance.
#' 
#' This function has been designed to facilitate the comparison of classification
#' methods using cross-validation, particularly when there are multiple assays per biological unit.
#' A selection of typical comparisons is implemented. The \code{train} function
#' is a convenience method for training on one data set and likewise \code{predict} for predicting on an
#' independent validation data set.
#'
#' @param measurements Either a \code{\link{DataFrame}}, \code{\link{data.frame}}, \code{\link{matrix}}, \code{\link{MultiAssayExperiment}} 
#' or a list of the basic tabular objects containing the data.
#' @param x Same as \code{measurements} but containing only the training samples.
#' @param outcome A vector of class labels of class \code{\link{factor}} of the
#' same length as the number of samples in \code{measurements} or a character vector of length 1 containing the
#' column name in \code{measurements} if it is a \code{\link{DataFrame}}. Or a \code{\link{Surv}} object or a character vector of
#' length 2 or 3 specifying the time and event columns in \code{measurements} for survival outcome. If \code{measurements} is a
#' \code{\link{MultiAssayExperiment}}, the column name(s) in \code{colData(measurements)} representing the outcome.  If column names
#' of survival information, time must be in first column and event status in the second.
#' @param outcomeTrain For the \code{train} function, either a factor vector of classes, a \code{\link{Surv}} object, or
#' a character string, or vector of such strings, containing column name(s) of column(s)
#' containing either classes or time and event information about survival. If column names
#' of survival information, time must be in first column and event status in the second.
#' @param extraParams A list of parameters that will be used to overwrite default settings of transformation, selection, or model-building functions or
#' parameters which will be passed into the data cleaning function. The names of the list must be one of \code{"prepare"},
#' \code{"select"}, \code{"train"}, \code{"predict"}. To remove one of the defaults (see the article titled Parameter Tuning Presets for crossValidate and Their Customisation on
#' the website), specify the list element to be \code{NULL}. For the valid element names in the \code{"prepare"} list, see \code{?prepareData}.
#' @param nFeatures The number of features to be used for classification. If this is a single number, the same number of features will be used for all comparisons
#' or assays. If a numeric vector, these will be optimised over using \code{selectionOptimisation}. If a named vector with the same names as the assays, 
#' a different number of features will be used for each assay. If a named list of vectors, the respective number of features will be optimised over. 
#' Set to NULL or "all" if all features should be used.
#' @param selectionMethod Default: \code{"auto"}. A character vector of feature selection methods to compare. If a named character vector with names corresponding to different assays, 
#' and performing multiview classification, the respective selection methods will be used on each assay. If \code{"auto"}, t-test (two categories) / F-test (three or more categories) ranking
#' is used for a classification task or per-feature Cox proportional hazards p-value ranking for a survival task, with optimisation of the top \code{nFeatures}. \code{"none"} is also a valid value, meaning that no
#' independent feature selection will be performed (but implicit selection might still happen with the classifier).
#' @param selectionOptimisation A character of "Resubstitution", "Nested CV" or "none" specifying the approach used to optimise \code{nFeatures}.
#' @param performanceType Default: \code{"auto"}. If \code{"auto"}, then balanced accuracy for classification or C-index for survival. Otherwise, any one of the
#' options described in \code{\link{calcPerformance}} may be specified.
#' @param classifier Default: \code{"auto"}. A character vector of classification methods to compare. If a named character vector with names corresponding to different assays, 
#' and performing multiview classification, the respective classification methods will be used on each assay. If \code{"auto"}, then a random forest is used for a classification
#' task or Cox proportional hazards model for a survival task.
#' @param multiViewMethod Default: \code{"none"}. A character vector specifying the multiview method or data integration approach to use. See \code{available("multiViewMethod")} for possibilities.
#' @param assayCombinations A character vector of assay names or a list of such vectors, with each vector specifying a combination of assays to use
#' together. The special value \code{"all"} means all possible subsets of assays.
#' @param nFolds A numeric specifying the number of folds to use for cross-validation.
#' @param nRepeats A numeric specifying the number of repeats or permutations to use for cross-validation.
#' @param nCores A numeric specifying the number of cores used if the user wants to use parallelisation. 
#' @param characteristicsLabel A character specifying an additional label for the cross-validation run.
#' @param ... For \code{train} and \code{predict} functions, parameters not used by the non-DataFrame signature functions but passed into the DataFrame signature function.
#' @param object A trained model to predict with.
#' @param newData The data to use to make predictions with.
#' @param verbose Default: 0. A number between 0 and 3 for the amount of
#' progress messages to give.  A higher number will produce more messages as
#' more lower-level functions print messages.
#'
#' @details
#' \code{classifier} can be a keyword for any of the implemented approaches as shown by \code{available()}.
#' \code{selectionMethod} can be a keyword for any of the implemented approaches as shown by \code{available("selectionMethod")}.
#' \code{multiViewMethod} can be a keyword for any of the implemented approaches as shown by \code{available("multiViewMethod")}.
#'
#' @return An object of class \code{\link{ClassifyResult}}
#' @export
#' @aliases crossValidate crossValidate,matrix-method crossValidate,DataFrame-method
#' crossValidate,MultiAssayExperiment-method crossValidate,data.frame-method
#' @rdname crossValidate
#'
#' @examples
#' 
#' data(asthma)
#' 
#' # Compare randomForest and SVM classifiers.
#' result <- crossValidate(measurements, classes, classifier = c("randomForest", "SVM"))
#' performancePlot(result)
#' 
#' 
#' # Compare performance of different assays. 
#' # First make a toy example assay with multiple data types. We'll randomly
#' # assign different features to be clinical, gene or protein.
#' # set.seed(51773)
#' # measurements <- DataFrame(measurements, check.names = FALSE)
#' # mcols(measurements)$assay <- c(rep("clinical",20),sample(c("gene", "protein"), ncol(measurements)-20, replace = TRUE))
#' # mcols(measurements)$feature <- colnames(measurements)
#' 
#' # We'll use different nFeatures for each assay. We'll also use repeated
#' # cross-validation with only 5 repeats for speed in the example.
#' # set.seed(51773)
#' # result <- crossValidate(measurements, classes, nFeatures = c(clinical = 5, gene = 20, protein = 30), classifier = "randomForest", nRepeats = 5)
#' # performancePlot(result)
#' 
#' # Merge different assays. But we will only do this for two combinations.
#' # If assayCombinations is not specified, all combinations would be attempted.
#' # set.seed(51773)
#' # resultMerge <- crossValidate(measurements, classes, assayCombinations = list(c("clinical", "protein"), c("clinical", "gene")), multiViewMethod = "merge", nRepeats = 5)
#' # performancePlot(resultMerge)
#' 
#' 
#' # performancePlot(c(result, resultMerge))
#' 
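#' # A sketch of overriding a default training setting via extraParams. The
#' # parameter name "mTryProportion" is an assumption for illustration; any
#' # setting accepted by the chosen classifier's wrapper could go here.
#' # result <- crossValidate(measurements, classes, classifier = "randomForest",
#' #                         nRepeats = 5, extraParams = list(train = list(mTryProportion = 0.5)))
#' 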
#' @importFrom survival Surv
#' @usage NULL
setGeneric("crossValidate", function(measurements, outcome, ...)
    standardGeneric("crossValidate"))

#' @rdname crossValidate
#' @export
setMethod("crossValidate", "DataFrame",
          function(measurements,
                   outcome,
                   nFeatures = 20,
                   selectionMethod = "auto",
                   selectionOptimisation = "Resubstitution",
                   performanceType = "auto",
                   classifier = "auto",
                   multiViewMethod = "none",
                   assayCombinations = "all",
                   nFolds = 5,
                   nRepeats = 20,
                   nCores = 1,
                   characteristicsLabel = NULL, extraParams = NULL, verbose = 0)

          {
              # Check that data is in the right format, if not already done for MultiAssayExperiment input.
              if(!"assay" %in% colnames(S4Vectors::mcols(measurements))) # Assay is put there by prepareData for MultiAssayExperiment, skip if present. 
              {
                prepParams <- list(measurements, outcome)
                if("prepare" %in% names(extraParams))
                  prepParams <- c(prepParams, extraParams[["prepare"]])
                measurementsAndOutcome <- do.call(prepareData, prepParams)
                measurements <- measurementsAndOutcome[["measurements"]]
                outcome <- measurementsAndOutcome[["outcome"]]
              }
              
              # Ensure performance type is one of the ones that can be calculated by the package.
              if(!performanceType %in% c("auto", .ClassifyRenvir[["performanceTypes"]]))
                stop(paste("performanceType must be one of", paste(c("auto", .ClassifyRenvir[["performanceTypes"]]), collapse = ", "), "but is", performanceType))
              
              # The outcome is categorical if it is a factor, or a character vector of class labels or a class column name.
              isCategorical <- (is.character(outcome) && (length(outcome) == 1 || length(outcome) == nrow(measurements))) || is.factor(outcome)
              if(performanceType == "auto")
                if(isCategorical) performanceType <- "Balanced Accuracy" else performanceType <- "C-index"
              if(length(selectionMethod) == 1 && selectionMethod == "auto")
                if(isCategorical) selectionMethod <- "t-test" else selectionMethod <- "CoxPH"
              if(length(classifier) == 1 && classifier == "auto")
                if(isCategorical) classifier <- "randomForest" else classifier <- "CoxPH"
              
              
              # Which data-types or data-views are present?
              assayIDs <- unique(S4Vectors::mcols(measurements)$assay)
              if(is.null(assayIDs)) assayIDs <- 1

              # Check that other variables are in the right format and fix
              nFeatures <- cleanNFeatures(nFeatures = nFeatures,
                                          measurements = measurements)
              selectionMethod <- cleanSelectionMethod(selectionMethod = selectionMethod,
                                                      measurements = measurements)
              classifier <- cleanClassifier(classifier = classifier,
                                            measurements = measurements, nFeatures = nFeatures)
              
              ##!!!!! Do something with data combinations

              ################################
              #### No multiview
              ################################

              if(multiViewMethod == "none"){

                  # The below loops over assay and classifier and allows us to answer
                  # the following questions:
                  #
                  # 1) One assay using one classifier
                  # 2) One assay using multi classifiers
                  # 3) Multi assays individually
                  
                  # We should probably transition this to use grid instead.
                  resClassifier <-
                      sapply(assayIDs, function(assayIndex) {
                          # Loop over assays
                          sapply(classifier[[assayIndex]], function(classifierForAssay) {
                              # Loop over classifiers
                              sapply(selectionMethod[[assayIndex]], function(selectionForAssay) {
                                  # Loop over selectors
                                  measurementsUse <- measurements
                                  if(assayIndex != 1) measurementsUse <- measurements[, S4Vectors::mcols(measurements)[, "assay"] == assayIndex, drop = FALSE]
                                  CV(
                                      measurements = measurementsUse, outcome = outcome,
                                      assayIDs = assayIndex,
                                      nFeatures = nFeatures[assayIndex],
                                      selectionMethod = selectionForAssay,
                                      selectionOptimisation = selectionOptimisation,
                                      performanceType = performanceType,
                                      classifier = classifierForAssay,
                                      multiViewMethod = multiViewMethod,
                                      nFolds = nFolds,
                                      nRepeats = nRepeats,
                                      nCores = nCores,
                                      characteristicsLabel = characteristicsLabel,
                                      extraParams = extraParams, verbose = verbose
                                  )
                              },
                              simplify = FALSE)
                          },
                          simplify = FALSE)
                      },
                      simplify = FALSE)
                  result <- unlist(unlist(resClassifier))
              }

              ################################
              #### Yes multiview
              ################################

              ### Merging or binding to combine data
              if(multiViewMethod == "merge"){


                  # The below loops over different combinations of assays and merges them together.
                  # This allows someone to answer which combinations of the assays might be most useful.

                  if(!is.list(assayCombinations) && assayCombinations[1] == "all") assayCombinations <- do.call("c", sapply(seq_along(assayIDs), function(nChoose) combn(assayIDs, nChoose, simplify = FALSE)))

                  result <- sapply(assayCombinations, function(assayIndex){
                      CV(measurements = measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayIndex, drop = FALSE],
                         outcome = outcome, assayIDs = assayIndex,
                         nFeatures = nFeatures[assayIndex],
                         selectionMethod = selectionMethod[assayIndex],
                         selectionOptimisation = selectionOptimisation,
                         performanceType = performanceType, 
                         classifier = classifier[assayIndex],
                         multiViewMethod = ifelse(length(assayIndex) == 1, "none", multiViewMethod),
                         nFolds = nFolds,
                         nRepeats = nRepeats,
                         nCores = nCores,
                         characteristicsLabel = characteristicsLabel,
                         extraParams = extraParams, verbose = verbose)
                  }, simplify = FALSE)

              }


              ### Prevalidation to combine data
              if(multiViewMethod == "prevalidation"){


                  # The below loops over different combinations of assays and combines them together using prevalidation.
                  # This allows someone to answer which combinations of the assays might be most useful.


                  if(!is.list(assayCombinations) && assayCombinations[1] == "all")
                  {
                      assayCombinations <- do.call("c", sapply(seq_along(assayIDs), function(nChoose) combn(assayIDs, nChoose, simplify = FALSE)))
                      assayCombinations <- assayCombinations[sapply(assayCombinations, function(combination) "clinical" %in% combination, simplify = TRUE)]
                      if(length(assayCombinations) == 0) stop("No assayCombinations with \"clinical\" data")
                  }

                  result <- sapply(assayCombinations, function(assayIndex){
                      CV(measurements = measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayIndex, drop = FALSE],
                         outcome = outcome, assayIDs = assayIndex,
                         nFeatures = nFeatures[assayIndex],
                         selectionMethod = selectionMethod[assayIndex],
                         selectionOptimisation = selectionOptimisation,
                         performanceType = performanceType,
                         classifier = classifier[assayIndex],
                         multiViewMethod = ifelse(length(assayIndex) == 1, "none", multiViewMethod),
                         nFolds = nFolds,
                         nRepeats = nRepeats,
                         nCores = nCores,
                         characteristicsLabel = characteristicsLabel,
                         extraParams = extraParams, verbose = verbose)
                  }, simplify = FALSE)

              }



              ### Principal Components Analysis to combine data
              if(multiViewMethod == "PCA"){


                  # The below loops over different combinations of assays and combines them together using PCA.
                  # This allows someone to answer which combinations of the assays might be most useful.


                  if(!is.list(assayCombinations) && assayCombinations[1] == "all"){
                      assayCombinations <- do.call("c", sapply(seq_along(assayIDs),function(nChoose) combn(assayIDs, nChoose, simplify = FALSE)))
                      assayCombinations <- assayCombinations[sapply(assayCombinations, function(combination) "clinical" %in% combination, simplify = TRUE)]
                      if(length(assayCombinations) == 0) stop("No assayCombinations with \"clinical\" data")
                  }


                  result <- sapply(assayCombinations, function(assayIndex){
                      CV(measurements = measurements[, S4Vectors::mcols(measurements)$assay %in% assayIndex, drop = FALSE],
                         outcome = outcome, assayIDs = assayIndex,
                         nFeatures = nFeatures[assayIndex],
                         selectionMethod = selectionMethod[assayIndex],
                         selectionOptimisation = selectionOptimisation,
                         performanceType = performanceType,
                         classifier = classifier[assayIndex],
                         multiViewMethod = ifelse(length(assayIndex) == 1, "none", multiViewMethod),
                         nFolds = nFolds,
                         nRepeats = nRepeats,
                         nCores = nCores,
                         characteristicsLabel = characteristicsLabel,
                         extraParams = extraParams, verbose = verbose)
                  }, simplify = FALSE)

              }
              if(length(result) == 1) result <- result[[1]]
              result

          })


#' @rdname crossValidate
#' @export
# One or more omics data sets, possibly with clinical data.
setMethod("crossValidate", "MultiAssayExperimentOrList",
          function(measurements,
                   outcome,
                   nFeatures = 20,
                   selectionMethod = "auto",
                   selectionOptimisation = "Resubstitution",
                   performanceType = "auto",
                   classifier = "auto",
                   multiViewMethod = "none",
                   assayCombinations = "all",
                   nFolds = 5,
                   nRepeats = 20,
                   nCores = 1,
                   characteristicsLabel = NULL, extraParams = NULL, verbose = 0)
          {
              # Check that data is in the right format, if not already done for MultiAssayExperiment input.
              prepParams <- list(measurements, outcome)
              if("prepare" %in% names(extraParams))
                prepParams <- c(prepParams, extraParams[["prepare"]])
              measurementsAndOutcome <- do.call(prepareData, prepParams)

              crossValidate(measurements = measurementsAndOutcome[["measurements"]],
                            outcome = measurementsAndOutcome[["outcome"]], 
                            nFeatures = nFeatures,
                            selectionMethod = selectionMethod,
                            selectionOptimisation = selectionOptimisation,
                            performanceType = performanceType,
                            classifier = classifier,
                            multiViewMethod = multiViewMethod,
                            assayCombinations = assayCombinations,
                            nFolds = nFolds,
                            nRepeats = nRepeats,
                            nCores = nCores,
                            characteristicsLabel = characteristicsLabel,
                            extraParams = extraParams, verbose = verbose)
          })

#' @rdname crossValidate
#' @export
setMethod("crossValidate", "data.frame", # data.frame of numeric measurements.
          function(measurements,
                   outcome, 
                   nFeatures = 20,
                   selectionMethod = "auto",
                   selectionOptimisation = "Resubstitution",
                   performanceType = "auto",
                   classifier = "auto",
                   multiViewMethod = "none",
                   assayCombinations = "all",
                   nFolds = 5,
                   nRepeats = 20,
                   nCores = 1,
                   characteristicsLabel = NULL, extraParams = NULL, verbose = 0)
          {
              measurements <- S4Vectors::DataFrame(measurements, check.names = FALSE)
              crossValidate(measurements = measurements,
                            outcome = outcome,
                            nFeatures = nFeatures,
                            selectionMethod = selectionMethod,
                            selectionOptimisation = selectionOptimisation,
                            performanceType = performanceType,
                            classifier = classifier,
                            multiViewMethod = multiViewMethod,
                            assayCombinations = assayCombinations,
                            nFolds = nFolds,
                            nRepeats = nRepeats,
                            nCores = nCores,
                            characteristicsLabel = characteristicsLabel, extraParams = extraParams, verbose = verbose)
          })

#' @rdname crossValidate
#' @export
setMethod("crossValidate", "matrix", # Matrix of numeric measurements.
          function(measurements,
                   outcome,
                   nFeatures = 20,
                   selectionMethod = "auto",
                   selectionOptimisation = "Resubstitution",
                   performanceType = "auto",
                   classifier = "auto",
                   multiViewMethod = "none",
                   assayCombinations = "all",
                   nFolds = 5,
                   nRepeats = 20,
                   nCores = 1,
                   characteristicsLabel = NULL, extraParams = NULL, verbose = 0)
          {
              measurements <- S4Vectors::DataFrame(measurements, check.names = FALSE)
              crossValidate(measurements = measurements,
                            outcome = outcome,
                            nFeatures = nFeatures,
                            selectionMethod = selectionMethod,
                            selectionOptimisation = selectionOptimisation,
                            performanceType = performanceType,
                            classifier = classifier,
                            multiViewMethod = multiViewMethod,
                            assayCombinations = assayCombinations,
                            nFolds = nFolds,
                            nRepeats = nRepeats,
                            nCores = nCores,
                            characteristicsLabel = characteristicsLabel, extraParams = extraParams, verbose = verbose)
          })

######################################
######################################
cleanNFeatures <- function(nFeatures, measurements){
    #### Clean up
    if(!is.null(S4Vectors::mcols(measurements)$assay))
      obsFeatures <- unlist(as.list(table(S4Vectors::mcols(measurements)[, "assay"])))
    else obsFeatures <- ncol(measurements)
    if(is.null(nFeatures) || length(nFeatures) == 1 && nFeatures == "all") nFeatures <- as.list(obsFeatures)
    if(is.null(names(nFeatures)) && length(nFeatures) == 1) nFeatures <- as.list(pmin(obsFeatures, nFeatures))
    if(is.null(names(nFeatures)) && length(nFeatures) > 1) nFeatures <- sapply(obsFeatures, function(x)pmin(x, nFeatures), simplify = FALSE)
    #if(is.null(names(nFeatures)) && length(nFeatures) > 1) stop("nFeatures needs to be a named numeric vector or list with the same names as the assays.")
    if(!is.null(names(obsFeatures)) && !all(names(obsFeatures) %in% names(nFeatures))) stop("nFeatures needs to be a named numeric vector or list with the same names as the assays.")
    if(!is.null(names(obsFeatures)) && all(names(obsFeatures) %in% names(nFeatures)) & is(nFeatures, "numeric")) nFeatures <- as.list(pmin(obsFeatures, nFeatures[names(obsFeatures)]))
    if(!is.null(names(obsFeatures)) && all(names(obsFeatures) %in% names(nFeatures)) & is(nFeatures, "list")) nFeatures <- mapply(pmin, nFeatures[names(obsFeatures)], obsFeatures, SIMPLIFY = FALSE)
    nFeatures
}
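
# Illustrative sketch (hypothetical assays): with mcols(measurements)$assay
# containing "clinical" (10 features) and "gene" (50 features), an unnamed
# nFeatures = 20 becomes list(clinical = 10, gene = 20), because each requested
# number is capped with pmin() at the number of features observed in that assay.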

######################################
######################################
cleanSelectionMethod <- function(selectionMethod, measurements){
    #### Clean up
    if(!is.null(S4Vectors::mcols(measurements)$assay))
      obsFeatures <- unlist(as.list(table(S4Vectors::mcols(measurements)[, "assay"])))
    else return(list(selectionMethod))

    if(is.null(names(selectionMethod)) && !is.null(names(obsFeatures))) selectionMethod <- sapply(names(obsFeatures), function(x) selectionMethod, simplify = FALSE)
    #if(is.null(names(selectionMethod)) & length(selectionMethod) > 1) stop("selectionMethod needs to be a named character vector or list with the same names as the assays.")
    if(!is.null(names(obsFeatures)) && !all(names(obsFeatures) %in% names(selectionMethod))) stop("selectionMethod needs to be a named character vector or list with the same names as the assays.")
    if(!is.null(names(obsFeatures)) && all(names(obsFeatures) %in% names(selectionMethod)) & is(selectionMethod, "character")) selectionMethod <- as.list(selectionMethod[names(obsFeatures)])
    selectionMethod
}
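
# Illustrative sketch (hypothetical assays): an unnamed selectionMethod = "t-test"
# is replicated per assay into list(clinical = "t-test", gene = "t-test"); a
# character vector already named by assay is reordered to match the assay names.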

######################################
######################################
cleanClassifier <- function(classifier, measurements, nFeatures){
    #### Clean up
    if(!is.null(S4Vectors::mcols(measurements)$assay))
      obsFeatures <- unlist(as.list(table(S4Vectors::mcols(measurements)[, "assay"])))
    else return(list(classifier))

    if(is.null(names(classifier)) && !is.null(names(obsFeatures))) classifier <- sapply(names(obsFeatures), function(x) classifier, simplify = FALSE)
    #if(is.null(names(classifier)) & length(classifier) > 1) stop("classifier needs to be a named character vector or list with the same names as the assays.")
    if(!is.null(names(obsFeatures)) && !all(names(obsFeatures) %in% names(classifier))) stop("classifier needs to be a named character vector or list with the same names as the assays.")
    if(!is.null(names(obsFeatures)) && all(names(obsFeatures) %in% names(classifier)) & is(classifier, "character")) classifier <- as.list(classifier[names(obsFeatures)])
    
    nFeatures <- nFeatures[names(classifier)]
    checkENs <- which(classifier %in% c("ridgeGLM", "elasticNetGLM", "LASSOGLM"))
    if(length(checkENs) > 0)
    {
      replacements <- sapply(checkENs, function(checkEN) ifelse(any(nFeatures[[checkEN]] == 1), "GLM", classifier[[checkEN]]))
      classifier[checkENs] <- replacements
      if(any(replacements == "GLM"))
        warning("Penalised GLM requires two or more features as input but there is only one. Using an ordinary GLM instead.")
    }
    classifier
}
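
# Illustrative sketch: cleanClassifier(c(clinical = "elasticNetGLM"), measurements, list(clinical = 1))
# replaces the penalised model with an ordinary "GLM" and warns, because a
# penalised GLM needs at least two input features to fit.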

generateCrossValParams <- function(nRepeats, nFolds, nCores, selectionOptimisation){

    if(!exists(".Random.seed")) stop("Predictive modelling should always be reproducible. Please use set.seed(<number>) yourself and run 'crossValidate' again.")
    index <- ifelse(.Random.seed[2] + 2 == length(.Random.seed), 3, 3 + .Random.seed[2]) # Right after set.seed, the second number is the length of the random integer vector.
    seed <- .Random.seed[index] # Get current random number.
    
    # Set the BPparam RNGseed.
    if(nCores == 1)
    {
        BPparam <- SerialParam(RNGseed = seed)
    } else { # Parallel processing is desired.
        
        if(Sys.info()["sysname"] == "Windows") {# Only SnowParam suits Windows.
            BPparam <- BiocParallel::SnowParam(min(nCores, BiocParallel::snowWorkers("SOCK")), RNGseed = seed)
        } else if (Sys.info()["sysname"] %in% c("Darwin", "Linux")) { # macOS reports "Darwin" as the system name.
            BPparam <- BiocParallel::MulticoreParam(min(nCores, BiocParallel::multicoreWorkers()), RNGseed = seed) # Multicore is faster than SNOW, but it doesn't work on Windows.
        } else { # Something weird.
            BPparam <- BiocParallel::bpparam() # BiocParallel will figure it out.
        }
    }
    tuneMode <- selectionOptimisation
    if(tuneMode == "CV") tuneMode <- "Nested CV"
    if(!any(tuneMode %in% c("Resubstitution", "Nested CV", "none"))) stop("selectionOptimisation must be one of \"Nested CV\", \"Resubstitution\" or \"none\".")
    CrossValParams(permutations = nRepeats, folds = nFolds, parallelParams = BPparam, tuneMode = tuneMode)
}
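
# Illustrative sketch: after set.seed(42), generateCrossValParams(nRepeats = 20,
# nFolds = 5, nCores = 1, selectionOptimisation = "Resubstitution") builds a
# CrossValParams object for 20 repeats of 5-fold cross-validation run serially,
# with the current RNG seed propagated to BiocParallel for reproducibility.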


# Returns a single parameter set.
generateModellingParams <- function(assayIDs,
                                    measurements,
                                    nFeatures,
                                    selectionMethod,
                                    selectionOptimisation,
                                    performanceType = "auto",
                                    classifier,
                                    multiViewMethod = "none",
                                    extraParams
){
    if(multiViewMethod != "none") {
        params <- generateMultiviewParams(assayIDs,
                                          measurements,
                                          nFeatures,
                                          selectionMethod,
                                          selectionOptimisation,
                                          performanceType,
                                          classifier,
                                          multiViewMethod, extraParams)
        return(params)
    }


    if(length(assayIDs) > 1) obsFeatures <- sum(S4Vectors::mcols(measurements)[, "assay"] %in% assayIDs)
    else obsFeatures <- ncol(measurements)


    nFeatures <- unlist(nFeatures)

    if(max(nFeatures) > obsFeatures) {

        warning("nFeatures greater than the max number of features in data. Setting to max")
        nFeatures <- pmin(nFeatures, obsFeatures)
    }

    classifier <- unlist(classifier)
    
    # Check classifier
    knownClassifiers <- .ClassifyRenvir[["classifyKeywords"]][, "classifier Keyword"]
    if(any(!classifier %in% knownClassifiers))
        stop(paste("classifier must exactly match these options (be careful of case):", paste(knownClassifiers, collapse = ", ")))
    
    # Always return a list for ease of processing. Unbox at end if just one.
    classifierParams <- .classifierKeywordToParams(classifier)

    # Modify the parameters with performanceType addition and any other to overwrite.
    if(!is.null(classifierParams$trainParams@tuneParams))
      classifierParams$trainParams@tuneParams <- c(classifierParams$trainParams@tuneParams, performanceType = performanceType)

    if(!is.null(extraParams) && "train" %in% names(extraParams))
    {
      for(paramIndex in seq_along(extraParams[["train"]]))
      {
        parameter <- extraParams[["train"]][[paramIndex]]
        parameterName <- names(extraParams[["train"]])[paramIndex]
        if(length(parameter) == 1)
        {
          if(is.null(classifierParams$trainParams@otherParams)) classifierParams$trainParams@otherParams <- extraParams[["train"]][paramIndex]
          else classifierParams$trainParams@otherParams[parameterName] <- parameter
          if(parameterName %in% names(classifierParams$trainParams@tuneParams)) classifierParams$trainParams@tuneParams[[parameterName]] <- NULL
        } else if(length(parameter) > 1) {
          if(is.null(classifierParams$trainParams@tuneParams)) classifierParams$trainParams@tuneParams <- extraParams[["train"]][paramIndex]
          else classifierParams$trainParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
        } else { # Remove the parameter
          inOther <- match(parameterName, names(classifierParams$trainParams@otherParams))
          inTune <- match(parameterName, names(classifierParams$trainParams@tuneParams))
          if(!is.na(inOther)) classifierParams$trainParams@otherParams <- classifierParams$trainParams@otherParams[-inOther]
          if(!is.na(inTune)) classifierParams$trainParams@tuneParams <- classifierParams$trainParams@tuneParams[-inTune]
        } 
      }
    }
    if(!is.null(extraParams) && "predict" %in% names(extraParams))
    {
      for(paramIndex in seq_along(extraParams[["predict"]]))
      {
        parameter <- extraParams[["predict"]][[paramIndex]]
        parameterName <- names(extraParams[["predict"]])[paramIndex]
        if(length(parameter) == 1)
        {
          if(is.null(classifierParams$predictParams@otherParams)) classifierParams$predictParams@otherParams <- extraParams[["predict"]][paramIndex]
          else classifierParams$predictParams@otherParams[parameterName] <- parameter
          if(parameterName %in% names(classifierParams$predictParams@tuneParams)) classifierParams$predictParams@tuneParams[[parameterName]] <- NULL
        } else if(length(parameter) > 1) {
          if(is.null(classifierParams$predictParams@tuneParams)) classifierParams$predictParams@tuneParams <- extraParams[["predict"]][paramIndex]
          else classifierParams$predictParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
        } else { # Remove the parameter
          inOther <- match(parameterName, names(classifierParams$predictParams@otherParams))
          inTune <- match(parameterName, names(classifierParams$predictParams@tuneParams))
          if(!is.na(inOther)) classifierParams$predictParams@otherParams <- classifierParams$predictParams@otherParams[-inOther]
          if(!is.na(inTune)) classifierParams$predictParams@tuneParams <- classifierParams$predictParams@tuneParams[-inTune]
        } 
      }
    }    
    
    selectionMethod <- unlist(selectionMethod)

    if(selectionMethod != "none")
    {
      selectParams <- SelectParams(selectionMethod, tuneParams = list(nFeatures = nFeatures, performanceType = performanceType))
      if(!is.null(extraParams) && "select" %in% names(extraParams))
      {
        for(paramIndex in seq_along(extraParams[["select"]]))
        {
          parameter <- extraParams[["select"]][[paramIndex]]
          parameterName <- names(extraParams[["select"]])[paramIndex]
          # Modify the standalone selectParams object; classifierParams contains no selection settings.
          if(length(parameter) == 1)
          {
            if(is.null(selectParams@otherParams)) selectParams@otherParams <- extraParams[["select"]][paramIndex]
            else selectParams@otherParams[parameterName] <- parameter
          } else if(length(parameter) > 1) {
            if(is.null(selectParams@tuneParams)) selectParams@tuneParams <- extraParams[["select"]][paramIndex]
            else selectParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
          } else { # Remove the parameter
             inOther <- match(parameterName, names(selectParams@otherParams))
             inTune <- match(parameterName, names(selectParams@tuneParams))
             if(!is.na(inOther)) selectParams@otherParams <- selectParams@otherParams[-inOther]
             if(!is.na(inTune)) selectParams@tuneParams <- selectParams@tuneParams[-inTune]
          }
        }
      }
    } else {selectParams <- NULL}

    params <- ModellingParams(
        balancing = "none",
        selectParams = selectParams,
        trainParams = classifierParams$trainParams,
        predictParams = classifierParams$predictParams
    )

    #if(multiViewMethod != "none") stop("I haven't implemented multiview yet.")

    #
    # if(multiViewMethod == "prevalidation"){
    #     params$trainParams <- function(measurements, outcome) prevalTrainInterface(measurements, outcome, params)
    # }
    #

    params

}
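
# Illustrative sketch of the extraParams convention implemented above, using a
# hypothetical parameter name: list(train = list(someParam = 10)) fixes the value
# via otherParams, list(train = list(someParam = c(5, 10))) tunes over the values
# via tuneParams, and list(train = list(someParam = NULL)) removes a default
# setting from whichever of otherParams or tuneParams contains it.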
######################################



generateMultiviewParams <- function(assayIDs,
                                    measurements,
                                    nFeatures,
                                    selectionMethod,
                                    selectionOptimisation,
                                    performanceType,
                                    classifier,
                                    multiViewMethod, extraParams){

    if(multiViewMethod == "merge"){

        if(length(classifier) > 1) classifier <- classifier[[1]]

        # Split measurements up by assay.
        assayTrain <- sapply(assayIDs, function(assayID) if(assayID == 1) measurements else measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayID, drop = FALSE], simplify = FALSE)

        # Generate params for each assay. This could be extended to have different selectionMethods for each type
        paramsAssays <- mapply(generateModellingParams,
                                 nFeatures = nFeatures[assayIDs],
                                 selectionMethod = selectionMethod[assayIDs],
                                 assayIDs = assayIDs,
                                 measurements = assayTrain[assayIDs],
                                 MoreArgs = list(
                                     selectionOptimisation = selectionOptimisation,
                                     performanceType = performanceType,
                                     classifier = classifier,
                                     multiViewMethod = "none",
                                     extraParams = extraParams),
                                 SIMPLIFY = FALSE)

        # Generate some params for merged model. Which ones?
        # Reconsider how to do this well later. 
        params <- generateModellingParams(assayIDs = assayIDs,
                                          measurements = measurements,
                                          nFeatures = nFeatures,
                                          selectionMethod = selectionMethod[[1]],
                                          selectionOptimisation = "none",
                                          performanceType = performanceType,
                                          classifier = classifier[[1]],
                                          multiViewMethod = "none",
                                          extraParams = extraParams)

        # Update selectParams to use
        params@selectParams <- SelectParams("selectMulti",
                                            params = paramsAssays,
                                            characteristics = S4Vectors::DataFrame(characteristic = "Selection Name", value = "merge"),
                                            tuneParams = list(nFeatures = nFeatures[[1]],
                                                              performanceType = performanceType,
                                                              tuneMode = "none")
        )
        return(params)
    }

    if(multiViewMethod == "prevalidation"){

        # Split measurements up by assay.
        assayTrain <- sapply(assayIDs, function(assayID) measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayID, drop = FALSE], simplify = FALSE)

        # Generate params for each assay. This could be extended to have different selectionMethods for each type
        paramsAssays <- mapply(generateModellingParams,
                                 nFeatures = nFeatures[assayIDs],
                                 selectionMethod = selectionMethod[assayIDs],
                                 assayIDs = assayIDs,
                                 measurements = assayTrain[assayIDs],
                                 classifier = classifier[assayIDs],
                                 MoreArgs = list(
                                     selectionOptimisation = selectionOptimisation,
                                     performanceType = performanceType,
                                     multiViewMethod = "none",
                                     extraParams = extraParams),
                                 SIMPLIFY = FALSE)


        params <- ModellingParams(
            balancing = "none",
            selectParams = NULL,
            trainParams = TrainParams(prevalTrainInterface, params = paramsAssays, characteristics = paramsAssays$clinical@trainParams@characteristics,
                          getFeatures = prevalFeatures),
            predictParams = PredictParams(prevalPredictInterface, characteristics = paramsAssays$clinical@predictParams@characteristics)
        )

        return(params)
    }

    if(multiViewMethod == "PCA"){

        # Split measurements up by assay.
        assayTrain <- sapply(assayIDs, function(assayID) measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayID, drop = FALSE], simplify = FALSE)

        # Generate params for each assay. This could be extended to have different selectionMethods for each type
        paramsClinical <-  list(clinical = generateModellingParams(
                                 nFeatures = nFeatures["clinical"],
                                 selectionMethod = selectionMethod["clinical"],
                                 assayIDs = "clinical",
                                 measurements = assayTrain[["clinical"]],
                                 classifier = classifier["clinical"],
                                 selectionOptimisation = selectionOptimisation,
                                 performanceType = performanceType,
                                 multiViewMethod = "none",
                                 extraParams = extraParams))


        params <- ModellingParams(
            balancing = "none",
            selectParams = NULL,
            trainParams = TrainParams(pcaTrainInterface, params = paramsClinical, nFeatures = nFeatures, characteristics = paramsClinical$clinical@trainParams@characteristics,
                                      getFeatures = PCAfeatures),
            predictParams = PredictParams(pcaPredictInterface, characteristics = paramsClinical$clinical@predictParams@characteristics)
        )

        return(params)
    }

}
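
# Broad sketch of the three integration strategies configured above, for
# hypothetical assays c("clinical", "gene"): "merge" concatenates the selected
# features of all assays into a single model; "prevalidation" appends each
# omics assay's cross-validated predictions to the clinical data as extra
# features; "PCA" appends each omics assay's principal component scores to the
# clinical data.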

# measurements, outcome are mutually exclusive with x, outcomeTrain, measurementsTest, outcomeTest.
CV <- function(measurements, outcome, x, outcomeTrain, measurementsTest, outcomeTest,
               assayIDs,
               nFeatures,
               selectionMethod,
               selectionOptimisation,
               performanceType,
               classifier,
               multiViewMethod,
               nFolds,
               nRepeats,
               nCores,
               characteristicsLabel, extraParams, verbose)

{
    # Which data-types or data-views are present?
    if(is.null(characteristicsLabel)) characteristicsLabel <- "none"

    # Setup cross-validation parameters. Could be needed for independent train/test if parameter tuning
    # is specified to be done by nested cross-validation.
    crossValParams <- generateCrossValParams(nRepeats = nRepeats,
                                             nFolds = nFolds,
                                             nCores = nCores,
                                             selectionOptimisation = selectionOptimisation)
    

    # Turn text into TrainParams and TestParams objects
    modellingParams <- generateModellingParams(assayIDs = assayIDs,
                                               measurements = if(!is.null(measurements)) measurements else x,
                                               nFeatures = nFeatures,
                                               selectionMethod = selectionMethod,
                                               selectionOptimisation = selectionOptimisation,
                                               performanceType = performanceType,
                                               classifier = classifier,
                                               multiViewMethod = multiViewMethod, extraParams = extraParams)
    
    if(length(assayIDs) > 1 || (length(assayIDs) == 1 && assayIDs != 1)) assayText <- assayIDs else assayText <- NULL
    # Record the settings of this run as characteristics, to be attached to the returned ClassifyResult.
    characteristics <- S4Vectors::DataFrame(
        characteristic = c(if(!is.null(assayText)) "Assay Name" else NULL, "Classifier Name", "Selection Name", "multiViewMethod", "characteristicsLabel"),
        value = c(if(!is.null(assayText)) paste(assayText, collapse = ", ") else NULL, paste(classifier, collapse = ", "),
                  paste(selectionMethod, collapse = ", "), multiViewMethod, characteristicsLabel))

    if(!is.null(measurements))
    { # Cross-validation.
      classifyResults <- runTests(measurements, outcome, crossValParams = crossValParams, modellingParams = modellingParams, characteristics = characteristics, verbose = verbose)
    } else { # Independent training and testing.
      classifyResults <- runTest(x, outcomeTrain, measurementsTest, outcomeTest, crossValParams = crossValParams, modellingParams = modellingParams, characteristics = characteristics)
      if(is.character(classifyResults)) stop(classifyResults)
      # Refit on all of the training data to obtain the final model (measurements is NULL in this branch).
      fullResult <- runTest(x, outcomeTrain, x, outcomeTrain, crossValParams = crossValParams, modellingParams = modellingParams, characteristics = characteristics, .iteration = 1)
      classifyResults@finalModel <- fullResult$models
    }
    
    classifyResults
}

simplifyResults <- function(results, values = c("assay", "classifier", "selectionMethod", "multiViewMethod")){
    ch <- sapply(results, function(x) x@characteristics[x@characteristics$characteristic %in% values, "value"], simplify = TRUE)
    ch <- data.frame(t(ch))
    results[!duplicated(ch)]
}
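
# Illustrative sketch: if two results in the list share the same assay,
# classifier, selectionMethod and multiViewMethod characteristics, only the
# first is kept, so downstream plots are not cluttered by duplicate runs.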

#' @rdname crossValidate
#' @importFrom generics train
#' @method train matrix
#' @export
train.matrix <- function(x, outcomeTrain, ...)
               {
                 x <- DataFrame(x, check.names = FALSE)
                 train(x, outcomeTrain, ...)
               }

#' @rdname crossValidate
#' @method train data.frame
#' @export
train.data.frame <- function(x, outcomeTrain, ...)
                    {
                      x <- DataFrame(x, check.names = FALSE)
                      train(x, outcomeTrain, ...)
                    }

#' @rdname crossValidate
#' @param assayIDs A character vector for assays to train with. Special value \code{"all"}
#' uses all assays in the input object.
#' @param performanceType Performance metric to optimise if classifier has any tuning parameters.
#' @method train DataFrame
#' @export
train.DataFrame <- function(x, outcomeTrain, selectionMethod = "auto", nFeatures = 20, classifier = "auto", performanceType = "auto",
                            multiViewMethod = "none", assayIDs = "all", extraParams = NULL, verbose = 0, ...)
                   {
              prepParams <- list(x, outcomeTrain)
              if(!is.null(extraParams) && "prepare" %in% names(extraParams))
                prepParams <- c(prepParams, extraParams[["prepare"]])
              measurementsAndOutcome <- do.call(prepareData, prepParams)
              measurements <- measurementsAndOutcome[["measurements"]]
              outcomeTrain <- measurementsAndOutcome[["outcome"]]
              
              # Ensure performance type is one of the ones that can be calculated by the package.
              if(!performanceType %in% c("auto", .ClassifyRenvir[["performanceTypes"]]))
                stop(paste("performanceType must be one of", paste(c("auto", .ClassifyRenvir[["performanceTypes"]]), collapse = ", "), "but is", performanceType))

              isCategorical <- (is.character(outcomeTrain) && (length(outcomeTrain) == 1 || length(outcomeTrain) == nrow(measurements))) || is.factor(outcomeTrain)
              if(performanceType == "auto")
                if(isCategorical) performanceType <- "Balanced Accuracy" else performanceType <- "C-index"
              if(length(selectionMethod) == 1 && selectionMethod == "auto")
                if(isCategorical) selectionMethod <- "t-test" else selectionMethod <- "CoxPH"
              if(length(classifier) == 1 && classifier == "auto")
                if(isCategorical) classifier <- "randomForest" else classifier <- "CoxPH"

              nFeatures <- cleanNFeatures(nFeatures = nFeatures, measurements = measurements)
              classifier <- cleanClassifier(classifier = classifier, measurements = measurements, nFeatures = nFeatures)
              selectionMethod <- cleanSelectionMethod(selectionMethod = selectionMethod, measurements = measurements)
              if(assayIDs == "all") assayIDs <- unique(S4Vectors::mcols(measurements)$assay)
              if(is.null(assayIDs)) assayIDs <- 1
              names(assayIDs) <- assayIDs
              names(classifier) <- assayIDs

              if(multiViewMethod == "none"){
                  resClassifier <-
                      sapply(assayIDs, function(assayIndex) {
                          # Loop over assays
                          sapply(classifier[[assayIndex]], function(classifierForAssay) {
                              # Loop over classifiers
                                sapply(selectionMethod[[assayIndex]], function(selectionForAssay) {
                                  # Loop over selectors
                              
                                  measurementsUse <- measurements
                                  if(assayIndex != 1) measurementsUse <- measurements[, S4Vectors::mcols(measurements)[, "assay"] == assayIndex, drop = FALSE]
                                  
                                  modellingParams <- generateModellingParams(assayIDs = assayIDs, measurements = measurements, nFeatures = nFeatures,
                                                     selectionMethod = selectionMethod, selectionOptimisation = "Resubstitution", performanceType = performanceType,
                                                     classifier = classifier, multiViewMethod = "none", extraParams = extraParams)

                                  if(!is.null(modellingParams@selectParams))
                                  {
                                    topFeatures <- .doSelection(measurementsUse, outcomeTrain, CrossValParams(), modellingParams, verbose = verbose)
                                    selectedFeaturesIndices <- topFeatures[[2]] # Extract for subsetting.
                                    tuneDetailsSelect <- topFeatures[[3]]
                                    measurementsUse <- measurementsUse[, selectedFeaturesIndices]
                                  } else { # No feature selection, so use the assay's measurements unchanged.
                                    tuneDetailsSelect <- NULL
                                  }

                                  classifierParams <- .classifierKeywordToParams(classifierForAssay)
                                  if(!is.null(extraParams) && "train" %in% names(extraParams))
                                  {
                                     for(paramIndex in seq_along(extraParams[["train"]]))
                                     {
                                        parameter <- extraParams[["train"]][[paramIndex]]
                                        parameterName <- names(extraParams[["train"]])[paramIndex]
                                        if(length(parameter) == 1)
                                        {
                                          if(is.null(classifierParams$trainParams@otherParams)) classifierParams$trainParams@otherParams <- extraParams[["train"]][paramIndex]
                                          else classifierParams$trainParams@otherParams[parameterName] <- parameter
                                        } else if (length(parameter) > 1) {
                                          if(is.null(classifierParams$trainParams@tuneParams)) classifierParams$trainParams@tuneParams <- extraParams[["train"]][paramIndex]
                                          else classifierParams$trainParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
                                        } else { # Remove the parameter
                                          inOther <- match(parameterName, names(classifierParams$trainParams@otherParams))
                                          inTune <- match(parameterName, names(classifierParams$trainParams@tuneParams))
                                          if(!is.na(inOther)) classifierParams$trainParams@otherParams <- classifierParams$trainParams@otherParams[-inOther]
                                          if(!is.na(inTune)) classifierParams$trainParams@tuneParams <- classifierParams$trainParams@tuneParams[-inTune]
                                        }
                                      }
                                    }
                                  if(!is.null(extraParams) && "predict" %in% names(extraParams))
                                  {
                                      for(paramIndex in seq_along(extraParams[["predict"]]))
                                      {
                                        parameter <- extraParams[["predict"]][[paramIndex]]
                                        parameterName <- names(extraParams[["predict"]])[paramIndex]
                                        if(length(parameter) == 1)
                                        {
                                          if(is.null(classifierParams$predictParams@otherParams)) classifierParams$predictParams@otherParams <- extraParams[["predict"]][paramIndex]
                                          else classifierParams$predictParams@otherParams[parameterName] <- parameter
                                        } else if (length(parameter) > 1) {
                                          if(is.null(classifierParams$predictParams@tuneParams)) classifierParams$predictParams@tuneParams <- extraParams[["predict"]][paramIndex]
                                          else classifierParams$predictParams@tuneParams[parameterName] <- parameter # Multiple values, so tune them.
                                        } else { # Remove the parameter
                                          inOther <- match(parameterName, names(classifierParams$predictParams@otherParams))
                                          inTune <- match(parameterName, names(classifierParams$predictParams@tuneParams))
                                          if(!is.na(inOther)) classifierParams$predictParams@otherParams <- classifierParams$predictParams@otherParams[-inOther]
                                          if(!is.na(inTune)) classifierParams$predictParams@tuneParams <- classifierParams$predictParams@tuneParams[-inTune]
                                        } 
                                      }
                                    }
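                                  # Hedged sketch of how extraParams elements are interpreted by the loops
                                  # above (parameter names are hypothetical, e.g. for a random forest):
                                  #   extraParams = list(train = list(num.trees = 500,              # length 1: fixed via otherParams
                                  #                                   mTryProportion = c(0.25, 0.5), # length > 1: tuned via tuneParams
                                  #                                   unwantedDefault = NULL))       # NULL: removes a default setting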
                                  
                                  modellingParams <- ModellingParams(balancing = "none", selectParams = NULL,
                                                                     trainParams = classifierParams$trainParams, predictParams = classifierParams$predictParams)
                                  if(!is.null(tuneDetailsSelect))
                                  {
                                    tuneDetailsSelectUse <- tuneDetailsSelect[["tuneCombinations"]][tuneDetailsSelect[["bestIndex"]], , drop = FALSE]
                                    avoidTune <- match(colnames(tuneDetailsSelectUse), names(modellingParams@trainParams@tuneParams))
                                    if(any(!is.na(avoidTune)))
                                    {
                                      modellingParams@trainParams@otherParams <- c(modellingParams@trainParams@otherParams, tuneDetailsSelectUse[!is.na(avoidTune)])
                                      modellingParams@trainParams@tuneParams <- modellingParams@trainParams@tuneParams[-na.omit(avoidTune)]
                                      if(length(modellingParams@trainParams@tuneParams) == 0) modellingParams@trainParams@tuneParams <- NULL
                                    }
                                  }
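                                  # e.g. if a hypothetical mTryProportion was already tuned during feature
                                  # selection, its chosen value is fixed above instead of being tuned again.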
                                  if(!is.null(modellingParams@trainParams@tuneParams))
                                    modellingParams@trainParams@tuneParams <- c(modellingParams@trainParams@tuneParams, performanceType = performanceType)
                                  
                                  # Train the model and attach its predict function for later use by predict().
                                  trained <- .doTrain(measurementsUse, outcomeTrain, NULL, NULL, CrossValParams(), modellingParams, verbose = verbose)[["model"]]
                                  attr(trained, "predictFunction") <- classifierParams$predictParams@predictor
                                  trained
                                }, simplify = FALSE)
                          }, simplify = FALSE)
                      }, simplify = FALSE)

                  models <- unlist(unlist(resClassifier, recursive = FALSE), recursive = FALSE)
                  if(length(models) == 1) {
                      model <- models[[1]]
                      class(model) <- c("trainedByClassifyR", class(model))
                      models <- NULL
                  } else {
                      class(models) <- c("listOfModels", "trainedByClassifyR", class(models))
                  }
              }
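              # For instance, requesting several classifiers or assays in one call yields a
              # "listOfModels" object; a single combination returns the fitted model itself.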

              ################################
              #### Yes multiview
              ################################

              ### Merging or binding to combine data
              if(multiViewMethod == "merge"){
                  measurementsUse <- measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayIDs, drop = FALSE]
                  model <- .doTrain(measurementsUse, outcomeTrain, NULL, NULL, crossValParams, modellingParams, verbose = verbose)[["model"]]
                  class(model) <- c("trainedByClassifyR", class(model))
              }
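              # Hedged usage sketch (assay names are hypothetical): the assays are
              # concatenated column-wise so a single model is fitted on all features.
              #   train(list(RNA = rnaMatrix, clinical = clinicalDF), outcomeTrain,
              #         multiViewMethod = "merge")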


              ### Prevalidation to combine data
              if(multiViewMethod == "prevalidation"){
                # Split measurements up by assay.
                 assayTrain <- sapply(assayIDs, function(assayID) measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayID, drop = FALSE], simplify = FALSE)

               # Generate params for each assay. This could be extended to have different selectionMethods for each type
                 paramsAssays <- mapply(generateModellingParams,
                                        nFeatures = nFeatures[assayIDs],
                                        selectionMethod = selectionMethod[assayIDs],
                                        assayIDs = assayIDs,
                                        measurements = assayTrain[assayIDs],
                                        classifier = classifier[assayIDs],
                                        MoreArgs = list(multiViewMethod = "none"),
                                        SIMPLIFY = FALSE)

                 modellingParams <- ModellingParams(
                                    balancing = "none",
                                    selectParams = NULL,
                                    trainParams = TrainParams(prevalTrainInterface, params = paramsAssays, characteristics = paramsAssays$clinical@trainParams@characteristics,
                                                              getFeatures = prevalFeatures),
                                    predictParams = PredictParams(prevalPredictInterface, characteristics = paramsAssays$clinical@predictParams@characteristics))
                 # Subset to the requested assays, as the other multiview branches do.
                 measurementsUse <- measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayIDs, drop = FALSE]
                 model <- .doTrain(measurementsUse, outcomeTrain, NULL, NULL, crossValParams, modellingParams, verbose = verbose)[["model"]]
                 class(model) <- c("trainedByClassifyR", class(model))
              }
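              # Hedged usage sketch: prevalidation fits a model to each omics assay, then
              # combines each assay's cross-validated predictions with the clinical
              # variables for the final model, so a "clinical" assay must be present.
              #   train(list(RNA = rnaMatrix, clinical = clinicalDF), outcomeTrain,
              #         multiViewMethod = "prevalidation") # assay names are hypothetical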
              
              ### Principal Components Analysis to combine data
              if(multiViewMethod == "PCA"){
                measurementsUse <- measurements[, S4Vectors::mcols(measurements)[["assay"]] %in% assayIDs, drop = FALSE]
                paramsClinical <-  list(clinical = generateModellingParams(
                                        assayIDs = "clinical",
                                        measurements = measurements[, S4Vectors::mcols(measurements)[["assay"]] == "clinical", drop = FALSE],
                                        classifier = classifier["clinical"],
                                        multiViewMethod = "none"))
                
                modellingParams <- ModellingParams(balancing = "none", selectParams = NULL,
                                   trainParams = TrainParams(pcaTrainInterface, params = paramsClinical, nFeatures = nFeatures, characteristics = paramsClinical$clinical@trainParams@characteristics,
                                                             getFeatures = PCAfeatures),
                                   predictParams = PredictParams(pcaPredictInterface, characteristics = paramsClinical$clinical@predictParams@characteristics))
                model <- .doTrain(measurementsUse, outcomeTrain, NULL, NULL, crossValParams, modellingParams, verbose = verbose)[["model"]]
                class(model) <- c("trainedByClassifyR", class(model))
              }
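              # Hedged usage sketch: each omics assay appears to be reduced to its leading
              # principal components, which are modelled alongside the clinical variables.
              #   train(list(RNA = rnaMatrix, clinical = clinicalDF), outcomeTrain,
              #         multiViewMethod = "PCA") # assay names are hypothetical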
              # models is a local variable, not a formal argument, so test with exists() rather than missing().
              if(!exists("models", inherits = FALSE) || is.null(models)) return(model) else return(models)
          }

#' @rdname crossValidate
#' @method train list
#' @export
train.list <- function(x, outcomeTrain, ...)
              {
                # Check data type is valid
                if (!(all(sapply(x, function(element) is(element, "tabular")))))
                  stop("assays must be of type data.frame, DataFrame or matrix")
              
                # Check the list is named
                if (is.null(names(x)))
                  stop("Measurements must be a named list")
              
                # Check same number of samples for all datasets
                if (!length(unique(sapply(x, nrow))) == 1)
                  stop("All datasets must have the same samples")
              
                # Check that the outcome length matches the number of samples
                if (!all(sapply(x, nrow) == length(outcomeTrain)) && !is.character(outcomeTrain))
                  stop("outcomeTrain must have the same number of samples as measurements")
              
              df_list <- lapply(x, S4Vectors::DataFrame) # lapply avoids sapply's simplification of single-column assays.
              
              df_list <- mapply(function(meas, nam){
                  S4Vectors::mcols(meas)$assay <- nam
                  S4Vectors::mcols(meas)$feature <- colnames(meas)
                  meas
              }, df_list, names(df_list), SIMPLIFY = FALSE)
              
              combined_df <- do.call(cbind, df_list)
              
              # Each list of tabular data has been collapsed into a DataFrame.
              # Will be subset to relevant assayIDs inside the DataFrame method.
              
              train(combined_df, outcomeTrain, ...)
}
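
# Hedged usage sketch for train.list (object names are hypothetical):
#   assays <- list(RNA = rnaMatrix, clinical = clinicalDF) # same samples, same row order
#   fit <- train(assays, outcomeTrain = classFactor)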

#' @rdname crossValidate
#' @method train MultiAssayExperiment
#' @export
train.MultiAssayExperiment <- function(x, outcome, ...)
          {
              prepArgs <- list(x, outcome)
              extraInputs <- list(...)
              prepExtras <- trainExtras <- numeric()
              if(length(extraInputs) > 0)
                prepExtras <- which(names(extraInputs) %in% .ClassifyRenvir[["prepareDataFormals"]])
              if(length(prepExtras) > 0)
                prepArgs <- append(prepArgs, extraInputs[prepExtras])
              measurementsAndOutcome <- do.call(prepareData, prepArgs)
              trainArgs <- list(measurementsAndOutcome[["measurements"]], measurementsAndOutcome[["outcome"]])
              if(length(extraInputs) > 0)
                trainExtras <- which(!names(extraInputs) %in% .ClassifyRenvir[["prepareDataFormals"]])
              if(length(trainExtras) > 0)
                trainArgs <- append(trainArgs, extraInputs[trainExtras])
              do.call(train, trainArgs)
          }
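
# Hedged usage sketch for train.MultiAssayExperiment (object and column names are
# hypothetical). Arguments named in prepareData's formals are routed to prepareData;
# the rest are passed on to train.
#   fit <- train(mae, outcome = "class")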

#' @rdname crossValidate
#' @param object A fitted model or a list of such models.
#' @param newData For the \code{predict} function, an object of type \code{matrix}, \code{data.frame},
#' \code{DataFrame}, \code{list} (of matrices or data frames) or \code{MultiAssayExperiment} containing
#' the data to make predictions with, using either a fitted model created by \code{train} or the final model
#' stored in a \code{\link{ClassifyResult}} object.
#' @method predict trainedByClassifyR
#' @export
predict.trainedByClassifyR <- function(object, newData, outcome, ...)
{
  if(is(newData, "tabular")) # Simply tabular data.
  {
    colnames(newData) <- make.names(colnames(newData)) # Ensure that feature names are syntactically valid, like during model fitting.
  } else if(is.list(newData) && !is(object, "listOfModels")) # Don't check all those conditions that train function does.
  { # Merge the list of data tables and keep track of assay names in columns' metadata.
    newData <- lapply(newData, S4Vectors::DataFrame) # Convert first, as train.list does, so mcols<- is available.
    newData <- mapply(function(meas, nam){
                 S4Vectors::mcols(meas)$assay <- nam
                 S4Vectors::mcols(meas)$feature <- colnames(meas)
                 meas
               }, newData, names(newData), SIMPLIFY = FALSE)
    newData <- do.call(cbind, newData)
    } else if(is(newData, "MultiAssayExperiment"))
            {
              newData <- prepareData(newData, outcome)
    }
    
    predictFunctionUse <- attr(object, "predictFunction")
    class(object) <- rev(class(object)) # Now want the predict method of the specific model to be picked, so put model class first.
    if (is(object, "listOfModels")) 
         mapply(function(model, assay) predictFunctionUse(model, assay), object, newData, MoreArgs = list(...), SIMPLIFY = FALSE)
    else do.call(predictFunctionUse, list(object, newData, ...)) # Object is itself a trained model and it is assumed that a predict method is defined for it.
}
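
# Hedged usage sketch (object names are hypothetical):
#   fit <- train(trainingMatrix, outcomeTrain = classFactor)
#   predictions <- predict(fit, testMatrix)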