prepareData: Convert Different Data Classes into DataFrame and Filter...
In DarioS/ClassifyR: A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing

prepareData

R Documentation

Convert Different Data Classes into DataFrame and Filter Features

Description

Input data can be a matrix, data.frame, DataFrame, MultiAssayExperiment, or list of tabular data. This function prepares a DataFrame of features and a vector of outcomes, and helps to exclude nuisance features, such as dates or unique sample identifiers, from subsequent modelling.
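As an illustration of excluding nuisance features, the following base-R sketch mimics the `useFeatures` rule described under Arguments for a named list of tables. Assays named in `useFeatures` are reduced to the listed features, and all other assays are kept whole. The helper `apply_use_features` and the example tables are hypothetical, and this is not how ClassifyR itself implements the rule. The single-table case, where `useFeatures` is a plain vector of names, is not covered.

```r
# Hypothetical sketch of the useFeatures rule for a named list of tables;
# NOT ClassifyR code. Assays absent from useFeatures keep every feature.
apply_use_features <- function(tables, useFeatures = NULL) {
  if (is.null(useFeatures)) return(tables)  # NULL: use all features
  stopifnot(is.list(useFeatures), !is.null(names(useFeatures)))
  for (assay in names(useFeatures)) {
    wanted <- useFeatures[[assay]]
    unknown <- setdiff(wanted, colnames(tables[[assay]]))
    if (length(unknown) > 0)
      stop("Unknown features in '", assay, "': ", paste(unknown, collapse = ", "))
    tables[[assay]] <- tables[[assay]][, wanted, drop = FALSE]
  }
  tables
}

tables <- list(
  clinical = data.frame(age = c(61, 48), sampleID = c("S1", "S2"),
                        visitDate = c("2020-01-03", "2020-02-11")),
  RNA = data.frame(GAPDH = c(9.1, 8.7), ACTB = c(11.2, 10.9))
)
# Drop the sample ID and visit date from the clinical table; RNA is untouched.
kept <- apply_use_features(tables, list(clinical = "age"))
colnames(kept$clinical)  # "age"
```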

Usage

## S4 method for signature 'matrix'
prepareData(measurements, outcome, ...)

## S4 method for signature 'data.frame'
prepareData(measurements, outcome, ...)

## S4 method for signature 'DataFrame'
prepareData(
  measurements,
  outcome,
  useFeatures = NULL,
  maxMissingProp = 0,
  maxSimilarity = 1,
  topNvariance = NULL
)

## S4 method for signature 'MultiAssayExperiment'
prepareData(measurements, outcomeColumns = NULL, useFeatures = NULL, ...)

## S4 method for signature 'list'
prepareData(measurements, outcome = NULL, useFeatures = NULL, ...)

Arguments

`measurements`	A `matrix`, `data.frame`, `DataFrame`, `MultiAssayExperiment`, or list of tabular data containing all of the data. For a `matrix`, `data.frame` or `DataFrame`, the rows are samples and the columns are features.
`...`	Arguments not used by the `matrix`, `data.frame`, `MultiAssayExperiment` or `list` methods, which are passed on to and used by the `DataFrame` method.
`outcome`	A factor vector of classes, a `Surv` object, or a character vector of column name(s) identifying the column(s) that contain either classes or survival time and event information. If survival column names are given, the time column must come first and the event status column second.
`useFeatures`	Default: `NULL` (i.e. use all provided features). If `measurements` is a `MultiAssayExperiment` or list of tabular data, a named list of the features to use from each assay. Otherwise, the input data is a single table and this can simply be a vector of feature names. For any assays not in the named list, all of their features are used. `"clinical"` is also a valid assay name and refers to the clinical data table. This allows the exclusion of variables such as spike-in RNAs, sample IDs and sample acquisition dates, which are not relevant for outcome prediction.
`maxMissingProp`	Default: 0.0. The maximum tolerated proportion of missing values, less than 1, for a feature to be retained for modelling.
`maxSimilarity`	Default: 1. A number between 0 and 1 which is the maximum similarity between a pair of variables for both to be kept in the data set. For numerical variables, the Pearson correlation is used, and for categorical variables, the Chi-squared test p-value is used. If a pair is too similar, the second variable is excluded from the data set.
`topNvariance`	Default: `NULL`. If `measurements` is a `MultiAssayExperiment` or list of tabular data, a named integer vector giving, for each assay, the number of most variable features to keep. If the input data is a single table, a single integer. An assay with fewer features than requested is not reduced but kept as-is.
`outcomeColumns`	If `measurements` is a `MultiAssayExperiment`, the name of the column (classes) or names of the columns (survival) in the table extracted by `colData(measurements)` that contain each individual's outcome to use for prediction.
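The three filtering arguments can be illustrated with a minimal base-R sketch. It is not ClassifyR's implementation. The helper names `filter_missing`, `filter_similar` and `top_variance` are hypothetical, and only numeric features are handled, so the Chi-squared rule for categorical variables is omitted. The sketch also assumes that the missingness threshold is inclusive and that the Pearson correlation is compared directly with `maxSimilarity`, without taking its absolute value. The documentation above does not specify either point.

```r
# Hypothetical sketches of the three filters described above; NOT ClassifyR
# code. Rows are samples and columns are numeric features.

# maxMissingProp: keep features whose proportion of NA values is at most the
# threshold (the inclusive boundary is an assumption).
filter_missing <- function(features, maxMissingProp = 0) {
  missingProp <- colMeans(is.na(features))
  features[, missingProp <= maxMissingProp, drop = FALSE]
}

# maxSimilarity: for each pair whose Pearson correlation exceeds the
# threshold, exclude the second (later) feature of the pair.
filter_similar <- function(features, maxSimilarity = 1) {
  r <- cor(features, method = "pearson", use = "pairwise.complete.obs")
  excluded <- rep(FALSE, ncol(features))
  for (i in seq_len(ncol(features) - 1)) {
    if (excluded[i]) next
    for (j in (i + 1):ncol(features)) {
      # isTRUE() treats an undefined (NA) correlation as "not too similar".
      if (!excluded[j] && isTRUE(r[i, j] > maxSimilarity)) excluded[j] <- TRUE
    }
  }
  features[, !excluded, drop = FALSE]
}

# topNvariance: keep the topN most variable features; a table with no more
# than topN features is returned as-is.
top_variance <- function(features, topN) {
  if (ncol(features) <= topN) return(features)
  variances <- vapply(features, var, numeric(1), na.rm = TRUE)
  keep <- order(variances, decreasing = TRUE)[seq_len(topN)]
  features[, sort(keep), drop = FALSE]  # preserve original column order
}

withNA <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 2, 3, 4), c = c(NA, NA, 3, 4))
colnames(filter_missing(withNA))         # "a"
colnames(filter_missing(withNA, 0.25))   # "a" "b"

x <- c(1, 2, 3, 4, 5)
correlated <- data.frame(x = x, twiceX = 2 * x, noise = c(2, -1, 4, 0, 3))
colnames(filter_similar(correlated, 0.9))  # "x" "noise"

spread <- data.frame(flat = c(5, 5, 5, 5), mild = c(1, 2, 1, 2), wide = c(0, 10, 0, 10))
colnames(top_variance(spread, 2))          # "mild" "wide"
```

In `filter_similar`, the scan is greedy and runs in column order. Once a feature is excluded, it is no longer used to exclude others. This matches "the second variable will be excluded" for each pair, but the package may resolve chains of similar features differently.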

Value

A list of length two. The first element is a DataFrame of features and the second element is the outcomes to use for modelling.
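To make the shape of the returned value concrete, here is a base-R sketch that splits named outcome column(s) from a plain data.frame and returns a list of length two: features first, outcomes second. The helper `split_outcome` and the `time`/`event` column names it produces are hypothetical. Unlike ClassifyR, it returns a base data.frame rather than a `DataFrame` and does not build a `Surv` object. It only follows the stated convention that survival time comes first and event status second.

```r
# Hypothetical helper illustrating the two-element return value;
# NOT part of ClassifyR.
split_outcome <- function(measurements, outcome) {
  stopifnot(is.data.frame(measurements), is.character(outcome),
            all(outcome %in% colnames(measurements)))
  keep <- setdiff(colnames(measurements), outcome)
  features <- measurements[, keep, drop = FALSE]
  if (length(outcome) == 1) {
    # One column name: classes, returned as a factor.
    out <- factor(measurements[[outcome]])
  } else if (length(outcome) == 2) {
    # Two column names: survival, time first and event status second.
    out <- data.frame(time = measurements[[outcome[1]]],
                      event = measurements[[outcome[2]]])
  } else {
    stop("'outcome' must name one class column or two survival columns.")
  }
  list(features, out)
}

samples <- data.frame(geneA = c(1.2, 3.4, 2.2), geneB = c(0.5, 0.1, 0.9),
                      status = c("Good", "Poor", "Good"))
prepared <- split_outcome(samples, "status")
colnames(prepared[[1]])  # "geneA" "geneB"
levels(prepared[[2]])    # "Good" "Poor"
```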

Author(s)

Dario Strbenac

DarioS/ClassifyR documentation built on April 14, 2025, 8:36 a.m.
