| opdisDownsampling | R Documentation | 
The function performs optimal distribution-preserving down-sampling of large (biomedical) data sets: it draws a subset of the requested size whose distribution is as similar as possible to that of the original data.
opdisDownsampling(Data, Cls, Size, Seed, nTrials = 1000,
TestStat = "ad", MaxCores = getOption("mc.cores", 2L), PCAimportance = FALSE)
| Data | the numerical data, as a vector, matrix, or data frame. | 
| Cls | the class labels, if any, as a vector with one entry per instance (row) of Data. | 
| Size | the total number of instances to draw, summed across all classes. | 
| Seed | a seed for the random number generator; fixing it makes the results reproducible. | 
| nTrials | the number of random candidate samples to draw, from which the one with the most similar distribution is selected. | 
| TestStat | the statistic used as the distributional similarity criterion (default "ad", the Anderson-Darling statistic). | 
| MaxCores | the maximum number of CPU cores to use for parallel computing. | 
| PCAimportance | whether to apply PCA-based feature selection; if TRUE, only variables that are important in the PCA projection are considered. | 
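The selection principle behind nTrials, Size and Seed can be illustrated with a short base-R sketch. It is not the package's implementation: `sketch_downsample()` and `ks_distance()` are hypothetical helpers, the Kolmogorov-Smirnov distance (largest gap between empirical distribution functions) is swapped in for the default Anderson-Darling criterion, per-variable distances are combined by their maximum, and classes are sampled in proportion to their frequencies.

```r
# Hypothetical sketch, not opdisDownsampling's code: draw nTrials stratified
# random candidate samples and keep the one whose per-variable distributions
# are closest to the full data. The KS distance replaces the "ad" criterion.

# Kolmogorov-Smirnov distance between the ECDFs of a subset and the full data.
# Because the subset is drawn from `full`, all jump points lie in unique(full).
ks_distance <- function(subset, full) {
  grid <- sort(unique(full))
  max(abs(ecdf(subset)(grid) - ecdf(full)(grid)))
}

sketch_downsample <- function(Data, Cls = NULL, Size, Seed = 42, nTrials = 100) {
  Data <- as.data.frame(Data)
  n <- nrow(Data)
  if (is.null(Cls)) Cls <- rep(1L, n)  # no classes: treat all rows as one class
  set.seed(Seed)                       # note: changes the global RNG state

  # Class-proportional sizes that sum exactly to Size (largest remainders)
  counts <- table(Cls)
  target <- Size * as.numeric(counts) / n
  per_class <- floor(target)
  remainder <- Size - sum(per_class)
  if (remainder > 0) {
    extra <- order(target - per_class, decreasing = TRUE)[seq_len(remainder)]
    per_class[extra] <- per_class[extra] + 1
  }
  names(per_class) <- names(counts)

  best_idx <- NULL
  best_score <- Inf
  for (trial in seq_len(nTrials)) {
    # One stratified random candidate sample (row indices)
    idx <- unlist(lapply(names(per_class), function(k) {
      rows <- which(as.character(Cls) == k)
      rows[sample.int(length(rows), per_class[[k]])]
    }))
    # Worst-case distance over all variables; smaller means more similar
    score <- max(vapply(Data, function(x) ks_distance(x[idx], x), numeric(1)))
    if (score < best_score) {
      best_score <- score
      best_idx <- idx
    }
  }
  list(ReducedInstances = sort(best_idx), Score = best_score)
}
```

For example, `sketch_downsample(iris[, 1:4], as.integer(iris$Species), Size = 50, nTrials = 200)` returns the row numbers of a 50-row subset. With a fixed Seed, raising nTrials can only keep or lower the best score, because the first candidates drawn are the same.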
Returns a list containing the drawn sample, the omitted data, and the instance numbers of the drawn sample.
| ReducedData | the selected sample data and class information. | 
| RemovedData | the not-selected sample data and class information. | 
| ReducedInstances | the instance numbers of the selected sample data. | 
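The returned components are complementary: the selected rows and the not-selected rows together make up the original data. The sketch below illustrates that relationship with a hypothetical helper, `split_by_instances()`. Storing the class labels as an extra column named `Cls` and naming the second component `RemovedData` are assumptions about the layout and are not guaranteed to match the package's output.

```r
# Hypothetical helper, not part of opdisDownsampling: build the three
# documented components from a vector of selected row indices `idx`.
# Assumption: class labels are appended as a column named "Cls";
# `idx` must be non-empty (df[-integer(0), ] would select no rows).
split_by_instances <- function(Data, Cls, idx) {
  df <- data.frame(as.data.frame(Data), Cls = Cls)
  list(
    ReducedData = df[idx, , drop = FALSE],   # selected instances
    RemovedData = df[-idx, , drop = FALSE],  # all other instances
    ReducedInstances = idx                   # row numbers in Data
  )
}
```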
Author: Jorn Lotsch
Lotsch, J., Malkusch, S., Ultsch, A. (2021): Optimal distribution-preserving downsampling of large biomedical data sets (opdisDownsampling). PLoS One 16(8): e0255838. doi: 10.1371/journal.pone.0255838.
## example 1
library(opdisDownsampling)
data(iris)
# Size is an absolute count: 50 of the 150 iris instances are drawn
Iris50percent <- opdisDownsampling(Data = iris[,1:4], Cls = as.integer(iris$Species),
  Size = 50, MaxCores = 1)
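A quick way to inspect a drawn subset is to compare its class shares and simple per-variable moments with those of the full data. `compare_subset()` is a hypothetical helper, not part of the package, and means and standard deviations are only a coarse check compared with the distributional criterion used for the selection. It assumes `idx` holds row numbers into `Data`, as ReducedInstances is documented to do.

```r
# Hypothetical diagnostic, not part of opdisDownsampling: compare class shares
# and per-variable means/SDs of the rows `idx` with those of the full data.
compare_subset <- function(Data, Cls, idx) {
  Data <- as.data.frame(Data)
  Cls <- factor(Cls)                 # keep all classes, even if absent in idx
  sub <- Data[idx, , drop = FALSE]
  list(
    ClassShare = rbind(Full = prop.table(table(Cls)),
                       Subset = prop.table(table(Cls[idx]))),
    Moments = data.frame(
      FullMean = vapply(Data, mean, numeric(1)),
      SubsetMean = vapply(sub, mean, numeric(1)),
      FullSD = vapply(Data, sd, numeric(1)),
      SubsetSD = vapply(sub, sd, numeric(1))
    )
  )
}
```

After running the example, `compare_subset(iris[, 1:4], iris$Species, Iris50percent$ReducedInstances)` would tabulate the comparison, assuming ReducedInstances is a plain vector of row numbers.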