aklimate: aklimate
In VladoUzunangelov/aklimate: AKLIMATE : Algorithm for Kernel Learning with Approximating Tree Ensembles

View source: R/aklimate.R

AKLIMATE : Algorithm for Kernel Learning with Approximating Tree Ensembles

aklimate(dat, dat_grp, lbls, fsets, always_add = NULL,
  rf_pars = list(), akl_pars = list(), store_kernels = FALSE,
  verbose = FALSE)

`dat`	samples x features data frame where columns might be of different type
`dat_grp`	a list of vectors, each consisting of suffixes for data types that match the ones used in dat. Each vector corresponds to a particular combination of data types that will be tested for each component RF. Only the data type combination with the best performance for a given feature set is retained. The data type suffixes must be distinct from one another, so that none is a proper substring of another: for example, c('cnv','cnv_gistic') is not OK, but c('MUTA:HOT','MUTA:NONSENSE') is. This argument is considered experimental. We recommend supplying a list of length 1 whose single entry is a vector of all possible suffixes.
`lbls`	vector of training data labels
`fsets`	list of prior knowledge feature sets
`always_add`	vector of dat column names that are to be included with each fset
`rf_pars`	list of parameters for the RF base kernel runs:
  ntree: Number of trees for RF kernel construction. Default is 1000.
  min_node_prop: Minimal size of leaf nodes (unit is proportion of training set size). Default is 0.01.
  min_nfeat: Minimal size of feature set (across all data modalities) for an RF to be constructed. Default is 15.
  mtry_prop: Proportion of features to be considered for each splitting decision. Default is 0.25.
  regression_q: For regression predictions only. Quantile of the per-sample empirical distribution of absolute differences between RF sample predictions and sample label. Used for binarization of sample predictions during best RF selection. Default is 0.05.
  replace: TRUE/FALSE. Should subsampling be done with replacement? Default is FALSE.
  sample_frac: Fraction of training data points to subsample for each tree. Default is 0.5 for sampling without replacement and 1 for bootstrapping.
  ttype: Type of learning task. Choices are "binary", "multiclass", and "regression". Default is "binary".
  split_rule: Type of splitting criterion. Choices are "gini", "hellinger", "variance", and "beta". See the ranger documentation for more details. Default is "gini".
  importance: Rule for calculating feature and feature set importance. Choices are "impurity_corrected", "permutation", and "impurity". Default is "impurity_corrected".
  metric: Metric for ranking RF base learner performance, used in the selection of best RFs. Choices are "roc", "pr", "acc", "bacc", "mar", "rmse", "rsq", "mae", "pearson", and "spearman". Default is "roc".
  unordered_factors: How to treat unordered factors. Choices are "order", "ignore", and "partition". See the ranger documentation for more details. Default is "order".
  oob_cv: A data frame of parameters to tune during training of all RF base learners, with OOB metric performance (from the choices above) used to select the best combination. Each row of the data frame holds a different combination of RF hyperparameters. The data frame has to contain at least two columns, one of which must be "ntree". Too many hyperparameter combinations can slow computation significantly. Default is a data frame of 1 row using the "min_node_prop", "mtry_prop", and "ntree"/2 values of the rf_pars list. This argument is experimental. We recommend using the default setting.
`akl_pars`	list of parameters for RF best kernel selection and the MKL meta-learner:
  topn: Number of RF kernels (ranked by the metric specified in rf_pars) that correctly predict a given sample to be included in the best RF list. Default is 5.
  cvlen: Number of random MKL hyperparameter combinations to be tested during the MKL CV step. Default is 100.
  nfold: Number of folds to be used in MKL CV. Default is 5.
  lamb: Interval bounds from which random MKL hyperparameter combinations are drawn (log2 units). Default is (-20,0).
  subsetCV: TRUE/FALSE. When TRUE, the MKL CV step also randomly varies the number of RF kernels in addition to the MKL regularization hyperparameters. It does so by training on a subset of kernels of size K, randomly selected on the (0, number of best RF kernels) interval. Once K is selected, the top K kernels (ranked by the metric specified in rf_pars) are included in the current CV run. Default is TRUE.
  type: Type of predictions. Choices are "response" and "probability". Default is "response".
  celnet: Hyperparameters for the MKL elastic net run. Should be a vector of length 2. Default is NULL, in which case the hyperparameters are tuned via internal cross-validation.
`store_kernels`	TRUE/FALSE. Should the model store the training RF kernels. Default is FALSE.
`verbose`	TRUE/FALSE. Should the model print verbose progress statements. Default is FALSE.
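
The dat_grp and rf_pars$oob_cv arguments described above each come with constraints that are easy to get wrong, so the following is a minimal sketch in base R of how they might be prepared. The helper `suffixes_distinct` is hypothetical and not part of aklimate, and the grid values are illustrative assumptions rather than recommendations. Only the column names "ntree", "min_node_prop", and "mtry_prop" and the parameter names ttype and metric come from the argument descriptions.

```r
# Hypothetical helper (not part of aklimate): returns TRUE when the suffixes
# are unique and none of them occurs as a substring of another.
suffixes_distinct <- function(suffixes) {
  if (anyDuplicated(suffixes) > 0) {
    return(FALSE)
  }
  for (s in suffixes) {
    others <- setdiff(suffixes, s)
    # fixed = TRUE treats the suffix as a literal string, not a regex
    if (any(grepl(s, others, fixed = TRUE))) {
      return(FALSE)
    }
  }
  TRUE
}

# The two examples from the dat_grp description
suffixes_distinct(c("cnv", "cnv_gistic"))          # "cnv" is inside "cnv_gistic"
suffixes_distinct(c("MUTA:HOT", "MUTA:NONSENSE"))  # neither contains the other

# Recommended form of dat_grp: a list of length 1 holding all suffixes
all_suffixes <- c("MUTA:HOT", "MUTA:NONSENSE")
stopifnot(suffixes_distinct(all_suffixes))
dat_grp <- list(all_suffixes)

# An oob_cv grid: one row per hyperparameter combination, with the required
# "ntree" column. The values below are assumptions chosen for illustration.
oob_cv <- expand.grid(
  ntree = 500,
  min_node_prop = c(0.01, 0.05),
  mtry_prop = c(0.1, 0.25)
)

# rf_pars entries not listed here keep their documented defaults
rf_pars <- list(ttype = "binary", metric = "roc", oob_cv = oob_cv)
```

Keeping the grid small matters because every row is evaluated for every RF base learner. Since both dat_grp with multiple entries and oob_cv are flagged as experimental, leaving oob_cv at its default remains the documented recommendation.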

a model of class AKLIMATE with the following fields:

rf_stats: List of metrics and predictions from training run on all RF base learners.
kernels: RF kernels used in MKL training step. NULL if store_kernels is set to FALSE.
kern_cv: if akl_pars$celnet is NULL, hyperparameter vectors examined during MKL cross-validation, along with matching metric scores.
rf_models: Set of RF base learners used to produce RF kernels for stacked MKL.
akl_model: Trained spicer MKL model, with either user-supplied elastic net hyperparameters, or the hyperparameters selected via CV tuning.
rf_pars_global: rf_pars argument
rf_pars_local: optimal RF parameters for each RF base learner. Those will be the same (with the exception of ntree) as the rf_pars_global parameters unless rf_pars$oob_cv was specified by the user.
akl_pars: akl_pars argument
dat_grp: dat_grp argument
idx_train: Vector of training data instances.
preds_train: AKLIMATE predictions on training set.

V. Uzunangelov, C. K. Wong, and J. Stuart. Highly Accurate Cancer Phenotype Prediction with AKLIMATE, a Stacked Kernel Learner Integrating Multimodal Genomic Data and Pathway Knowledge. bioRxiv, July 2020.

VladoUzunangelov/aklimate documentation built on Aug. 17, 2020, 4:40 a.m.
