knitr::opts_chunk$set(echo = TRUE)
CBRMSR is Case Based Reasoning with Multi-Stage Retrieval. The package supports a single train/test split or k-fold cross-validation (folding), feature selection with a balanced iterative random forest or a random KNN algorithm, and class balancing with ADASYN or SMOTE. Distance matrices can be created from the confounding-variable data and from the numeric predictor data. During retrieval, a confidence metric controls how many cases are retrieved for each test case: retrieval continues automatically until a confidence threshold is reached. Cases are first retrieved based on similar confounding attributes and then based on similar predictor attributes.
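SMOTE balances classes by creating synthetic minority-class cases, each placed at a random point on the line segment between a minority case and one of its k nearest minority-class neighbours. The base R sketch below illustrates only that interpolation idea. It is not the code CBRMSR or smotefamily runs, and the function name smote_sketch() and its arguments are hypothetical.

```r
# Minimal SMOTE sketch (hypothetical helper, not the CBRMSR implementation).
# X: numeric matrix of minority-class cases only (rows = cases).
smote_sketch <- function(X, k = 5, n_new = nrow(X)) {
  X <- as.matrix(X)
  stopifnot(nrow(X) >= 2)              # need at least one neighbour
  d <- as.matrix(dist(X))              # Euclidean distances between minority cases
  diag(d) <- Inf                       # a case is never its own neighbour
  k <- min(k, nrow(X) - 1)
  synth <- matrix(NA_real_, n_new, ncol(X), dimnames = list(NULL, colnames(X)))
  for (s in seq_len(n_new)) {
    i <- sample.int(nrow(X), 1)        # random seed case
    nn <- order(d[i, ])[seq_len(k)]    # its k nearest minority neighbours
    j <- nn[sample.int(k, 1)]          # pick one neighbour at random
    gap <- runif(1)                    # random position along the segment
    synth[s, ] <- X[i, ] + gap * (X[j, ] - X[i, ])
  }
  synth
}
```

For example, smote_sketch(X, k = 2, n_new = 10) returns 10 synthetic rows that lie inside the region spanned by the minority cases in X. ADASYN differs mainly in generating more synthetic cases around minority cases that are harder to learn.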
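Three data frames are required: predictor data, confounding-variable data, and a class frame of sample names and labels. First, load the dependencies: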
suppressMessages(library(caret))
suppressMessages(library(smotefamily))
suppressMessages(library(imbalance))
suppressMessages(library(randomForest))
suppressMessages(library(stats))
suppressMessages(library(sidier))
suppressMessages(library(R6))
suppressMessages(library(sna))
suppressMessages(library(cluster))
suppressMessages(library(tictoc))
suppressMessages(library(nomclust))
suppressMessages(library(analogue))
suppressMessages(library(data.table))
suppressMessages(library(rknn))
suppressMessages(library(plyr))
library(CBRMSR)
Three data frames are included for testing purposes. predictor is a subset of unique probes from differentially methylated regions in the TCGA-BRCA dataset. confounding contains the associated confounding variables, converted to numeric; ages were assigned to a group according to the age range they fall in. classframe is a data frame whose first column holds the TCGA sample names and whose second column holds their classification labels.
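The included confounding data were converted to numeric, with ages binned into groups. The exact age ranges used for this dataset are not stated here, so the breaks and example values below are hypothetical. The sketch only shows one way to produce such numeric codes in base R with cut() and factor codes.

```r
# Hypothetical example values; the real break points are not documented here.
ages <- c(34, 47, 58, 71)
# Bin ages into ranges (-Inf,40], (40,50], (50,60], (60,70], (70,Inf]
# and keep the integer group code.
age_group <- as.integer(cut(ages, breaks = c(-Inf, 40, 50, 60, 70, Inf)))

# A categorical confounder (hypothetical) converted to integer codes;
# factor levels are sorted alphabetically, so "Stage I" -> 1, "Stage II" -> 2.
stage <- c("Stage I", "Stage II", "Stage I")
stage_code <- as.integer(factor(stage))
```

Here age_group is 1, 2, 3, 5 and stage_code is 1, 2, 1.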
data(predictor)
data(confounding)
data(classframe)
dim(predictor)
dim(confounding)
dim(classframe)
predictor[1:5,1:5]
confounding[1:5,1:5]
head(classframe, n = 5)
CBRMSR <- create_CBRMSR(predictor = predictor, confounding = confounding, classframe = classframe)
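SplitPercent is the fraction of the data used for the training set, given as a decimal (0.75 means 75%). Alternatively, folding_module() divides the data into the number of cross-validation folds given by Folds.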
# Either hold out a test set with a single split...
CBRMSR <- splitting_module(CBRMSR, SplitPercent = 0.75)
# ...or use k-fold cross-validation instead
CBRMSR <- folding_module(CBRMSR, Folds = 10)
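To make the two options concrete, the base R sketch below shows a class-stratified 75/25 split and a stratified assignment of cases to 10 folds. It is an illustration under the assumption that stratifying by class is desirable; it is not necessarily how splitting_module() or folding_module() work internally, and both helper names are hypothetical.

```r
# Hypothetical helper: indices of a class-stratified training set.
stratified_split <- function(labels, p = 0.75) {
  by_class <- split(seq_along(labels), labels)
  train <- unlist(lapply(by_class, function(i) {
    i[sample.int(length(i), round(p * length(i)))]  # p of each class
  }), use.names = FALSE)
  sort(train)
}

# Hypothetical helper: fold number (1..k) for each case, stratified by class.
assign_folds <- function(labels, k = 10) {
  folds <- integer(length(labels))
  for (i in split(seq_along(labels), labels)) {
    v <- rep_len(seq_len(k), length(i))  # spread each class evenly over folds
    folds[i] <- v[sample.int(length(v))] # shuffle the fold labels
  }
  folds
}
```

With 40 cases of one class and 20 of another, stratified_split() keeps 30 and 15 of them for training, and assign_folds(labels, 10) puts 6 cases in every fold. caret, which CBRMSR loads, offers ready-made equivalents in createDataPartition() and createFolds().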
CBRMSR <- selection_module(CBRMSR, method = "BIRF")
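The "balanced" in balanced iterative random forest refers to class balance. This vignette does not describe the BIRF algorithm itself, so the sketch below shows only one common ingredient of such schemes, assumed here for illustration: drawing a subsample with equal numbers of cases from each class before fitting. The helper name balanced_indices() is hypothetical.

```r
# Hypothetical helper: row indices of a class-balanced subsample
# (undersampling every class to the size of the smallest one by default).
balanced_indices <- function(labels, n_per_class = min(table(labels))) {
  by_class <- split(seq_along(labels), labels)
  unlist(lapply(by_class, function(i) {
    i[sample.int(length(i), n_per_class)]  # sample without replacement
  }), use.names = FALSE)
}
```

With 8 cases of one class and 3 of another, balanced_indices() returns 6 indices, 3 from each class. In the randomForest package, a similar per-tree balance can be requested through the strata and sampsize arguments of randomForest().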
Next, calculate the categorical (confounding) and numeric (predictor) distances. Categorical similarity can be calculated with either the Goodall or the Lin measure.
CBRMSR <- distance_module(CBRMSR, categorical.similarity = "Goodall", confounding.type = "categorical", feature.weights = TRUE)
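As an illustration of the Lin option, the sketch below follows the Lin similarity as commonly stated for categorical data (Boriah, Chandola and Kumar, 2008). A matching value x on an attribute contributes 2 log p(x), where p is the value's sample frequency. A mismatch between x and y contributes 2 log(p(x) + p(y)). The sum is divided by the sum of log p(x) + log p(y) over all attributes. This is a from-scratch sketch, not the code distance_module() runs; converting similarity to distance as 1 - similarity is an assumption, and both function names are hypothetical.

```r
# Hypothetical helper: Lin similarity between rows i and j of a categorical
# data frame (assumes no missing values).
lin_similarity <- function(data, i, j) {
  num <- 0
  den <- 0
  for (k in seq_along(data)) {
    col <- as.character(data[[k]])
    tab <- table(col)
    p <- setNames(as.numeric(tab) / length(col), names(tab))  # value frequencies
    px <- p[[col[i]]]
    py <- p[[col[j]]]
    if (col[i] == col[j]) {
      num <- num + 2 * log(px)          # match: rarer values count more
    } else {
      num <- num + 2 * log(px + py)     # mismatch
    }
    den <- den + log(px) + log(py)
  }
  num / den
}

# Hypothetical helper: full distance matrix, using 1 - similarity (assumption).
lin_distance_matrix <- function(data) {
  n <- nrow(data)
  s <- matrix(1, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      s[i, j] <- s[j, i] <- lin_similarity(data, i, j)
    }
  }
  1 - s
}
```

For two attributes with two equally frequent values each, identical rows score 1, rows that match on one attribute score 0.5, and rows that match on neither score 0. The ratio is undefined if every attribute is constant.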
This module runs a two-stage retrieval process. First, it retrieves training samples with similar confounding factors. It then narrows this pool of retrieved samples using similarity among the predictor variables. Retrieval is controlled automatically by a confidence metric: each training sample is assigned a confidence value equal to its average distance to samples of a different class minus its average distance to samples of the same class, normalized to lie between 0 and 1. This module must be run last.
CBRMSR <- two_stage_module(CBRMSR)
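The confidence metric described for two_stage_module() can be sketched directly from its definition. The sketch assumes min-max scaling for the normalization to [0, 1] and a precomputed distance matrix; the function name case_confidence() is hypothetical and this is not the package's own code.

```r
# Hypothetical helper: confidence of each training case from a distance
# matrix (dist object or square matrix) and a vector of class labels.
case_confidence <- function(dmat, labels) {
  dmat <- as.matrix(dmat)
  n <- nrow(dmat)
  raw <- numeric(n)
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    same_class <- others[labels[others] == labels[i]]
    other_class <- others[labels[others] != labels[i]]
    # mean distance to other classes minus mean distance to own class
    raw[i] <- mean(dmat[i, other_class]) - mean(dmat[i, same_class])
  }
  # min-max normalization to [0, 1] (assumed)
  (raw - min(raw)) / (max(raw) - min(raw))
}
```

For one-dimensional points 0, 1, 10 and 11 labelled a, a, b, b, the raw values are 9.5, 8.5, 8.5 and 9.5, so the normalized confidences are 1, 0, 0 and 1: the outermost cases sit furthest from the other class. A class with a single case has no same-class distances, which makes its mean undefined.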
Results are stored as fields of the CBRMSR object and accessed with $. For example, if your CBRMSR object is named myCBRMSR, the confusion matrices for the testing data are accessed with: myCBRMSR$testing.confusion.matrices