View source: R/verifyDoppelgangers.R
verifyDoppelgangers | R Documentation |
The user constructs a CSV file of training-validation set pairs, ideally with the number of doppelgangers shared between the training and validation sets increasing from one pair to the next. For each training-validation set pair, 12 models with different feature sets are trained: 10 use random feature sets and 2 use the feature sets of highest and lowest variance. If the validation accuracy of the 10 random models increases as the number of doppelgangers increases, we can conclude that the included doppelgangers are functional doppelgangers.
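The feature set scheme described above can be sketched in base R. This is an illustrative sketch only, not the package's internal code: the toy matrix, the object names (`toy_data`, `feature_sets`) and the use of `round()` to size each set are assumptions made for the example.

```r
# Illustrative sketch only; not the package's internal implementation.
set.seed(2021)  # mirrors the default seed_num

# Hypothetical toy matrix: 100 features (rows) x 6 samples (columns)
toy_data <- matrix(
  rnorm(600), nrow = 100,
  dimnames = list(paste0("gene", 1:100), paste0("S", 1:6))
)

feature_set_portion <- 0.1
num_random_feature_sets <- 10

# Assumption: the set size is the portion of all features, rounded
n_features <- round(feature_set_portion * nrow(toy_data))

# Rank features by their variance across samples
feature_var <- apply(toy_data, 1, var)
ranked <- names(sort(feature_var, decreasing = TRUE))

# Two variance-based feature sets
feature_sets <- list(
  highest_variance = head(ranked, n_features),
  lowest_variance  = tail(ranked, n_features)
)

# Plus the random feature sets
for (i in seq_len(num_random_feature_sets)) {
  feature_sets[[paste0("random_", i)]] <- sample(rownames(toy_data), n_features)
}

length(feature_sets)  # 2 variance-based sets + 10 random sets
```

With the defaults shown, this yields the 12 feature sets (and hence 12 models) per training-validation pair mentioned in the description.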
verifyDoppelgangers(
  experiment_plan_filename,
  raw_data,
  meta_data,
  feature_set_portion = 0.1,
  seed_num = 2021,
  separator = "\\.",
  do_batch_corr = TRUE,
  k = 5,
  num_random_feature_sets = 10,
  size_of_val_set = 8,
  batch_corr_method = "ComBat",
  neg_con_seed = 10
)
experiment_plan_filename |
Name of the CSV file containing the experiment plan. The file has a header row with the names of the training and validation sets (e.g. "Doppel_0.train" or "Doppel_0.valid"). Each column (e.g. the "Doppel_0.train" column) lists the names of all samples included in that training or validation set. |
raw_data |
Dataframe of count matrix before batch correction |
meta_data |
Dataframe of meta data |
feature_set_portion |
Proportion of variables to be used for feature set generation |
seed_num |
Seed number for random feature set generation |
separator |
The character separating the name of the training-validation pair (e.g. "0 Doppel") from the "train" or "valid" label. Each column name should be in the format "0 Doppel.train" if "." is used as the separator |
do_batch_corr |
If FALSE, no batch correction is carried out |
k |
k hyperparameter for KNN classification models |
num_random_feature_sets |
Number of random feature sets for each training-validation set |
size_of_val_set |
Size of each validation set. All validation sets are assumed to be the same size; this value is used for the binomial model |
batch_corr_method |
Batch correction method used. Only two options are accepted: "ComBat" or "ComBat_seq". |
neg_con_seed |
Seed used for negative control |
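To illustrate how the separator relates to the column names, here is a minimal sketch using base R `strsplit()`. It assumes the separator is treated as a regular expression (the default `"\\."` matches a literal dot); the header names and variable names are hypothetical, and the package may parse headers differently.

```r
# Hypothetical column headers from an experiment plan
plan_headers <- c("Doppel_0.train", "Doppel_0.valid",
                  "Doppel_2.train", "Doppel_2.valid")

separator <- "\\."  # default value; as a regex it matches a literal dot

# Split each header into the pair name and the train/valid label
parts <- strsplit(plan_headers, separator)
set_names  <- vapply(parts, `[`, character(1), 1)
set_labels <- vapply(parts, `[`, character(1), 2)

# Each header should give exactly two pieces, labelled "train" or "valid"
stopifnot(all(lengths(parts) == 2))
stopifnot(all(set_labels %in% c("train", "valid")))

unique(set_names)  # the distinct training-validation pair names

# A pair name that itself contains the separator splits into too many pieces
bad_parts <- strsplit("Doppel.0.train", separator)[[1]]
length(bad_parts)  # 3 pieces instead of 2
```

The last two lines show why the separator must not appear inside the name of a training-validation pair.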
Troubleshooting tips:
Ensure all the headers have no spaces.
If Excel is used for planning, save the spreadsheet as "CSV (MS-DOS) (*.csv)"
Use the exact labels "train" and "valid" (the labels are case-sensitive)
Ensure the separator does not appear in the name of the training-validation set (e.g. "Doppel.0" is not allowed when "." is the separator)
Try to put both training-validation columns beside each other and leave no column gaps
Refer to the csv file in the tutorial on the GitHub README.
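As an alternative to a spreadsheet, the plan can be written from R. The sketch below builds a small plan, checks it against the tips above, and saves it as a CSV file. The sample names, the toy set sizes and the output path are hypothetical, and padding shorter columns with empty strings is an assumption about what the plan file tolerates; compare the result with the tutorial file before relying on it.

```r
# Hypothetical sample names; replace with sample names from your own data
plan <- list(
  "Doppel_0.train" = c("S1", "S2", "S3", "S4"),
  "Doppel_0.valid" = c("S5", "S6"),
  "Doppel_2.train" = c("S1", "S2", "S7", "S8"),
  "Doppel_2.valid" = c("S5", "S6")
)

# Pad shorter columns with empty strings so every column has the same length
# (assumption: blank cells are acceptable padding in the plan file)
max_len <- max(lengths(plan))
padded <- lapply(plan, function(x) c(x, rep("", max_len - length(x))))
plan_df <- data.frame(padded, check.names = FALSE, stringsAsFactors = FALSE)

# Check the headers against the troubleshooting tips
headers <- names(plan_df)
stopifnot(!grepl(" ", headers))                    # no spaces in headers
stopifnot(grepl("\\.(train|valid)$", headers))     # exact lower-case labels
stopifnot(lengths(strsplit(headers, "\\.")) == 2)  # separator appears once

# Write the plan; train and valid columns of each pair sit side by side
plan_file <- file.path(tempdir(), "experimentPlan.csv")
write.csv(plan_df, plan_file, row.names = FALSE)
```

The resulting path (`plan_file`) could then be passed as `experiment_plan_filename`.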
Validation Accuracies
## Not run: 
verificationResults = verifyDoppelgangers(
  experiment_plan_filename = "tutorial/experimentPlan.csv",
  raw_data = rc,
  meta_data = rc_metadata)
## End(Not run)