verifyDoppelgangers: Verifies the functionality of Doppelgangers

View source: R/verifyDoppelgangers.R

verifyDoppelgangers R Documentation

Verifies the functionality of Doppelgangers

Description

The user constructs a CSV file of training-validation set pairs, ideally with an increasing number of doppelgangers between the training and validation sets. For each training-validation set pair, 12 models with different feature sets are trained: 10 with random feature sets and 2 with the feature sets of highest and lowest variance. If the validation accuracy of the 10 random-feature models increases as the number of doppelgangers increases, we can conclude that the doppelgangers included are functional doppelgangers.
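The feature-set construction described above can be sketched in base R. This is an illustration under stated assumptions, not the package's own code: the toy matrix, the object names, and the choice of features as rows are all hypothetical.

```r
# Toy count matrix: 50 features (rows) x 6 samples (columns).
# All names here are hypothetical and only serve the illustration.
set.seed(2021)
counts <- matrix(
  rpois(50 * 6, lambda = 20),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("s", 1:6))
)

feature_set_portion <- 0.1
n_features <- round(feature_set_portion * nrow(counts))

# Rank features by their variance across samples.
feature_var <- apply(counts, 1, var)
ranked <- names(sort(feature_var, decreasing = TRUE))

high_var_set <- head(ranked, n_features)  # highest-variance features
low_var_set  <- tail(ranked, n_features)  # lowest-variance features

# Ten random feature sets of the same size.
random_sets <- replicate(
  10,
  sample(rownames(counts), n_features),
  simplify = FALSE
)

# 10 random sets + 2 variance-based sets = 12 feature sets per pair.
feature_sets <- c(random_sets, list(high_var_set), list(low_var_set))
length(feature_sets)
```

Each of the 12 feature sets would then be used to train one model per training-validation pair.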

Usage

verifyDoppelgangers(
  experiment_plan_filename,
  raw_data,
  meta_data,
  feature_set_portion = 0.1,
  seed_num = 2021,
  separator = "\\.",
  do_batch_corr = TRUE,
  k = 5,
  num_random_feature_sets = 10,
  size_of_val_set = 8,
  batch_corr_method = "ComBat",
  neg_con_seed = 10
)

Arguments

experiment_plan_filename

Name of file containing csv experiment plan. The csv file has a header with the names of the training_validation sets (e.g. "Doppel_0.train" or "Doppel_0.valid"). In each column (e.g. "Doppel_0.train" column), we include the names of all samples included in this training/validation set.
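As an illustration of this layout, the sketch below writes a small experiment plan with two training-validation pairs. The sample names are hypothetical, and padding shorter columns with empty strings is an assumption about how sets of unequal size are stored in the file.

```r
# Hypothetical sample names; a real plan would use sample names from raw_data.
plan_columns <- list(
  "Doppel_0.train" = c("s1", "s2", "s3", "s4"),
  "Doppel_0.valid" = c("s5", "s6"),
  "Doppel_2.train" = c("s1", "s2", "s7", "s8"),
  "Doppel_2.valid" = c("s5", "s6")
)

# Pad shorter columns with empty strings so all columns have equal length
# (assumption: this is how unequal set sizes are represented in the CSV).
max_len <- max(lengths(plan_columns))
padded <- lapply(plan_columns, function(x) c(x, rep("", max_len - length(x))))

plan <- data.frame(padded, check.names = FALSE, stringsAsFactors = FALSE)

# Write to a temporary file and read the header back to confirm the layout.
plan_file <- tempfile(fileext = ".csv")
write.csv(plan, plan_file, row.names = FALSE)
names(read.csv(plan_file, check.names = FALSE))
```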

raw_data

Dataframe of count matrix before batch correction

meta_data

Dataframe of meta data

feature_set_portion

Proportion of variables to be used for feature set generation

seed_num

Seed number for random feature set generation

separator

The character separating the name of the training-validation pair (e.g. "Doppel_0") from the "train" or "valid" label. Each column name should be in the format "Doppel_0.train" if "." is used as the separator.
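The default value "\\." is an escaped dot, which suggests the separator is applied as a regular expression. A minimal sketch of how a column name splits under that assumption, using only base R (the package's internal parsing may differ):

```r
separator <- "\\."  # regular expression matching a literal dot
headers <- c("Doppel_0.train", "Doppel_0.valid")

# Split each header into the pair name and the set label.
parts <- strsplit(headers, separator)
pair_names <- vapply(parts, `[`, character(1), 1)
set_labels <- vapply(parts, `[`, character(1), 2)

pair_names
set_labels
```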

do_batch_corr

If FALSE, no batch correction is carried out

k

k hyperparameter for KNN classification models
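To show what k controls, here is a small KNN example using class::knn from the recommended "class" package. Whether verifyDoppelgangers uses this particular implementation is an assumption; the simulated data and object names are hypothetical. Note that class::knn expects samples as rows.

```r
library(class)

set.seed(2021)
# Two well-separated simulated classes; samples are rows, features are columns.
train <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 5), ncol = 2)
)
train_labels <- factor(rep(c("A", "B"), each = 10))

# A validation set of 8 samples, matching the default size_of_val_set.
valid <- rbind(
  matrix(rnorm(8, mean = 0), ncol = 2),
  matrix(rnorm(8, mean = 5), ncol = 2)
)
valid_labels <- factor(rep(c("A", "B"), each = 4))

# Each validation sample gets the majority label of its k nearest training samples.
predicted <- knn(train, valid, cl = train_labels, k = 5)
accuracy <- mean(predicted == valid_labels)
accuracy
```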

num_random_feature_sets

Number of random feature sets for each training-validation set

size_of_val_set

Size of each validation set. All validation sets are assumed to be the same size; this value is used for the binomial model.

batch_corr_method

Batch correction method used. Only two options are accepted: "ComBat" or "ComBat_seq".

neg_con_seed

Seed used for negative control

Details

Troubleshooting tips:

  • Ensure all the headers have no spaces.

  • If Excel is used for planning, save the spreadsheet as "CSV (MS-DOS) (*.csv)"

  • Use the exact labels "train" and "valid" (the labels are case-sensitive)

  • Ensure the separator does not appear in the name of the training-validation set (e.g. "Doppel.0" is not allowed when "." is the separator)

  • Try to put both training-validation columns beside each other and leave no column gaps

  • Refer to the csv file in the tutorial on the GitHub README.
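Several of the tips above can be checked before calling the function. The helper below is a hypothetical sketch, not part of the package: the name check_plan_headers and its messages are invented for illustration, and it assumes the separator is a regular expression.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_plan_headers <- function(headers, separator = "\\.") {
  problems <- character(0)

  # Tip: headers must not contain spaces.
  if (any(grepl(" ", headers, fixed = TRUE))) {
    problems <- c(problems, "headers contain spaces")
  }

  # Tip: the separator must occur exactly once, between name and label.
  parts <- strsplit(headers, separator)
  if (any(lengths(parts) != 2)) {
    problems <- c(problems, "separator missing or repeated in a header")
    return(problems)
  }

  pairs  <- vapply(parts, `[`, character(1), 1)
  labels <- vapply(parts, `[`, character(1), 2)

  # Tip: labels must be exactly "train" or "valid" (case-sensitive).
  if (!all(labels %in% c("train", "valid"))) {
    problems <- c(problems, "labels other than 'train' and 'valid'")
  }

  # Every pair name needs exactly one train and one valid column.
  counts <- table(pairs, factor(labels, levels = c("train", "valid")))
  if (any(counts != 1)) {
    problems <- c(problems, "pair without exactly one train and one valid column")
  }

  problems
}

check_plan_headers(c("Doppel_0.train", "Doppel_0.valid"))
check_plan_headers(c("Doppel.0.train", "Doppel.0.valid"))
```

The first call passes every check; the second is flagged because the separator also appears inside the pair name.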

Value

Validation Accuracies

Examples

## Not run: 
verificationResults <- verifyDoppelgangers(
  experiment_plan_filename = "tutorial/experimentPlan.csv",
  raw_data = rc,
  meta_data = rc_metadata
)

## End(Not run)

lr98769/doppelgangerIdentifier documentation built on Aug. 2, 2022, 9:41 a.m.