The goal of doppelgangerIdentifier is to find PPCC data doppelgangers that may have an inflationary effect on model accuracy.
PPCC: Pairwise Pearson’s Correlation Coefficient, the Pearson’s Correlation Coefficient between samples from two different batches.
You can install the development version of doppelgangerIdentifier from GitHub with:
# install.packages("devtools")
devtools::install_github("lr98769/doppelgangerIdentifier")
There are 4 main functions in this package:
Finds PPCC data doppelgangers in the data using batch, class and patient id meta data.
*Note: The effectiveness of getPPCCDoppelgangers depends on the efficacy of sva::ComBat. Differences in the distribution of classes between batches affect the performance of ComBat and, as a result, PPCC doppelganger identification.
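Because of this, it can help to check the class balance of each batch before running getPPCCDoppelgangers. A minimal sketch, assuming the meta data has `Class` and `Batch` columns (these column names are assumptions; use whatever your meta data actually contains):

```r
# Hypothetical check: a heavily skewed cross-tabulation warns that
# ComBat, and hence PPCC doppelganger identification, may perform poorly.
meta_data <- data.frame(
  Class = c("Tumor", "Normal", "Tumor", "Tumor", "Tumor", "Normal"),
  Batch = c("A", "A", "A", "B", "B", "B")
)
table(meta_data$Class, meta_data$Batch)
```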
library(doppelgangerIdentifier)
ppccDoppelgangerResults = getPPCCDoppelgangers(raw_data, meta_data)
Shows the distribution of PPCCs of different sample pairs.
library(doppelgangerIdentifier)
visualisePPCCDoppelgangers(ppccDoppelgangerResults)
Tests the inflationary effects of PPCC data doppelgangers.
library(doppelgangerIdentifier)
veri_result = verifyDoppelgangers(experimentPlanFilename, raw_data, meta_data)
Visualise the accuracy of each Train-Valid Pair.
library(doppelgangerIdentifier)
visualiseVerificationResults(veri_result)
4 unprocessed data sets (no batch correction carried out) and their meta data are available and ready to use with the doppelgangerIdentifier R package.
| Name | Description | Citation |
|:----:|:------------------------------------------------------:|:---------------------------------:|
| rc | Renal Cell Carcinoma Proteomics Data Set | Guo et al. |
| dmd | Duchenne Muscular Dystrophy (DMD) Microarray Data Set | Haslett et al. & Pescatori et al. |
| leuk | Leukemia Microarray Data Set | Golub et al. & Armstrong et al. |
| all | Acute Lymphoblastic Leukaemia (ALL) Microarray Data Set | Ross et al. & Yeoh et al. |
Note: Cite the original source of each data set used
In this example, we show how PPCC data doppelgangers can be identified, and verified to be functional, with the doppelgangerIdentifier R package.
library("doppelgangerIdentifier")
Doppelganger effect: When training and validation data are similar by chance, resulting in an inflation of model accuracies on the validation dataset regardless of how we train the model.
To illustrate the impacts of the Doppelganger effect, we will be using a Renal Carcinoma (RC) gene expression dataset.
#Import RC gene expression dataset
data(rc)
#Import metadata for RC gene expression dataset
data(rc_metadata)
Functional Doppelgangers: Sample pairs between training and validation datasets that cause the doppelganger effect.
When functional doppelgangers are found in both training and validation sets, the doppelganger effect is observed. Hence, it is important to identify these doppelgangers and prevent the doppelganger effect from inflating machine learning performance.
We define possible doppelgangers as samples of the same class (both samples from Tumor or both samples from Normal) but from different patients. Sample pairs of different classes serve as negative controls, while sample pairs of the same class and same patient, indicative of leakage, serve as positive controls.
Data Doppelgangers: Sample pairs of the same class that are highly similar and hence have a high chance of being functional doppelgangers
Pairwise Pearson’s Correlation Coefficient: Pearson’s Correlation Coefficient between sample pairs
Since it is computationally tedious to test different subsets of the data that cause the doppelganger effect, we instead identify data doppelgangers, sample pairs that are highly similar and have a high probability of being functional doppelgangers. In our implementation, we utilized Pairwise Pearson’s Correlation Coefficient (PPCC) as a metric of similarity and define data doppelgangers identified by this method as PPCC data doppelgangers.
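Conceptually, the PPCC computation reduces to Pearson's correlation between every cross-batch sample pair, which base R's `cor()` computes directly. A toy sketch of the idea (not the package's internal code), with samples as columns and genes as rows:

```r
# Toy expression matrices: 5 genes, 4 samples in batch 1, 3 in batch 2.
set.seed(1)
batch1 <- matrix(rnorm(20), nrow = 5,
                 dimnames = list(NULL, c("s1", "s2", "s3", "s4")))
batch2 <- matrix(rnorm(15), nrow = 5,
                 dimnames = list(NULL, c("t1", "t2", "t3")))
# One Pearson correlation per cross-batch sample pair: a 4 x 3 matrix.
ppcc <- cor(batch1, batch2, method = "pearson")
dim(ppcc)
```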
In section “3) Effects of functional doppelgangers in machine learning”, we will demonstrate that the PPCC data doppelgangers identified by this method are functional doppelgangers.
To show how PPCC data doppelgangers are identified with the RC data set, we treat each batch as a separate data set and try to find PPCC data doppelgangers between the 2 batches.
These are the steps we use to identify PPCC data doppelgangers:
PPCC: Pairwise Pearson’s Correlation Coefficient
start_time = Sys.time()
ppccDoppelgangerResults = getPPCCDoppelgangers(rc, rc_metadata)
#> [1] "1. Batch correcting the 2 data sets with sva:ComBat..."
#> Found 2 batches
#> Adjusting for 0 covariate(s) or covariate level(s)
#> Standardizing Data across genes
#> Fitting L/S model and finding priors
#> Finding parametric adjustments
#> Adjusting the Data
#> [1] "- Data is not min-max normalized"
#> [1] "2. Calculating PPCC between samples of each batch..."
#> |======================================================================| 100%
#> [1] "3. Labelling Sample Pairs according to their Class and Patient Similarities..."
#> [1] "4. Calculating PPCC cut off to identify PPCC data doppelgangers..."
#> [1] "5. Identifying PPCC data doppelgangers..."
end_time = Sys.time()
end_time-start_time
#> Time difference of 4.11564 secs
The function above carries out steps 1-5 and outputs the results as a list containing the following elements:
View(ppccDoppelgangerResults$Processed_data)
View(ppccDoppelgangerResults$PPCC_matrix)
PPCC_df: Data frame of PPCCs between samples of different batches (NumberOfSamplePairs*5). The columns of the data frame are as follows:
Sample1: Name of the first sample of the pair (from the first batch)
View(ppccDoppelgangerResults$PPCC_df)
paste("PPCC cut off:", ppccDoppelgangerResults$cut_off)
#> [1] "PPCC cut off: 0.922552571814869"
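Step 5 then amounts to flagging sample pairs whose PPCC exceeds this cut-off. A hypothetical sketch with made-up values (the real PPCC_df column names and labelling may differ):

```r
# Mocked cross-batch sample pairs with illustrative PPCC values.
ppcc_df <- data.frame(
  Sample1 = c("a1", "a2", "a3"),
  Sample2 = c("b1", "b2", "b3"),
  PPCC    = c(0.95, 0.80, 0.93)
)
cut_off <- 0.9226  # illustrative cut-off, close to the one printed above
# Pairs above the cut-off are treated as PPCC data doppelgangers.
ppcc_df$Doppelganger <- ppcc_df$PPCC > cut_off
sum(ppcc_df$Doppelganger)  # 2 of the 3 pairs are flagged
```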
To visualize the PPCC data doppelgangers, we pass the ppccDoppelgangerResults (output list of getPPCCDoppelgangers) to the visualisePPCCDoppelgangers function.
visualisePPCCDoppelgangers(ppccDoppelgangerResults)
When functional doppelgangers are present in both training and validation data sets, an inflation in accuracy on the validation data set is observed regardless of how the model is trained.
We show that the PPCC data doppelgangers found above cause the doppelganger effect when included in both training and validation sets with the following steps:
start_time = Sys.time()
verificationResults = verifyDoppelgangers(
"tutorial/experiment_plans/rc_ex_plan.csv", rc, rc_metadata)
#> [1] "1. Loading Experiment Plan..."
#> [1] "2. Preprocessing data..."
#> [1] "- Batch correcting with sva:ComBat..."
#> Found 2 batches
#> Adjusting for 0 covariate(s) or covariate level(s)
#> Standardizing Data across genes
#> Fitting L/S model and finding priors
#> Finding parametric adjustments
#> Adjusting the Data
#> [1] "- Carrying out min-max normalisation"
#> [1] "3. Generating Feature Sets..."
#> [1] "4. Training KNN models..."
#> |======================================================================| 100%
end_time = Sys.time()
end_time-start_time
#> Time difference of 0.9779961 secs
The function above carries out the experiment plan in rc_ex_plan.csv and returns the results in a list. The following are the elements in the list:
View(verificationResults$combat_minmax)
View(verificationResults$feature_sets)
View(verificationResults$accuracy_mat)
View(verificationResults$accuracy_df)
In our current experiment plan, there are 6 training-validation data set pairs:
The negative control, Binomial, does not require any form of training since it is the accuracy generated by 12 (number of feature sets) binomial distributions with N = 8 (because there are eight samples in the validation set) and P = 0.5 (probability of guessing the correct label for each validation sample).
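The binomial baseline described above can be simulated directly in base R. A sketch (not the package's internal code) of how such baseline accuracies arise:

```r
set.seed(42)  # for reproducibility of this illustration only
# 12 feature sets; each validation set has 8 samples, each guessed
# correctly with probability 0.5, so accuracy = correct guesses / 8.
binom_acc <- rbinom(n = 12, size = 8, prob = 0.5) / 8
length(binom_acc)  # 12 baseline accuracies, one per feature set
mean(binom_acc)    # hovers around 0.5, as expected by chance
```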
The increasing number of doppelgangers in the validation set illustrates the dosage-dependent behaviour of doppelgangers.
Here we load the experiment plan from a comma-separated file. The experiment plan specifies the names of the samples in each training and validation set. Care has been taken to prevent any leakage between the training and validation sets of the 0-8 Doppel experiments.
To visualize the effect of the PPCC data doppelgangers on validation accuracy, we pass verificationResults (the output list of verifyDoppelgangers) to the visualiseVerificationResults function.
ori_train_valid_names = c("Doppel_0","Doppel_2", "Doppel_4", "Doppel_6", "Doppel_8", "Neg_Con", "Pos_Con")
new_train_valid_names = c("0 Doppel", "2 Doppel", "4 Doppel", "6 Doppel", "8 Doppel", "Binomial", "Perfect Leakage")
visualiseVerificationResults(verificationResults,
ori_train_valid_names,
new_train_valid_names)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
We observe a dosage-dependent relationship between the number of doppelgangers and model accuracy on the validation set: accuracy increases as the number of doppelgangers in the validation set increases.
In this tutorial, we demonstrate how functional doppelgangers can be identified in an RNA-Seq data set.
library("doppelgangerIdentifier")
The loaded data set is a preprocessed subset of the GSE81538 RNA-Seq data set. The preprocessing steps are described in “./tutorial/dataset/dataset_preprocessing_information.txt”.
bc = readRDS("tutorial/dataset/bc_her2_tut.rds")
bc_meta = readRDS("tutorial/dataset/bc_her2_meta_tut.rds")
Since the data set used is an RNA-Seq data set, Combat-Seq will be used as the batch correction method prior to PPCC value calculation.
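ComBat-seq expects raw integer counts, unlike ComBat, which was applied to the microarray intensities earlier in this tutorial. A quick sanity-check sketch on a mocked matrix (substitute your actual bc object):

```r
# bc_mock stands in for the bc count matrix used in this tutorial.
bc_mock <- matrix(c(10L, 0L, 532L, 3L), nrow = 2)
# Raw RNA-Seq counts should be non-negative integers.
looks_like_counts <- all(bc_mock >= 0) && all(bc_mock == round(bc_mock))
looks_like_counts  # TRUE for raw counts
```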
# Get PPCC Data Doppelgangers
start_time = Sys.time()
doppel_bc = getPPCCDoppelgangers(
raw_data = bc,
meta_data = bc_meta,
do_batch_corr = TRUE,
do_min_max = TRUE,
batch_corr_method = "ComBat_seq"
)
#> [1] "1. Batch correcting the 2 data sets with sva:ComBat_seq..."
#> Found 2 batches
#> Using null model in ComBat-seq.
#> Adjusting for 0 covariate(s) or covariate level(s)
#> Estimating dispersions
#> Fitting the GLM model
#> Shrinkage off - using GLM estimates for parameters
#> Adjusting the data
#> [1] "- Data is min-max normalized"
#> [1] "2. Calculating PPCC between samples of each batch..."
#> |======================================================================| 100%
#> [1] "3. Labelling Sample Pairs according to their Class and Patient Similarities..."
#> [1] "4. Calculating PPCC cut off to identify PPCC data doppelgangers..."
#> [1] "5. Identifying PPCC data doppelgangers..."
end_time = Sys.time()
end_time - start_time
#> Time difference of 1.783855 mins
visualisePPCCDoppelgangers(doppel_bc)
To find out if the identified PPCC data doppelgangers (DDs) are functional doppelgangers (FDs), we create an experiment plan that incrementally increases the number of PPCC DD samples in the validation set. If we observe an increasing trend of random model accuracy with an increasing number of PPCC DD samples in the validation set, then we can conclude that the identified PPCC DDs are FDs.
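This "increasing trend" criterion can also be checked numerically, e.g. with a rank correlation between the doppelganger count and mean validation accuracy. A sketch with made-up accuracies (illustration only, not results from this data set):

```r
n_doppel <- c(0, 6, 12, 18, 24)               # PPCC DD samples in validation
mean_acc <- c(0.55, 0.63, 0.71, 0.80, 0.88)   # illustrative values only
# Spearman's rho is exactly 1 for a strictly monotone increase.
cor(n_doppel, mean_acc, method = "spearman")
```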
start_time = Sys.time()
veri_bc = verifyDoppelgangers(
experiment_plan_filename = "tutorial/experiment_plans/bc_ex_plan.csv",
raw_data = bc,
meta_data = bc_meta,
batch_corr_method = "ComBat_seq",
k=9,
size_of_val_set = 48,
feature_set_portion = 0.01
)
#> [1] "1. Loading Experiment Plan..."
#> [1] "2. Preprocessing data..."
#> [1] "- Batch correcting with sva:ComBat_seq..."
#> Found 2 batches
#> Using null model in ComBat-seq.
#> Adjusting for 0 covariate(s) or covariate level(s)
#> Estimating dispersions
#> Fitting the GLM model
#> Shrinkage off - using GLM estimates for parameters
#> Adjusting the data
#> [1] "- Carrying out min-max normalisation"
#> [1] "3. Generating Feature Sets..."
#> [1] "4. Training KNN models..."
#> |======================================================================| 100%
end_time = Sys.time()
end_time - start_time
#> Time difference of 1.589066 mins
ori_train_valid_names = c("Doppel_0","Doppel_6", "Doppel_12", "Doppel_18", "Doppel_24", "Neg_Con", "Pos_Con_24")
new_train_valid_names = c("0 Doppel", "6 Doppel", "12 Doppel", "18 Doppel", "24 Doppel", "Binomial", "24 Perfect Leakage")
visualiseVerificationResults(
veri_bc,
original_train_valid_names = ori_train_valid_names,
new_train_valid_names = new_train_valid_names
)
Since a positive relationship between the number of PPCC DD samples and random model accuracy is observed, we can conclude that all identified PPCC DDs are FDs.