View source: R/getPPCCDoppelgangers.R
getPPCCDoppelgangers | R Documentation |
This function performs the following steps to identify PPCC (pairwise Pearson's correlation coefficient) data doppelgangers between batches:
Batch correct batches with sva::ComBat
Calculate PPCC values between samples of different batches
Label sample pairs according to their patient id and class similarities
Calculate the PPCC cut-off (maximum PPCC of any "Different Class Different Patient" sample pair)
Identify PPCC Data Doppelgangers as sample pairs with "Same Class Different Patient" labels with PPCC values > PPCC cut-off.
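The labelling, cut-off and identification steps above can be sketched in base R. This is only an illustration on made-up data, not the function's actual implementation: the sample names, batch names and values are assumptions, and the batch correction step (sva::ComBat) is skipped.

```r
# Toy data (hypothetical): rows are variables, columns are samples.
set.seed(1)
base_a <- rnorm(50)
base_b <- rnorm(50)
raw_data <- data.frame(
  s1 = base_a + rnorm(50, sd = 0.1),
  s2 = base_a + rnorm(50, sd = 0.1),
  s3 = base_b + rnorm(50, sd = 0.1),
  s4 = base_b + rnorm(50, sd = 0.1)
)
meta_data <- data.frame(
  Class = c("A", "A", "B", "B"),
  Patient_ID = c("P1", "P2", "P3", "P4"),
  Batch = c("B1", "B2", "B1", "B2"),
  row.names = c("s1", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

# Enumerate every sample pair that spans the two batches.
pairs <- expand.grid(
  Sample1 = rownames(meta_data)[meta_data$Batch == "B1"],
  Sample2 = rownames(meta_data)[meta_data$Batch == "B2"],
  stringsAsFactors = FALSE
)

# PPCC: Pearson's correlation between the two samples of each pair.
pairs$PPCC <- unname(mapply(
  function(a, b) cor(raw_data[[a]], raw_data[[b]]),
  pairs$Sample1, pairs$Sample2
))

# Label each pair by class and patient similarity.
same_class <- meta_data[pairs$Sample1, "Class"] ==
  meta_data[pairs$Sample2, "Class"]
same_patient <- meta_data[pairs$Sample1, "Patient_ID"] ==
  meta_data[pairs$Sample2, "Patient_ID"]
pairs$Label <- paste(
  ifelse(same_class, "Same Class", "Different Class"),
  ifelse(same_patient, "Same Patient", "Different Patient")
)

# Cut-off: highest PPCC among "Different Class Different Patient" pairs.
cut_off <- max(pairs$PPCC[pairs$Label == "Different Class Different Patient"])

# Doppelgangers: "Same Class Different Patient" pairs above the cut-off.
doppelgangers <- pairs[
  pairs$Label == "Same Class Different Patient" & pairs$PPCC > cut_off,
]
print(doppelgangers)
```

In this toy set, the two same-class pairs are built from a shared signal, so both end up above the cut-off set by the unrelated cross-class pairs.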
getPPCCDoppelgangers( raw_data, meta_data, do_batch_corr = TRUE, correlation_function = cor, batch_corr_method = "ComBat", do_min_max = FALSE )
raw_data |
Data frame where each column is a sample and each row is a variable. The row names must be the variable names. |
meta_data |
Data frame with the columns "Class", "Patient_ID" and "Batch", indicating the class, patient id and batch of each sample respectively. Each row is a sample. Ensure the sample names are the row names of the data frame, not a separate column. |
do_batch_corr |
If FALSE, no batch correction is carried out before doppelgangers are identified. |
correlation_function |
Correlation function to use. Pearson's correlation coefficient (cor) is the default. User-defined functions should accept two vector parameters, x and y. |
batch_corr_method |
Batch correction method to use. Only two options are accepted: "ComBat" or "ComBat_seq". |
do_min_max |
If TRUE, min-max normalization is carried out just before the PPCC calculation. |
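As an illustration of the correlation_function argument, a user-defined function only needs to take two vectors, x and y, and return a single correlation value. The sketch below defines a Spearman-based alternative; the name spearman_cor is hypothetical and not part of the package, and the final call is left commented out because it assumes the package and suitable raw_data and meta_data objects are available.

```r
# Hypothetical user-defined correlation function: takes two numeric
# vectors and returns Spearman's rank correlation as a single number.
spearman_cor <- function(x, y) {
  cor(x, y, method = "spearman")
}

# Quick check on two monotonically related vectors.
x <- c(1, 2, 3, 4)
y <- c(1, 4, 9, 16)
print(spearman_cor(x, y))

# Assumed usage, following the signature shown in the usage section:
# results <- getPPCCDoppelgangers(
#   raw_data, meta_data,
#   correlation_function = spearman_cor
# )
```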
This function also identifies PPCC data doppelgangers within a batch (if only 1 batch is detected in meta_data). In this case it performs the following steps:
Calculate PPCC values between samples within the batch
Label sample pairs according to their patient id and class similarities
Calculate the PPCC cut-off (maximum PPCC of any "Different Class Different Patient" sample pair)
Identify PPCC Data Doppelgangers as sample pairs with "Same Class Different Patient" labels with PPCC values > PPCC cut-off.
Troubleshooting Tips:
Ensure every sample name in rownames(meta_data) appears in colnames(raw_data), and vice versa.
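One way to check this before calling the function is to compare the two sets of names in both directions. The helper below is a hypothetical sketch in base R (check_sample_names is not part of the package), run here on made-up inputs in which one sample is deliberately missing from the metadata.

```r
# Hypothetical helper: report sample names present in one input
# but absent from the other.
check_sample_names <- function(raw_data, meta_data) {
  list(
    missing_from_raw_data = setdiff(rownames(meta_data), colnames(raw_data)),
    missing_from_meta_data = setdiff(colnames(raw_data), rownames(meta_data))
  )
}

# Made-up inputs: sample "s3" has data but no metadata row.
raw_data <- data.frame(
  s1 = c(1, 2), s2 = c(3, 4), s3 = c(5, 6),
  row.names = c("gene1", "gene2")
)
meta_data <- data.frame(
  Class = c("A", "B"),
  Patient_ID = c("P1", "P2"),
  Batch = c("B1", "B2"),
  row.names = c("s1", "s2")
)

mismatches <- check_sample_names(raw_data, meta_data)
print(mismatches)
```

Both elements of the returned list should be empty before the data is passed on.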
A list containing the PPCC matrix, the PPCC data frame, and a list of the doppelgangers identified.
ppccDoppelgangerResults = getPPCCDoppelgangers(rc, rc_metadata)