Frequently asked questions

Table of Contents

- Where does the bias correction algorithm come from?
- What kind of data can be corrected by rBiasCorrection/BiasCorrector?
- Do my input files need to be in a special format?
- Are there any requirements for naming the files?
- What is exactly done during rBiasCorrection's/BiasCorrector's data preprocessing?
- What are the regression statistics?
- What are 'substitutions' in my final results?

Where does the bias correction algorithm come from?

rBiasCorrection/BiasCorrector is a user-friendly implementation of the algorithms described by Moskalev et al. in their article 'Correction of PCR-bias in quantitative DNA methylation studies by means of cubic polynomial regression', published in 2011 in Nucleic Acids Research, Oxford University Press (DOI: https://doi.org/10.1093/nar/gkr213).

Citation:

@article{10.1093/nar/gkr213,
    author = {Moskalev, Evgeny A. and Zavgorodnij, Mikhail G. and Majorova, Svetlana P. and Vorobjev, Ivan A. and Jandaghi, Pouria and Bure, Irina V. and Hoheisel, Jörg D.},
    title = "{Correction of PCR-bias in quantitative DNA methylation studies by means of cubic polynomial regression}",
    journal = {Nucleic Acids Research},
    volume = {39},
    number = {11},
    pages = {e77-e77},
    year = {2011},
    month = {04},
    abstract = "{DNA methylation profiling has become an important aspect of biomedical molecular analysis. Polymerase chain reaction (PCR) amplification of bisulphite-treated DNA is a processing step that is common to many currently used methods of quantitative methylation analysis. Preferential amplification of unmethylated alleles—known as PCR-bias—may significantly affect the accuracy of quantification. To date, no universal experimental approach has been reported to overcome the problem. This study presents an effective method of correcting biased methylation data. The procedure includes a calibration performed in parallel to the analysis of the samples under investigation. DNA samples with defined degrees of methylation are analysed. The observed deviation of the experimental results from the expected values is used for calculating a regression curve. The equation of the best-fitting curve is then used for correction of the data obtained from the samples of interest. The process can be applied irrespective of the locus interrogated and the number of sites analysed, avoiding an optimization of the amplification conditions for each individual locus.}",
    issn = {0305-1048},
    doi = {10.1093/nar/gkr213},
    url = {https://dx.doi.org/10.1093/nar/gkr213},
    eprint = {http://oup.prod.sis.lan/nar/article-pdf/39/11/e77/16775711/gkr213.pdf},
}

What kind of data can be corrected by rBiasCorrection/BiasCorrector?

Currently, both R packages, rBiasCorrection and BiasCorrector, can correct measurement biases in DNA methylation data of the type "one locus in many biological samples". The programme has been tested on data derived from bisulphite pyrosequencing, next-generation sequencing, and oligonucleotide microarrays. A future implementation is planned for correcting data of the type "many loci in one biological sample". However, with some effort, the latter can be transformed into data of the first type in order to be corrected with BiasCorrector.

Do my input files need to be in a special format?

Yes, rBiasCorrection/BiasCorrector places very strict requirements on the file format. Below is a description of the exact requirements. All uploaded files must

(The BiasCorrector software currently requires the data to be in the format "one experiment per locus, multiple samples per experiment". Results of high-throughput analyses that have a different shape (e.g. one CSV file per calibration step) therefore need to be reformatted as described above before BiasCorrector can be applied to them, i.e. one CSV file with the experimental results and one CSV file with the calibration results, both files having identical column names.)

Example files

Example files are available for download to demonstrate how to prepare files appropriately:
- calibration data: Example_calibration.csv
- experimental data: Example_experimental.csv

Template files

Template files are available if you want to copy and paste your data. Please note that you might have to adjust the column headers and the sample IDs or calibration steps:
- calibration data: Template_calibration.csv
- experimental data: Template_experimental.csv

Are there any requirements for naming the files?

The filename must not contain additional dots (".") beyond the one in the file ending.
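The naming rule can be checked before files are passed to the packages. The following is a minimal sketch in Python, not part of rBiasCorrection/BiasCorrector; the function name `has_valid_filename` is hypothetical, and the sketch assumes that every input file has a file ending, so a valid name contains exactly one dot.

```python
from pathlib import PurePath


def has_valid_filename(path: str) -> bool:
    """Return True if the file name contains exactly one dot.

    Assumption: the single permitted dot is the one that separates
    the name from the file ending (e.g. ".csv").
    """
    # Only the file name counts, not dots in parent directories.
    name = PurePath(path).name
    return name.count(".") == 1


if __name__ == "__main__":
    for candidate in ("Example_calibration.csv", "my.calibration.data.csv"):
        print(candidate, has_valid_filename(candidate))
```

A name such as `my.calibration.data.csv` would be rejected by this check; renaming it to `my_calibration_data.csv` satisfies the rule.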

What is exactly done during rBiasCorrection's/BiasCorrector's data preprocessing?

During the preprocessing, all requirements of the input files as stated in 'Do my input files need to be in a special format?' are checked. Furthermore, the mean methylation percentages of all CpG columns are calculated for every provided file.

If any of the above-mentioned file requirements is not met, an error will occur. For example, an error message will pop up if any calibration step is outside the range 0 <= CS <= 100 or if you provide fewer than four calibration steps in your input data.
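The two example checks and the mean calculation can be illustrated with a short Python sketch. It is not the packages' own code: the names `check_calibration_steps` and `row_mean` are hypothetical, calibration steps are assumed to be counted as distinct values, and the mean is assumed to be taken across the CpG columns of one row.

```python
from statistics import fmean


def check_calibration_steps(steps):
    """Raise ValueError if the steps break one of the two rules quoted above."""
    # Assumption: "calibration steps" are counted as distinct values.
    distinct = set(steps)
    if len(distinct) < 4:
        raise ValueError("fewer than four calibration steps provided")
    out_of_range = sorted(s for s in distinct if not 0 <= s <= 100)
    if out_of_range:
        raise ValueError(f"calibration steps outside 0 <= CS <= 100: {out_of_range}")


def row_mean(cpg_values):
    """Mean methylation percentage across the CpG columns of one row (assumed layout)."""
    return fmean(cpg_values)


if __name__ == "__main__":
    check_calibration_steps([0, 25, 50, 75, 100])  # passes silently
    print(row_mean([10.0, 20.0, 30.0]))
```

Raising an exception mirrors the behaviour described above, where a violated requirement stops the preprocessing with an error message.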

What are the regression statistics?

The regression statistics table shows the regression parameters of the hyperbolic and the cubic polynomial regression.

What are 'substitutions' in my final results?

Substitutions occur if no result is found within the range of plausible values between 0% and 100% during the bias correction. A 'border zone' of 10 percentage points is implemented below 0% and above 100%. If a result is in the range -10% < x < 0% or 100% < x < 110%, the value is substituted in the final results with 0% or 100%, respectively. Values beyond these border zones are substituted with a blank value in the final output, as they seem implausible and could indicate substantial errors in the underlying data. For detailed feedback, the substitutions table shows the result of the algorithm ('BiasCorrected value') and the corresponding substitution ('Substituted value') for the respective CpG site.
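The substitution rule can be written down as a small function. This is a minimal Python sketch of the rule as stated in this answer, not the packages' implementation; the name `substitute` is hypothetical, and a blank value is represented by `None`.

```python
from typing import Optional


def substitute(value: float) -> Optional[float]:
    """Apply the border-zone rule to one bias-corrected value (in percent)."""
    if 0.0 <= value <= 100.0:
        return value   # plausible result, kept unchanged
    if -10.0 < value < 0.0:
        return 0.0     # lower border zone, substituted with 0%
    if 100.0 < value < 110.0:
        return 100.0   # upper border zone, substituted with 100%
    return None        # beyond the border zones, left blank


if __name__ == "__main__":
    for corrected in (42.5, -3.0, 104.2, 115.0):
        print(corrected, substitute(corrected))
```

Because the border zones are open intervals, values of exactly -10% or 110% fall outside them and are treated as blank in this sketch.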



kapsner/PCRBiasCorrection documentation built on June 15, 2025, 4:14 a.m.