PeakCNV consists of three steps: (1) Building CNVRs; (2) Cluster CNVRs; (3) Selection. PeakCNV first computes the genome-wide probability of each base pair to be deleted or duplicated in case and control samples using the Fisher exact test resulting in identification of statistically significant CNVRs. Then, PeakCNV groups statistically significant CNVRs into different clusters based on two combined features. The first feature (importance) shows the number of samples in each region after removing the common samples between each two CNVRs. The second one is the distance between CNVRs. Finally, the best CNVR(s) from each cluster will be report as the representative of the cluster
PeakCNV runs on R (requires at least 3.2.0). Install the package from Github using the following R commands.
install.packages("devtools")
library(devtools)
install_github("mahdieh1/PeakCNV")
Alternatively, the package may downloaded from Github and installed in R:
# Clone/download PeakCNV into the current working dirctory with the following command: git clone https://github.com/mahdieh1/PeakCNV.git
The full PeakCNV workflow includes the following 3 steps: 1. Building CNVRs 2. Clustering process 3. Selection process
By default, the PeakCNV() function runs the entire workflow. However, it is possible to run specific steps of the workflow by specifiying the run.to parameter (see full documentation in https://mahdieh1.github.io/PeakCNV/).
| Parameter | Type | Description | Default | | :---: | :---: | :---: | :---: | | run.to | Numeric | Steps to run | 1,2,3 | | working.dir | String | Working directory | current working directory
| Step | run.to | | :---: | :---: | | 1 | 1 | | 2 | 2 | | 3 | 3 |
PeakCNV builds deletion and duplication CNVR maps for cases and controls by merging genomic coordinates of deletion and duplication type of CNVs, respectively. Then, PeakCNV estimates the probability of being either deleted or duplicated in cases versus controls for each base pair of these maps using one-tailed Fisher’s exact test. The output of this step is CNVRs that are significantly more frequently duplicated or deleted in cases or controls individuals.
PeakCNV(1)
Output generated in Step 1: | Filename | Description | Required fields | | :---: | :---: | :---: | | step1 | list of positions that probability of being either deleted or duplicated in cases versus controls | Chromosome, Start position, End position |
To identify those CNVRs in the proximity of each other with the similar percentage of case samples coverage, PeakCNV performs clustering step. In this step, for each chromosome, PeakCNV asks you the eps value based on the k nearest neighbors(knn) plot. The optimal value is an elbow, where a sharp change in the distance occurs. For example, in the below image, the optimal eps value is around a distance of 0.15.
PeakCNV(2)
Output generated in Step 2: | Filename | Description | Required fields | | :---: | :---: | :---: | | Clustering | list of CNVRs with their cluster number | Chromosome, Start position, End position, sample number, cluster number |
PeakCNV selects the most independent CNVRs from each cluster.
PeakCNV(run.to = 3)
Output generated in Step 3: | Filename | Description | Required fields | | :---: | :---: | :---: | | FinalCNVRs | list of selected CNVRs | Score, Chromosome, Start position, End position, sample number, cluster number |
By default, PeakCNV runs in the current working directory unless specified by the user. By default, results will be saved in the working directory. In clustering step, for each chromosome, PeakCNV asks you the eps value based on the k nearest neighbors(knn) plot. The optimal value is an elbow, where a sharp change in the distance occurs. For more information about results see the https://mahdieh1.github.io/PeakCNV/.
library("PeakCNV")
PeakCNV()
Download the test data sets from https://github.com/mahdieh1/PeakCNV/tree/main/test-data into your working directory. For this datasets, P-value i
| Chr | Start | End | Sample-Id | | :---: | :---: | :---: | :---: | | 1 | 6742281 | 6742903 | SP7890 |
| Chr | Start | End | Sample-Id | | :---: | :---: | :---: | :---: | | 1 | 6742281 | 6742903 | sa321 |
Please put input files (case.bed and control.bed) in the working directory. If your CNV list contains chr X or Y, please replace them with 23,24.
| Chr | Start | End | | :---: | :---: | :---: | | 1 | 6742281 | 6742903 |
| Chr | Start | End | #case | Cluster-NO | | :---: | :---: | :---: | :---: | :---: | | 1 | 6742281 | 6742903 | 15 | 1 |
| Score | #chr | Start | End | #case | Cluster-NO | | :---: | :---: | :---: | :---: | :---: | :---: | | 56 | 21 | 6742281 | 6742903 | 15 | 225.86 | 0 |
Please consider citing the follow paper when you use this code.
Labani, M., Afrasiabi, A., Beheshti, A., Lovell, N. H., & Alinejad-Rokny, H. (2022). PeakCNV: A multi-feature ranking algorithm-based tool for genome-wide copy number variation-association study. Computational and Structural Biotechnology Journal, 20, 4975-4983.
https://www.sciencedirect.com/science/article/pii/S2001037022004068
}
I will be pleased to address any question or concern about the PeakCNV package: In case of queries, please email: mahdieh.labani@students.mq.edu.au
This work was funded by the UNSW Scientia Program Fellowship and the Australian Research Council Discovery Early Career Researcher Award (DECRA), Macquarie PhD Scholarship and Australian Government Research Training Program (RTP) scholarship. Analyses were made possible with High Performance Computing resources provided by the BioMedical Machine Learning Lab with funding from the Australian Government and the UNSW SYDNEY.
This package is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 3, as published by the Free Software Foundation.
A copy of the GNU General Public License, version 3, is available at https://www.r-project.org/Licenses/GPL-3
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.