zzz: Welcome to patchwork


Description

patchwork allows you to obtain allele-specific copy number information from BAM files.

To use patchwork on CompleteGenomics data, download patchworkCG.

////////////////////////////////////////////////
http://patchwork.r-forge.r-project.org/
////////////////////////////////////////////////

It is highly recommended that you visit the webpage for this project as it contains the most recently updated information and instructions regarding everything patchwork.

Details

////////////////////////////
Installation Guide:
////////////////////////////

For patchwork to run correctly, or even install, you first need to install DNAcopy from Bioconductor.

Start R (version 2.14.2 at the time of writing).

Lines starting with ">" are commands to be run in R.

> source("http://bioconductor.org/biocLite.R")
> biocLite("DNAcopy")

After DNAcopy is installed you can install patchwork and patchworkData using these commands:

> install.packages("patchworkData", repos="http://R-Forge.R-project.org")
> install.packages("patchwork", repos="http://R-Forge.R-project.org")

If for some reason that does not work, add the argument type="source", like so:

> install.packages("patchworkData", repos="http://R-Forge.R-project.org",type="source")
> install.packages("patchwork", repos="http://R-Forge.R-project.org",type="source")

The source code can also be downloaded outside of R at https://r-forge.r-project.org/R/?group_id=1250
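To confirm that the installation steps above worked, you can ask R whether it can find the three packages. This is a minimal sketch using only base R; the helper name check_patchwork_packages is made up for this example and is not part of patchwork.

```r
# Hypothetical helper (not part of patchwork): report which of the
# required packages R can find in its library paths.
check_patchwork_packages <- function(pkgs = c("DNAcopy", "patchworkData", "patchwork")) {
  # system.file() returns "" when a package is not installed
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

status <- check_patchwork_packages()
missing_pkgs <- names(status)[!status]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be installed with the commands given earlier in this guide.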

////////////////////////////
Tutorial:
////////////////////////////

/////////////////////
Requirements:
/////////////////////

-A working R installation.

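-Your sample's sorted BAM and BAI files:
To sort a BAM file run "samtools sort <yourfile>.bam <sortedfile>" (in the SAMtools versions used here the second argument is an output prefix, and ".bam" is appended to it).
To get the BAI file run "samtools index <sortedfile>.bam"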

-A pileup of your sample.
To do this step you will need a reference human genome fasta file, for example HumanGenome19.fasta.

At the time of writing there are several places where you can find this, for example the Sanger Institute or the 1000 Genomes Project. You probably already have one, as you needed it for the alignment of your tumor sequence.

Use SAMtools, version 0.1.18 or older, to produce the pileup:
"samtools pileup -vcf <reference.fasta> <yourfile>.bam > <yourpileup>"

The pileup should have this format:

less pileup

chr1 10179 c W 0 0 60 5 ,,tna ##6!3
chr1 10180 t W 6 6 60 5 ,,,aa ##369
chr1 10377 a R 0 1 60 1 g @
chr1 11391 t A 0 3 60 1 a B
chr1 18592 C Y 0 3 60 1 t B
chr1 23359 c S 0 2 60 1 g A
chr1 24067 A R 0 2 60 1 G A
chr1 30315 G C 3 24 60 2 cC =;
chr1 92200 a W 0 3 60 1 t B
chr1 96592 T C 0 3 60 1 C B
chr1 100140 a M 0 1 60 1 C @
chr1 104697 g K 0 2 60 1 T A
chr1 127285 A R 0 3 60 1 g B

-(optional) A matched normal sample to your tumor in BAM
-(optional) A pileup of your normal sample BAM
-(optional) A standard Reference file. (Illumina/Solexa, SOLiD or your own)

The (optional) tag means that you do not need all of these files, and there is no point in supplying all of them at once.

If you have a matched normal sample, make a pileup of it and pass both files using the normal.bam and normal.pileup arguments.

In some cases it may be better to use the pileup and a reference file.

You can also run patchwork.plot with only a reference file. See patchwork.createreference for information about the reference file.
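Before starting a long run it can help to check that your pileup has the ten-column layout shown in the requirements above. The sketch below reads a few of the example lines into a data frame with base R. The column names are labels chosen for this example (the file has no header), based on the legacy SAMtools consensus pileup layout as we understand it; read_pileup is a hypothetical helper, not a patchwork function.

```r
# Descriptive labels for the ten pileup columns (assumed, the file has no header).
pileup_cols <- c("chr", "pos", "ref", "consensus", "cons_qual",
                 "snp_qual", "map_qual", "depth", "bases", "base_quals")
pileup_classes <- c("character", "integer", "character", "character", "integer",
                    "integer", "integer", "integer", "character", "character")

# Hypothetical helper: pass either a file name or text = <lines>.
# quote = "" and comment.char = "" matter, because the quality strings
# may contain characters such as '#' and quotes.
read_pileup <- function(...) {
  read.table(..., header = FALSE, col.names = pileup_cols,
             colClasses = pileup_classes, quote = "", comment.char = "")
}

example_lines <- c(
  "chr1 10179 c W 0 0 60 5 ,,tna ##6!3",
  "chr1 10180 t W 6 6 60 5 ,,,aa ##369",
  "chr1 10377 a R 0 1 60 1 g @",
  "chr1 30315 G C 3 24 60 2 cC =;"
)
pile <- read_pileup(text = example_lines)
str(pile)
```

For a real file, read_pileup("<yourpileup>", nrows = 1000) would check just the first lines instead of loading the whole pileup.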

///////////////////////////////////////////////
Execution: patchwork.createreference()
///////////////////////////////////////////////

This function creates a reference from a pool of samples of your choice. When choosing the samples for the reference, consider these factors:
- They should be sequenced using the same technique
- They should be from the same organism
- They should be non-tumorous

It is recommended that you use at least 3 BAM files to create your own reference.

Start R

Load the patchwork and patchworkData libraries

> library(patchwork)
> library(patchworkData)

Read the amazing documentation for patchwork.createreference()

> ?patchwork.createreference

Execute the function, pointing to your desired files.

> patchwork.createreference("file1","path/to/file2","../file3","~/heres/file4",output="REFOUT")

This will generate REFOUT.Rdata, or whichever prefix you chose, in your working directory. Use this file for the reference argument of patchwork.plot().
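If your normal samples sit together in one directory, the argument list for patchwork.createreference() can be put together programmatically. This is only a sketch: build_reference_args is a hypothetical helper, and the directory name "normals" in the commented usage is an assumption. It relies on the call form shown above (any number of file paths followed by output=).

```r
# Hypothetical helper: collect the BAM files in a directory and build the
# argument list for patchwork.createreference(). Stops if fewer than
# min_files are found, following the recommendation of at least 3.
build_reference_args <- function(dir, output = "REFOUT", min_files = 3) {
  bams <- list.files(dir, pattern = "\\.bam$", full.names = TRUE)
  if (length(bams) < min_files) {
    stop("found ", length(bams), " BAM file(s) in ", dir,
         "; at least ", min_files, " are recommended")
  }
  c(as.list(bams), list(output = output))
}

# Usage, with patchwork loaded and an assumed directory "normals":
# args <- build_reference_args("normals")
# do.call(patchwork.createreference, args)
```

The pattern only matches names ending in ".bam", so index files such as "sample.bam.bai" in the same directory are ignored.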

////////////////////////////////////
Python and Pysam
////////////////////////////////////

Patchwork uses a Python module called pysam to read the chromosomes. The package checks whether pysam is installed and, if it is not, installs it the first time you run patchwork. However, if Python or the Python development files are missing from your system, that installation fails. Python seems to be included with Mac OS X already, but on other systems you should install or update it.

See http://wiki.python.org/moin/BeginnersGuide/Download
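A quick way to see whether R can find a Python interpreter at all is to look it up on the PATH. This sketch uses only base R; find_python is a hypothetical helper, and the executable names it tries are assumptions that may not match your system. It does not check for pysam or the Python development files.

```r
# Hypothetical helper: return the paths of any Python executables found
# on the PATH (an empty vector if none are found).
find_python <- function(candidates = c("python", "python2", "python3")) {
  paths <- Sys.which(candidates)
  paths[nzchar(paths)]
}

found <- find_python()
if (length(found) == 0) {
  message("No Python interpreter found on the PATH.")
} else {
  print(found)
}
```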

////////////////////////////////////////
Execution: patchwork.plot()
////////////////////////////////////////

It is recommended that you run patchwork from a "clean" working directory. That way you do not run the risk of files overwriting each other. If you do not want to type paths into R you may also want to put the required files in this folder.

Execution may take quite a while depending on the size of your sample! So if possible run it on a dedicated computer.
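One way to get a clean working directory is to create it from within R and refuse to reuse a directory that already has files in it. This is a minimal sketch in base R; start_clean_run and the directory name in the commented usage are made up for this example.

```r
# Hypothetical helper: create an empty directory for one patchwork run.
# Stops if the directory already exists and contains (non-hidden) files.
start_clean_run <- function(path) {
  if (dir.exists(path) && length(list.files(path)) > 0) {
    stop("directory is not empty: ", path)
  }
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  normalizePath(path)
}

# Usage (directory name is an assumption):
# old <- setwd(start_clean_run("patchwork_run1"))
# ... run patchwork.plot() here ...
# setwd(old)
```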

Start R

Load the patchwork and patchworkData libraries

> library(patchwork)
> library(patchworkData)

Read the excellent documentation for patchwork.plot()

> ?patchwork.plot

Excerpt from ?patchwork.plot:

////////////////////////////////////////
Usage:

patchwork.plot(BamFile,Pileup,reference=NULL,normal.bam=NULL,normal.pileup=NULL,Alpha=0.0001,SD=1)

Arguments:

BamFile: Aligned, Sorted Bam-file.

Pileup: Pileup file generated through samtools pileup -vcf reference.fasta bamfile > outfile.

reference: Default is NULL. Path to a reference file that can be created using patchwork.createreference().

normal.bam: Default is NULL. The matched normal sample of your BamFile.

normal.pileup: Default is NULL. The pileup of your normal sample.

Alpha: Default 0.0001, change if you want to try to get a better segmentation from patchwork.segment(). From DNAcopy (?segment): alpha: significance levels for the test to accept change-points.

SD: Default 1, change if you want to try to get a better segmentation from patchwork.segment(). From DNAcopy (?segment): undo.SD: the number of SDs between means to keep a split if undo.splits="sdundo".
////////////////////////////////////////
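To get a feel for what Alpha and SD do, you can run DNAcopy's segment() directly on a small simulated signal. The sketch below is not how patchwork calls DNAcopy internally; it only passes the two values on as alpha and undo.SD (with undo.splits="sdundo"), as the excerpt describes. The simulated data and the helper name segment_demo are assumptions made for this example, and the function returns NULL if DNAcopy is not installed.

```r
# Simulated signal with one level shift halfway; purely illustrative.
set.seed(1)
signal <- c(rnorm(100, mean = 0, sd = 0.2), rnorm(100, mean = 0.5, sd = 0.2))

# Hypothetical helper: segment a single vector with DNAcopy.
segment_demo <- function(values, Alpha = 0.0001, SD = 1) {
  if (!requireNamespace("DNAcopy", quietly = TRUE)) return(NULL)
  cna <- DNAcopy::CNA(genomdat = values,
                      chrom = rep(1, length(values)),
                      maploc = seq_along(values),
                      data.type = "logratio",
                      sampleid = "demo")
  seg <- DNAcopy::segment(cna, alpha = Alpha, undo.splits = "sdundo",
                          undo.SD = SD, verbose = 0)
  seg$output  # one row per segment, including num.mark and seg.mean
}

segment_demo(signal)
segment_demo(signal, Alpha = 0.01, SD = 3)
```

Comparing the two calls shows how the number and position of segments can change with the settings.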

Perform patchwork.plot() with desired arguments.

> patchwork.plot(BamFile="patchwork.example.bam",Pileup="patchwork.example.pileup",reference="../HCC1954/datasolexa.RData")
Initiating Allele Data Generation
Initiating Read Chromosomal Coverage
Reading chr1
Reading chr2
Reading chr3
Reading chr4
Reading chr5
Reading chr6
Reading chr7
Reading chr8
Reading chr9
Reading chrX
Reading chrY
Reading chr10
Reading chr11
Reading chr12
Reading chr13
Reading chr14
Reading chr15
Reading chr16
Reading chr17
Reading chr18
Reading chr19
Reading chr20
Reading chr21
Reading chr22
Read Chromosomal Coverage Complete
Initiating GC Content Normalization
GC Content Normalization Complete
Initiating Smoothing
Smoothing Chromosome: chr1
Smoothing Chromosome: chr2
Smoothing Chromosome: chr3
Smoothing Chromosome: chr4
Smoothing Chromosome: chr5
Smoothing Chromosome: chr6
Smoothing Chromosome: chr7
Smoothing Chromosome: chr8
Smoothing Chromosome: chr9
Smoothing Chromosome: chrX
Smoothing Chromosome: chrY
Smoothing Chromosome: chr10
Smoothing Chromosome: chr11
Smoothing Chromosome: chr12
Smoothing Chromosome: chr13
Smoothing Chromosome: chr14
Smoothing Chromosome: chr15
Smoothing Chromosome: chr16
Smoothing Chromosome: chr17
Smoothing Chromosome: chr18
Smoothing Chromosome: chr19
Smoothing Chromosome: chr20
Smoothing Chromosome: chr21
Smoothing Chromosome: chr22
Smoothing Complete
Initiating Segmentation
Note: If segmentation fails to initiate the probable reason is that you have not installed the R package DNAcopy. See the homepage, http://patchwork.r-forge.r-project.org/ , or ?patchwork.readme for installation instructions.
Analyzing: chr1.p
Analyzing: chr1.q
Analyzing: chr2.p
Analyzing: chr2.q
Analyzing: chr3.p
Analyzing: chr3.q
Analyzing: chr4.p
Analyzing: chr4.q
Analyzing: chr5.p
Analyzing: chr5.q
Analyzing: chr6.p
Analyzing: chr6.q
Analyzing: chr7.p
Analyzing: chr7.q
Analyzing: chr8.p
Analyzing: chr8.q
Analyzing: chr9.p
Analyzing: chr9.q
Analyzing: chrX.p
Analyzing: chrX.q
Analyzing: chrY.p
Analyzing: chrY.q
Analyzing: chr10.p
Analyzing: chr10.q
Analyzing: chr11.p
Analyzing: chr11.q
Analyzing: chr12.p
Analyzing: chr12.q
Analyzing: chr13.q
Analyzing: chr14.q
Analyzing: chr15.q
Analyzing: chr16.p
Analyzing: chr16.q
Analyzing: chr17.p
Analyzing: chr17.q
Analyzing: chr18.p
Analyzing: chr18.q
Analyzing: chr19.p
Analyzing: chr19.q
Analyzing: chr20.p
Analyzing: chr20.q
Analyzing: chr21.p
Analyzing: chr21.q
Analyzing: chr22.q
Segmentation Complete
Initiating Segment data extraction (Medians and AI)
Segment data extraction Complete

Saving information objects needed for patchwork.copynumbers in copynumbers.Rdata

Initiating Plotting
Plotting Complete
Shutting down.....
Warning messages:

If everything went well you should see output similar to the above. Your working directory should now contain the plots generated by the function, 1 overview plot and 24 chromosomal plots. The working directory should also contain the files:
- copynumbers.Rdata
- data.Rdata
- pile.alleles
- pile.alleles.Rdata
- Segments.Rdata
- smoothed.Rdata

These files are created so that the function can be re-run more quickly should something unforeseen happen during execution. For example, if something goes wrong during a final step, it would be a terrible hassle to read all the chromosomes again.

!!!THIS DOES MEAN THAT TO COMPLETELY RE-RUN PATCHWORK FROM SCRATCH IN THE SAME WORKING DIRECTORY ALL OF THE ABOVE FILES NEED TO BE DELETED!!!
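The files listed above can be removed from within R before a fresh run. This is a minimal sketch using base R; clear_patchwork_cache is a hypothetical helper, and it only touches the six file names listed in this section, not the plots.

```r
# File names taken from the list above.
cached_files <- c("copynumbers.Rdata", "data.Rdata", "pile.alleles",
                  "pile.alleles.Rdata", "Segments.Rdata", "smoothed.Rdata")

# Hypothetical helper: delete whichever of the cached files exist in dir
# and return (invisibly) the paths that were removed.
clear_patchwork_cache <- function(dir = ".") {
  paths <- file.path(dir, cached_files)
  present <- paths[file.exists(paths)]
  removed <- file.remove(present)
  invisible(present[removed])
}

# Usage, from the working directory of the previous run:
# clear_patchwork_cache()
```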

For some details of the plot functions see their documentation:

> ?karyotype
> ?karyotype_chroms

To interpret the plots here is an excerpt from their documentation:

///////////////////////////
karyotype()
///////////////////////////

Description:

Plots each chromosome, color coded by chromosomal coordinate, against a background of the complete genome.

Details:

Vertical axis: Allelic imbalance.
Horizontal axis: Total intensity.
The plot is an overview; for a closer look see the plots generated by karyotype_chroms().

///////////////////////////
karyotype_chroms()
///////////////////////////

Description:

Visualises the calculated data of patchwork.plot() for each chromosome.

Details:

*TOP*
Vertical axis: Allelic Imbalance
Horizontal axis: Total Intensity
The chromosome plotted against the complete genome background. The separation between clusters in the plot is due to differences in intensity and allelic imbalance, and so reflects the varying allele counts and copy numbers. Longer/larger segments have bigger circles. Darker circles show more content, as the circles lie on top of each other.

*MIDDLE*
Vertical axis: Total Intensity
Horizontal axis: Chromosomal coordinate
The total intensity of the chromosome in question plotted against the position on the chromosome.

*BOTTOM*
Vertical axis: Allelic Imbalance
Horizontal axis: Chromosomal coordinate
The allelic imbalance of the chromosome in question plotted against the position on the chromosome.

//////////////////////////////////////////////
Execution: patchwork.copynumbers()
//////////////////////////////////////////////

The only file you absolutely must have in your working directory for the next part of execution is copynumbers.Rdata.

Read the outstanding documentation for patchwork.copynumbers()

> ?patchwork.copynumbers

The cliffnotes are that you will need to look at the plots generated by patchwork.plot() to be able to correctly input the arguments for patchwork.copynumbers().

Excerpt from patchwork.copynumbers() documentation:

//////////////////////////////////////////////
Usage:

patchwork.copynumbers(cn2,delta,het,hom,maxCn=8,ceiling=1,forcedelta=F)

Arguments:

name: Default is "copynumbers_". First part of output name for plots generated from patchwork.copynumbers().

cn2: The approximate position of copy number 2 (diploid) on the total intensity axis.

delta: The difference in total intensity between consecutive copy numbers, for example between 1 and 2 or between 2 and 3. If copy number 2 has total intensity ~0.6 and copy number 3 has total intensity ~0.8 then delta would be 0.2.

het: Allelic imbalance ratio of heterozygous copy number 2.

hom: Allelic imbalance ratio of Loss-of-heterozygosity copy number 2.

maxCn: Highest copy number to calculate for. Default is 8.

ceiling: Default is 1.

forcedelta: Default is FALSE. If TRUE the delta value will not be subject to small adjustment changes.
//////////////////////////////////////////////
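Taken literally, the definitions of cn2 and delta place copy number n at a total intensity of roughly cn2 + (n - 2) * delta. The sketch below works through the numbers from the delta example (cn2 of about 0.6, delta of 0.2). It is only the arithmetic implied by the argument descriptions, not patchwork's own fitting, which may adjust delta slightly unless forcedelta is TRUE. The function name expected_intensity is made up for this example.

```r
# Approximate total intensity for copy number cn, assuming evenly spaced
# copy numbers as described for the delta argument.
expected_intensity <- function(cn, cn2, delta) {
  cn2 + (cn - 2) * delta
}

# Copy numbers 0 to 4 with the values from the delta example.
expected_intensity(0:4, cn2 = 0.6, delta = 0.2)
# approximately 0.2 0.4 0.6 0.8 1.0
```

Reading the positions of two neighbouring clusters off the overview plot and subtracting them gives a starting value for delta in the same way.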

On the project's webpage, http://patchwork.r-forge.r-project.org/ (patchwork tab -> execution tab), there is also a picture accompanying this portion that points out the argument values, should the documentation for the function not be sufficient.

> patchwork.copynumbers(name="Example_",cn2=0.8,delta=0.3,het=0.5,hom=0.8)

patchwork.copynumbers() generates plots and should not take a large amount of time to complete. When it is finished you should have 24 chromosome plots, one for each chromosome, and an overview plot with the approximate positions of copy numbers and allele ratios.

For some details of the plot functions see their documentation:

> ?karyotype_check
> ?karyotype_chromsCN

To interpret the plots here is an excerpt from their documentation:

///////////////////////////////////
karyotype_check()
///////////////////////////////////

Description:

Plots the whole genome coverage vs allelic imbalance, with the approximate areas of the copy numbers and allele constitutions tagged.

Details:

Vertical axis: Allelic Imbalance
Horizontal axis: Relative coverage

The naming scheme is Copynumber-m-LesserAlleleDistribution.
For example 2m0 means copynumber = 2, both alleles are the same, whereas 2m1 means copynumber = 2, 1 allele each.

Another example: 4m0, copynumber = 4, all alleles are the same (loss of heterozygosity). 4m1, copynumber = 4, 3 alleles are the same, one is different. 4m2, copynumber = 4, 2 alleles each.

The total number of alleles present is always the copynumber.
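The naming scheme can be spelled out in a few lines of R. This sketch lists the labels for a given copy number and splits a label back into its parts; allele_labels and describe_label are hypothetical helpers written for this example, not patchwork functions.

```r
# All labels for one copy number: the lesser allele count runs from 0
# up to half the copy number (rounded down).
allele_labels <- function(cn) {
  paste0(cn, "m", 0:(cn %/% 2))
}

# Split a label such as "4m1" into copy number, lesser and greater allele count.
describe_label <- function(label) {
  parts <- as.integer(strsplit(label, "m", fixed = TRUE)[[1]])
  c(copynumber = parts[1], minor = parts[2], major = parts[1] - parts[2])
}

allele_labels(4)        # "4m0" "4m1" "4m2"
describe_label("4m1")   # copynumber 4, minor 1, major 3
```

The minor and major counts always add up to the copy number, matching the statement above.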

///////////////////////////////////
karyotype_chromsCN()
///////////////////////////////////

Description:

Visualises the calculated data of patchwork.plot() + patchwork.copynumbers() for each chromosome. See details for a walkthrough of the plot.

Details:

*TOP*
Vertical axis: Allelic Imbalance
Horizontal axis: Total Intensity
The chromosome plotted against the complete genome background. The separation between clusters in the plot is due to differences in intensity and allelic imbalance, and so reflects the varying allele counts and copy numbers. Longer/larger segments have bigger circles. Darker circles show more content, as the circles lie on top of each other.

*TOP MIDDLE*
Vertical axis: Copynumber
Horizontal axis: Chromosomal coordinate
Displays the total and minor copynumbers for different segments of the chromosome in question.

*LOWER MIDDLE*
Vertical axis: Total Intensity
Horizontal axis: Chromosomal coordinate
The total intensity of the chromosome in question plotted against the position on the chromosome.

*BOTTOM*
Vertical axis: Allelic Imbalance
Horizontal axis: Chromosomal coordinate
The allelic imbalance of the chromosome in question plotted against the position on the chromosome.

If you have any questions please feel free to contact us and we will help you to the best of our ability!

Author(s)

Sebastian DiLorenzo, [email protected]
Markus Mayrhofer, [email protected]


xu-chang/patchwork-lite documentation built on March 19, 2018, 3:09 a.m.