View source: R/bedpeToRearrCatalogue.R
bedpeToRearrCatalogue | R Documentation |
This function converts a BEDPE data frame into a rearrangement catalogue. Pass rearrangements from a single sample only, with one rearrangement per pair of paired-end mates. The BEDPE data frame should contain the following columns: "chrom1", "start1", "end1", "chrom2", "start2", "end2" and "sample" (sample name). In addition, it should contain either two columns indicating the strands of the mates, "strand1" (+ or -) and "strand2" (+ or -), or one column indicating the structural variant class, "svclass", with values translocation, inversion, deletion or tandem-duplication. If the "svclass" column is present, the "strand1" and "strand2" columns are ignored. If the "svclass" column is absent, it is created using the convention of BrassII from the Sanger Institute pipeline: inversion when strand1 and strand2 are different, deletion when strand1 and strand2 are both +, tandem-duplication when strand1 and strand2 are both -, and translocation when the two mates are on different chromosomes.
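The BrassII convention above can be sketched in a few lines of R. This is only an illustration of the rule as described, not the package's internal code: the helper name `infer_svclass` and the toy data are hypothetical, and the sketch assumes that the chromosome check takes priority over the strand check.

```r
# Hypothetical helper (not part of the package): derive "svclass" from the
# strand columns following the BrassII convention described above.
# Assumption: mates on different chromosomes are always translocations,
# whatever their strands.
infer_svclass <- function(bedpe) {
  same_chrom <- bedpe$chrom1 == bedpe$chrom2
  svclass <- rep(NA_character_, nrow(bedpe))
  svclass[!same_chrom] <- "translocation"
  svclass[same_chrom & bedpe$strand1 != bedpe$strand2] <- "inversion"
  svclass[same_chrom & bedpe$strand1 == "+" & bedpe$strand2 == "+"] <- "deletion"
  svclass[same_chrom & bedpe$strand1 == "-" & bedpe$strand2 == "-"] <- "tandem-duplication"
  svclass
}

# Toy BEDPE data frame with the required columns (made-up coordinates)
toy_bedpe <- data.frame(
  chrom1  = c("1", "1", "2", "3"),
  start1  = c(100, 5000, 200, 700),
  end1    = c(101, 5001, 201, 701),
  chrom2  = c("1", "1", "2", "8"),
  start2  = c(900, 9000, 800, 300),
  end2    = c(901, 9001, 801, 301),
  strand1 = c("+", "-", "+", "+"),
  strand2 = c("+", "-", "-", "+"),
  sample  = "toy_sample",
  stringsAsFactors = FALSE
)

toy_bedpe$svclass <- infer_svclass(toy_bedpe)
print(toy_bedpe[, c("chrom1", "chrom2", "strand1", "strand2", "svclass")])
```

Supplying "svclass" explicitly in this way means the function does not need to interpret the strand columns at all.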
bedpeToRearrCatalogue(sv_bedpe, kmin = 10, PEAK.FACTOR = 10)
sv_bedpe |
data frame BEDPE as described above |
kmin |
Minimum number of breakpoints in a segment for it to be considered a cluster. Default is 10. |
PEAK.FACTOR |
Factor used to calculate a threshold for the minimum average distance of breakpoints in a cluster. The threshold is the expected distance divided by PEAK.FACTOR, where the expected distance is the number of base pairs in the genome divided by the total number of breakpoints. Default is 10. |
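As a worked example of the PEAK.FACTOR arithmetic, the threshold can be computed by hand. The genome size and breakpoint count below are assumed values chosen for illustration; the genome length the function uses internally and the way it counts breakpoints are not stated here.

```r
# Assumed values, for illustration only
genome_size   <- 3e9   # approximate human genome length in bp (assumption)
n_breakpoints <- 600   # e.g. 300 rearrangements with two breakpoints each
peak_factor   <- 10    # the documented default of PEAK.FACTOR

# Expected distance between breakpoints if they were spread evenly
expected_distance <- genome_size / n_breakpoints

# Distance threshold derived from the expected distance
distance_threshold <- expected_distance / peak_factor

print(c(expected = expected_distance, threshold = distance_threshold))
```

With these numbers the expected distance is 5,000,000 bp and the threshold is 500,000 bp. A larger PEAK.FACTOR gives a smaller threshold.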
Please note that your rearrangement caller may interpret strand1 and strand2 differently from BrassII. Other callers typically report the strand2 sign inverted with respect to BrassII, so that, for example, a deletion has strand1 + and strand2 -, rather than both + as in the convention used here. To avoid confusion, check the convention of your caller and consider specifying the svclass column yourself, which bypasses the automated interpretation of strand1 and strand2.
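If your caller does use the inverted strand2 convention, one option is to flip strand2 before calling the function. The sketch below assumes exactly that situation; the helper `flip_strand` is hypothetical and handles only the values "+" and "-". Check your caller's documentation before applying anything like it.

```r
# Hypothetical helper (not part of the package): invert a strand sign.
# Only "+" and "-" are handled; any other value is returned as "+".
flip_strand <- function(strand) ifelse(strand == "+", "-", "+")

# Toy calls from a caller assumed to report strand2 inverted relative to BrassII
caller_bedpe <- data.frame(
  chrom1  = c("1", "1"),
  start1  = c(100, 5000),
  end1    = c(101, 5001),
  chrom2  = c("1", "1"),
  start2  = c(900, 9000),
  end2    = c(901, 9001),
  strand1 = c("+", "-"),
  strand2 = c("-", "+"),
  sample  = "toy_sample",
  stringsAsFactors = FALSE
)

# Convert to the BrassII-style convention by flipping strand2 only
brass_like <- caller_bedpe
brass_like$strand2 <- flip_strand(caller_bedpe$strand2)

# The first row is now +/+, which the BrassII convention reads as a deletion
print(brass_like[, c("strand1", "strand2")])
```

Setting the svclass column directly remains the less error-prone route, since it does not depend on any strand convention.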
Optionally, the BEDPE data frame can include two additional columns, "non-template" and "micro-homology", containing the DNA sequence inserted ("non-template") or deleted ("micro-homology") at the breakpoint junction. Use a dot (".") in these columns when no DNA sequence is available. When both columns are present, a junctions catalogue is computed and returned. A junctions catalogue contains the counts of clustered and unclustered rearrangements that have non-templated insertions or micro-homology deletions of a given size.
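A minimal sketch of a data frame carrying the two optional columns follows. Because the column names contain a hyphen, `check.names = FALSE` is needed to stop R from rewriting them. The data are made up, and the length calculation only illustrates what "size" means here; how the function bins these sizes is not described in this page.

```r
# Toy BEDPE data frame with the optional junction columns (made-up values)
toy_bedpe <- data.frame(
  chrom1  = c("1", "2", "5"),
  start1  = c(100, 200, 300),
  end1    = c(101, 201, 301),
  chrom2  = c("1", "2", "5"),
  start2  = c(900, 800, 700),
  end2    = c(901, 801, 701),
  svclass = c("deletion", "inversion", "tandem-duplication"),
  sample  = "toy_sample",
  "non-template"   = c("ACG", ".", "."),
  "micro-homology" = c(".", "TT", "."),
  check.names = FALSE,        # keep the hyphenated column names as they are
  stringsAsFactors = FALSE
)

# Sequence length per rearrangement, counting "." (not available) as 0
seq_length <- function(x) ifelse(x == ".", 0L, nchar(x))
nt_len <- seq_length(toy_bedpe[["non-template"]])
mh_len <- seq_length(toy_bedpe[["micro-homology"]])

print(data.frame(non_template_bp = nt_len, micro_homology_bp = mh_len))
```

When reading such a file with read.table, the same `check.names = FALSE` setting keeps the column names intact.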
Returns a list with the rearrangement catalogue (rearr_catalogue) and the annotated BEDPE (annotated_bedpe) for the given sample. A junctions catalogue is also returned if the non-template and micro-homology columns are provided. If clusters of rearrangements are found, the clustering regions are returned as well (clustering_regions).
vcf_sv_file.bedpe <- "sample.bedpe"
sv_bedpe <- read.table(vcf_sv_file.bedpe, sep = "\t", header = TRUE,
                       stringsAsFactors = FALSE, check.names = FALSE)
# build a catalogue from the bedpe file
res <- bedpeToRearrCatalogue(sv_bedpe)
plotRearrSignatures(res$rearr_catalogue)