bedpeToRearrCatalogue: bedpeToRearrCatalogue

bedpeToRearrCatalogue    R Documentation

bedpeToRearrCatalogue

Description

This function converts a BEDPE data frame into a rearrangement catalogue. Pass rearrangements from only one sample, with one rearrangement per pair of paired-end mates. The BEDPE data frame should contain the following columns: "chrom1", "start1", "end1", "chrom2", "start2", "end2" and "sample" (sample name). In addition, it should contain either two columns indicating the strands of the mates, "strand1" (+ or -) and "strand2" (+ or -), or one column indicating the structural variant class, "svclass": translocation, inversion, deletion or tandem-duplication. If you specify the "svclass" column, the "strand1" and "strand2" columns are ignored. If the "svclass" column is absent, it is created using the convention of BrassII from the Sanger Institute pipeline: inversion when strand1 and strand2 are different, deletion when strand1 and strand2 are both +, tandem-duplication when strand1 and strand2 are both -, and translocation when the two mates are on different chromosomes.
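
The strand convention described above can be sketched in a few lines of R. This is only an illustration of the stated rules, not the package's internal code: the helper name classify_sv is hypothetical, and giving translocations precedence over the strand rules is an assumption.

```r
# Hypothetical helper (not part of the package): derive svclass from
# chromosomes and strands, following the BrassII convention described above.
classify_sv <- function(chrom1, chrom2, strand1, strand2) {
  svclass <- ifelse(strand1 != strand2, "inversion",
                    ifelse(strand1 == "+", "deletion", "tandem-duplication"))
  # Assumption: mates on different chromosomes are always translocations,
  # whatever their strands are.
  svclass[chrom1 != chrom2] <- "translocation"
  svclass
}

classify_sv(chrom1  = c("1", "1", "1", "1"),
            chrom2  = c("1", "1", "1", "2"),
            strand1 = c("+", "-", "+", "+"),
            strand2 = c("+", "-", "-", "+"))
# "deletion" "tandem-duplication" "inversion" "translocation"
```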

Usage

bedpeToRearrCatalogue(sv_bedpe, kmin = 10, PEAK.FACTOR = 10)

Arguments

sv_bedpe

data frame BEDPE as described above
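
As a rough sketch of what such a data frame might look like, the code below builds a two-row BEDPE by hand and checks it for the columns listed in the Description. The coordinates and sample name are made up, and missing_bedpe_columns is a hypothetical helper, not a function of the package.

```r
# Hypothetical two-row BEDPE for a single sample; coordinates are invented.
sv_bedpe <- data.frame(
  chrom1  = c("1", "3"),
  start1  = c(1000000, 5000000),
  end1    = c(1000001, 5000001),
  chrom2  = c("1", "7"),
  start2  = c(1250000, 8000000),
  end2    = c(1250001, 8000001),
  sample  = c("sample1", "sample1"),
  svclass = c("deletion", "translocation"),
  stringsAsFactors = FALSE
)

# Hypothetical check (not part of the package): report missing columns.
missing_bedpe_columns <- function(df) {
  required <- c("chrom1", "start1", "end1", "chrom2", "start2", "end2", "sample")
  missing <- setdiff(required, colnames(df))
  has_strands <- all(c("strand1", "strand2") %in% colnames(df))
  # Either svclass or both strand columns must be present
  if (!("svclass" %in% colnames(df)) && !has_strands) {
    missing <- c(missing, "svclass or strand1/strand2")
  }
  missing
}

missing_bedpe_columns(sv_bedpe)  # empty when all columns are present
```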

kmin

minimum number of breakpoints in a segment to consider it a cluster. Default is 10.

PEAK.FACTOR

this factor is used to calculate a threshold for the minimum average distance between breakpoints in a cluster. The threshold is the expected distance divided by PEAK.FACTOR, where the expected distance is the number of base pairs in the genome divided by the total number of breakpoints. Default is 10.
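
The arithmetic behind this threshold can be worked through with illustrative numbers. The genome size and breakpoint count below are assumptions chosen for the example, not values taken from the package.

```r
# Illustrative values only; neither number comes from the package.
genome_size   <- 3e9  # assumed number of base pairs in the genome
n_breakpoints <- 600  # assumed total number of breakpoints in the sample
PEAK.FACTOR   <- 10   # default shown in the Usage section

# Expected distance between breakpoints if they were spread evenly
expected_distance <- genome_size / n_breakpoints      # 5e6 bp

# Threshold on the average breakpoint distance within a cluster
threshold <- expected_distance / PEAK.FACTOR          # 5e5 bp
```

With these numbers, a larger PEAK.FACTOR gives a smaller threshold, so breakpoints have to be packed more tightly to count as a cluster.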

Details

Please note that your rearrangement caller's interpretation of strand1 and strand2 may differ from the BrassII Sanger interpretation. Other callers often invert the sign of strand2 with respect to BrassII, so that, for example, a deletion has strand1 + and strand2 -, rather than both + as in our case. To avoid confusion, double check the convention of your caller, and consider specifying the svclass column yourself so that the automated interpretation of strand1 and strand2 is bypassed.
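
If your caller does use the inverted strand2 convention, one possible fix is to flip strand2 before calling the function. The sketch below assumes exactly that situation; flip_strand is a hypothetical helper, and you should confirm your caller's convention before applying it.

```r
# Hypothetical helper (not part of the package): swap "+" and "-".
flip_strand <- function(strand) {
  stopifnot(all(strand %in% c("+", "-")))
  ifelse(strand == "+", "-", "+")
}

# Made-up strands from a caller that reports a deletion as +/-
caller_bedpe <- data.frame(strand1 = c("+", "-"),
                           strand2 = c("-", "+"),
                           stringsAsFactors = FALSE)

# After flipping strand2, the first row reads +/+ (deletion in BrassII terms)
# and the second reads -/- (tandem-duplication in BrassII terms).
caller_bedpe$strand2 <- flip_strand(caller_bedpe$strand2)
```

Setting the svclass column directly, as suggested above, avoids the strand question altogether.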

Optionally, the user can provide two additional columns in the BEDPE, "non-template" and "micro-homology", which should contain the DNA sequence inserted ("non-template") or deleted ("micro-homology") at the breakpoint junction. A dot (".") should be used in these columns when a DNA sequence is not available. When these two columns are available, a junctions catalogue is computed and returned. A junctions catalogue contains the counts of how many clustered/unclustered rearrangements have non-templated insertions or micro-homology deletions of a certain size.
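
A minimal sketch of how these two columns might be filled in is shown below. The sequences are invented, and seq_length is a hypothetical helper that only illustrates the "size" being counted; it is not the package's own computation. Because the column names contain a hyphen, they need to be accessed with [[ ]] and read from file with check.names = FALSE.

```r
# Hypothetical BEDPE fragment; the sequences are invented for illustration.
sv_bedpe <- data.frame(sample = rep("sample1", 3), stringsAsFactors = FALSE)
sv_bedpe[["non-template"]]   <- c("ACG", ".", ".")
sv_bedpe[["micro-homology"]] <- c(".", "TT", ".")

# Hypothetical helper: sequence length at each junction, 0 where the
# sequence is not available (".")
seq_length <- function(x) ifelse(x == ".", 0L, nchar(x))

seq_length(sv_bedpe[["non-template"]])    # 3 0 0
seq_length(sv_bedpe[["micro-homology"]])  # 0 2 0
```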

Value

Returns a list with the rearrangement catalogue (rearr_catalogue) and the annotated BEDPE (annotated_bedpe) for the given sample. A junctions catalogue is also returned if the non-template and micro-homology columns are provided. If clusters of rearrangements are found, the clustering regions are also returned (clustering_regions).

Examples

vcf_sv_file.bedpe <- "sample.bedpe"
sv_bedpe <- read.table(vcf_sv_file.bedpe,sep = "\t",header = TRUE,
                     stringsAsFactors = FALSE,check.names = FALSE)
#build a catalogue from the bedpe file
res <- bedpeToRearrCatalogue(sv_bedpe)
plotRearrSignatures(res$rearr_catalogue)

Nik-Zainal-Group/signature.tools.lib documentation built on April 13, 2025, 5:50 p.m.