get_fragment_size: Gets fragment lengths from a bam file
In ctDNAtools: Analyze Circulating Tumor DNA Sequencing Data

Description Usage Arguments Details Value See Also Examples

A function to extract fragment lengths from a bam file. Optionally, given a mutation data frame, it can categorize read lengths in mutated vs non-mutated reads.

get_fragment_size(bam, mutations = NULL, targets = NULL, tag = "",
  isProperPair = NA, mapqFilter = 30, min_size = 1, max_size = 400,
  ignore_trimmed = TRUE, different_strands = TRUE,
  simple_cigar = FALSE)

`bam`	path to bam file.
`mutations`	An optional data frame with mutations. Must have the columns CHROM, POS, REF, ALT.
`targets`	a data frame with the target regions to restrict the reads in the bam. Must have three columns: chr, start and end
`tag`	the RG tag if the bam has more than one sample.
`isProperPair`	a logical whether to return only proper pairs (true), only improper pairs (false), or it does not matter (NA).
`mapqFilter`	mapping quality threshold for considering reads.
`min_size`	Integer with the lowest fragment length.
`max_size`	Integer with the highest fragment length.
`ignore_trimmed`	logical, whether to remove reads that have been hard trimmed.
`different_strands`	logical, whether to keep only reads whose mates map to different strand.
`simple_cigar`	logical, whether to include only reads with simple cigar.

Extracts the fragment size of reads in the input bam that satisfy the following conditions:

Paired, and optionally properly paired depending on the isProperPair parameter.
Both the reads and mate are mapped.
Not secondary or supplementary alignment
Not duplicate
Passing quality control
Read and mate on different strands (optional depending on the different_strands parameter)
Not trimmed (optional depending on the ignore_trimmed parameter), i.e. will keep only reads with the max length.
Having a simple cigar (optional depending on the simple_cigar parameter).
Satisfy the mapping quality threshold specified in the mapqFilter parameter.
Reads and mates on the same chromosome when min_size > 0.

When the input mutations is given, the output will label the reads that support the variant alleles of the mutation in a separate column.

A data frame with the columns:

Sample: The SM tag in bam or file name
ID: the read ID
chr: chromosome
start: the left most end of either the read or mate
end: the right most end of either the read or mate.
size: the fragment size
category (only if mutations is provided): either ref, alt, or other

summarize_fragment_size bin_fragment_size analyze_fragmentation get_mutations_fragment_size

data("mutations", package = "ctDNAtools")
data("targets", package = "ctDNAtools")
bamT1 <- system.file("extdata", "T1.bam", package = "ctDNAtools")

## basic usage
fs <- get_fragment_size(bam = bamT1)

## More options
fs1 <- get_fragment_size(
  bam = bamT1, isProperPair = TRUE, min_size = 70,
  max_size = 200, ignore_trimmed = FALSE, different_strands = FALSE,
  simple_cigar = TRUE
)

## with mutations input
fs2 <- get_fragment_size(bam = bamT1, mutations = mutations)

## using targets
fs3 <- get_fragment_size(bam = bamT1, targets = targets)