regions2FASTA: Export FASTA sequences from a set of Genomic Ranges.

View source: R/regionsToFasta.R

regions2FASTAR Documentation

Export FASTA sequences from a set of Genomic Ranges.

Description

This function takes a GRanges-class object and extracts a genomic sequence from these regions either from an original range or a range expanded on each side by defined number of bases.

Usage

regions2FASTA(
  gr,
  bsgenome = NULL,
  asm.fasta = NULL,
  index.field = NULL,
  expand = 0,
  fasta.save = NULL
)

Arguments

gr

A GRanges-class object of genomic regions to extract genomic sequence from.

bsgenome

A BSgenome-class object of reference genome to get the genomic sequence from.

asm.fasta

An assembly FASTA file to extract DNA sequence from defined PAF alignments.

index.field

A user defined column number to be used as a sequence ID for the exported FASTA.

expand

Expand edges of each genomic region by this length (in bp).

fasta.save

A path to a filename where to store final FASTA file.

Value

A DNAStringSet-class object

Author(s)

David Porubsky

Examples

## Get PAF to process ##
paf.file <- system.file("extdata", "test_getFASTA.paf", package = "SVbyEye")
## Read in PAF
paf.table <- readPaf(paf.file = paf.file, include.paf.tags = TRUE, restrict.paf.tags = "cg")
## Get FASTA sequence from a defined genomic region ##
query.fasta <- system.file("extdata", "test_getFASTA_query.fasta", package = "SVbyEye")
## Define genomic ranges to export FASTA sequence from
query.gr <- GenomicRanges::GRanges(
    seqnames = unique(paf.table$q.name),
    ranges = IRanges::IRanges(
        start = paf.table$q.start,
        end = paf.table$q.end
    )
)
## Extract FASTA
regions2FASTA(gr = query.gr, asm.fasta = query.fasta)
## Get FASTA sequence around breakpoints of alignment indels ##
## Break PAF alignment at indels of 10 kbp and longer
paf.table <- breakPaf(paf.table = paf.table, min.deletion.size = 10000, min.insertion.size = 10000)
paf.svs <- paf.table$SVs
## Define breakpoint positions of detected indels
sv.bps.gr <- GenomicRanges::GRanges(
    seqnames = unique(paf.svs$q.name),
    ranges = IRanges::IRanges(
        start = paf.svs$q.start,
        end = paf.svs$q.start
    ),
    bp.index = paste0(paf.svs$q.name, "-left_bp-", 1:nrow(paf.svs))
)
## Extract FASTA
regions2FASTA(gr = sv.bps.gr, asm.fasta = query.fasta, expand = 10, index = 1)


daewoooo/SVbyEye documentation built on Oct. 15, 2024, 6:12 a.m.