paf2nucleotideContent: Add FASTA sequence content to PAF alignments.

View source: R/getSequenceContent.R

paf2nucleotideContentR Documentation

Add FASTA sequence content to PAF alignments.

Description

This function takes a PAF table and for each alignment (rows) will report counts and frequencies of user defined ‘sequence.pattern' (such as exact DNA pattern, e.g. ’GA') or 'nucleotide.content' (such as sequence GC content).

Usage

paf2nucleotideContent(
  paf.table = NULL,
  asm.fasta = NULL,
  alignment.space = NULL,
  sequence.pattern = NULL,
  nucleotide.content = NULL
)

Arguments

paf.table

A data.frame or tibble containing a single or multiple PAF record(s) with 12 mandatory columns along with CIGAR string defined in 'cg' column.

asm.fasta

An assembly FASTA file to extract DNA sequence from defined PAF alignments.

alignment.space

What alignment coordinates should be exported as FASTA, either 'query' or 'target' (Default : 'query').

sequence.pattern

A user defined DNA sequence pattern which occurrences will be counted in submitted 'fasta.seq'. (e.g. set ‘sequence.pattern' to ’GA' to obtain counts of all 'GA' occurrences per FASTA sequence).

nucleotide.content

A user defined nucleotides which total content will be counted in submitted 'fasta.seq'. (e.g. to obtain 'GC' content set ‘nucleotide.content' to ’GC')

Author(s)

David Porubsky

Examples

## Get PAF to process ##
paf.file <- system.file("extdata", "test_getFASTA.paf", package = "SVbyEye")
## Read in PAF
paf.table <- readPaf(paf.file = paf.file, include.paf.tags = TRUE, restrict.paf.tags = "cg")
## Split PAF alignments into user defined bins
paf.table <- pafToBins(paf.table = paf.table, binsize = 10000)
## Get FASTA to process
asm.fasta <- system.file("extdata", "test_getFASTA_query.fasta", package = "SVbyEye")
## Add sequence and nucleotide content to submitted paf.table
paf2nucleotideContent(
    paf.table = paf.table, asm.fasta = asm.fasta,
    alignment.space = "query", sequence.pattern = "GA"
)


daewoooo/SVbyEye documentation built on March 31, 2024, 8:58 a.m.