extract_fa: Retrieve descriptors from a FASTA file


View source: R/get_descriptors.R

Description

extract_fa retrieves, from the reference genome in fasta_filename, a set of sequence-context descriptors for the C:G > T:A variants listed in mutation_ids. These descriptors are relevant to classifying the variants as deaminations or non-deaminations.
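A C:G > T:A variant is a C>T change on the reference strand or, equivalently, the complementary G>A change. As an illustration only, the base-R sketch below keeps the IDs of such variants from a small variant table before they would be passed to extract_fa. The helper is_cg_ta, the variants table and its ref/alt columns are hypothetical and are not part of ideafix.

```r
# Hypothetical helper (not part of ideafix): flags C:G > T:A substitutions,
# i.e. C>T on the reference strand or the complementary G>A change.
is_cg_ta <- function(ref, alt) {
  ref <- toupper(ref)
  alt <- toupper(alt)
  (ref == "C" & alt == "T") | (ref == "G" & alt == "A")
}

# Made-up variant table used only for illustration
variants <- data.frame(
  id  = c("chr1:1000", "chr2:2500", "chr3:300"),
  ref = c("C", "A", "g"),
  alt = c("T", "G", "a"),
  stringsAsFactors = FALSE
)

# Keep only the IDs of C:G > T:A variants
cg_ta_ids <- variants$id[is_cg_ta(variants$ref, variants$alt)]
print(cg_ta_ids)
```

Case is normalised with toupper so that soft-masked (lowercase) bases are treated like uppercase ones.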

Usage

extract_fa(mutation_ids, fasta_filename, k = 2)

Arguments

mutation_ids

Character vector containing the IDs of the loci whose descriptors are retrieved. Each ID has the format CHR:POS.
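To make the CHR:POS format concrete, the following base-R sketch splits such IDs into a chromosome name and an integer position. The helper parse_mutation_ids is hypothetical and only illustrates the format; it does not reproduce how extract_fa parses its input.

```r
# Hypothetical helper (not part of ideafix): splits "CHR:POS" ids into
# a chromosome name and an integer position.
parse_mutation_ids <- function(mutation_ids) {
  # Everything after the last ":" is the position, everything before it the chromosome
  pos <- sub("^.*:", "", mutation_ids)
  chr <- sub(":[^:]*$", "", mutation_ids)
  if (!all(grepl(":", mutation_ids)) || any(!grepl("^[0-9]+$", pos))) {
    stop("ids must have the form CHR:POS with an integer POS")
  }
  data.frame(chr = chr, pos = as.integer(pos), stringsAsFactors = FALSE)
}

parsed <- parse_mutation_ids(c("chr1:1000", "chrX:154"))
print(parsed)
```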

fasta_filename

Character string giving the path to the FASTA file of the reference genome to which the sequencing data were aligned.

k

Integer giving the number of bases to the left and to the right of each locus to include in the retrieved genomic sequence. Defaults to 2.
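With the default k = 2, the retrieved sequence spans 2k + 1 = 5 bases centred on the locus. The base-R sketch below extracts such a window from an in-memory toy sequence rather than from fasta_filename; the helper get_window and the 1-based, inclusive coordinates are assumptions made for illustration.

```r
# Hypothetical helper (not part of ideafix): returns the bases from
# pos - k to pos + k (1-based, inclusive) of a single sequence string.
get_window <- function(sequence, pos, k = 2) {
  start <- pos - k
  end <- pos + k
  if (start < 1 || end > nchar(sequence)) {
    stop("window extends beyond the sequence ends")
  }
  substr(sequence, start, end)
}

# Toy "chromosome"; position 6 holds a C
toy_chr <- "ACGTTCAGGA"
w <- get_window(toy_chr, pos = 6, k = 2)
print(w)
```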

Value

A tibble containing, for each locus in mutation_ids, a set of descriptors derived from the genomic sequence spanning positions locus - k to locus + k. The descriptors, one per column, are: the base two positions before the locus, the base one position before, the base two positions after, the base one position after, the dinucleotide before and the dinucleotide after.
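The sketch below derives these six descriptors from a 5-base window (k = 2) in base R. It assumes that "dinucleotide before" means the bases at positions -2 and -1 and "dinucleotide after" the bases at +1 and +2; the helper context_descriptors and its column names are hypothetical, and a plain data.frame is used instead of a tibble to avoid extra dependencies.

```r
# Hypothetical helper (not part of ideafix): derives six context
# descriptors from a 5-base window centred on the variant (k = 2).
# Assumes the dinucleotides are the two bases on each side of the locus.
context_descriptors <- function(window) {
  stopifnot(nchar(window) == 5)
  b <- strsplit(window, "")[[1]]
  data.frame(
    base_minus_2 = b[1],
    base_minus_1 = b[2],
    base_plus_2  = b[5],
    base_plus_1  = b[4],
    dinuc_before = paste0(b[1], b[2]),
    dinuc_after  = paste0(b[4], b[5]),
    stringsAsFactors = FALSE
  )
}

desc <- context_descriptors("TTCAG")
print(desc)
```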


mmaitenat/ideafix documentation built on Sept. 18, 2021, 7:55 a.m.