extract_cds_sequences: Extract Coding Sequences (CDS) from GTF Annotations

extract_cds_sequences    R Documentation

Extract Coding Sequences (CDS) from GTF Annotations

Description

Extracts CDS regions from a GTF annotation file or data frame using genomic coordinates and retrieves corresponding DNA sequences from a BSgenome reference.

Usage

extract_cds_sequences(input, genome, save_fasta, output_file, verbose)

Arguments

input

A character string (GTF file path) or data frame containing CDS annotations.

genome

A BSgenome object for the relevant genome. Defaults to human (hg38).

save_fasta

A logical indicating whether to save sequences to a FASTA file. Defaults to FALSE.

output_file

A character string specifying the FASTA output path. If NULL, the path defaults to "CDS.fa".

verbose

A logical indicating whether to print progress messages. Defaults to TRUE.

Details

This function processes CDS entries from the input GTF, extracts their sequences from the reference genome, and optionally saves them in FASTA format. The resulting sequences are useful for downstream analyses such as protein translation.
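The three steps described above (select CDS entries, cut their sequences out of the reference, optionally write FASTA) can be sketched in base R on toy data. This is an illustration only, not the package's implementation: the toy genome string, the column names (seqnames, type, start, end, strand, sequence), the FASTA header layout, and the helper reverse_complement are all assumptions made for the example.

```r
# Toy reference: one "chromosome" as a plain string (stand-in for a BSgenome object).
genome <- c(chr1 = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# Toy annotation table with GTF-like columns (column names are assumptions).
gtf <- data.frame(
  seqnames = c("chr1", "chr1", "chr1"),
  type     = c("exon", "CDS", "CDS"),
  start    = c(1, 1, 13),
  end      = c(24, 12, 24),
  strand   = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reverse-complement a DNA string for minus-strand features.
reverse_complement <- function(x) {
  chars <- rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]])
  paste(chars, collapse = "")
}

# Step 1: keep only the CDS rows.
cds <- gtf[gtf$type == "CDS", ]

# Step 2: extract each CDS using 1-based, inclusive GTF coordinates.
cds$sequence <- substring(unname(genome[cds$seqnames]), cds$start, cds$end)

# Step 3: report minus-strand features on their own strand.
minus <- cds$strand == "-"
cds$sequence[minus] <- vapply(
  cds$sequence[minus], reverse_complement, character(1), USE.NAMES = FALSE
)

# Step 4: optionally write a FASTA file (header layout is an assumption).
save_fasta <- FALSE
if (save_fasta) {
  headers <- paste0(">", cds$seqnames, ":", cds$start, "-", cds$end, "(", cds$strand, ")")
  # Interleave header and sequence lines: header1, seq1, header2, seq2, ...
  writeLines(as.vector(rbind(headers, cds$sequence)), "CDS.fa")
}

print(cds[, c("seqnames", "start", "end", "strand", "sequence")])
```

With a real BSgenome object the substring step would typically be handled by the genome accessor rather than by substring(); the sketch only shows the coordinate and strand logic.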

Value

A data frame containing CDS annotations with corresponding sequences. If save_fasta = TRUE, also writes a FASTA file.
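As a rough illustration of downstream protein translation, the sketch below translates DNA strings with the standard genetic code in base R. The data frame and its sequence column are hypothetical stand-ins for the returned value (the actual column names are not stated here), and translate_cds is a helper written for this example. Note that each CDS row is usually one segment of a transcript, so a real analysis would first concatenate the segments of each transcript in order and respect the reading frame; in practice a dedicated function such as Biostrings::translate is the more robust choice.

```r
# Build the standard genetic code: bases ordered T, C, A, G,
# with the third codon position varying fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
amino_acids <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", ""
)[[1]]
genetic_code <- setNames(amino_acids, codons)

# Hypothetical helper: translate one in-frame DNA string; trailing bases
# that do not form a complete codon are ignored.
translate_cds <- function(dna) {
  n_codons <- nchar(dna) %/% 3
  if (n_codons == 0) return("")
  starts <- seq(1, by = 3, length.out = n_codons)
  triplets <- substring(toupper(dna), starts, starts + 2)
  paste(unname(genetic_code[triplets]), collapse = "")
}

# Stand-in for the returned data frame (column name `sequence` is an assumption).
cds_seqs_toy <- data.frame(
  transcript_id = c("tx1", "tx2"),
  sequence      = c("ATGGCCATTGTA", "ATGAAATGA"),
  stringsAsFactors = FALSE
)

cds_seqs_toy$protein <- vapply(
  cds_seqs_toy$sequence, translate_cds, character(1), USE.NAMES = FALSE
)
print(cds_seqs_toy)
```

Stop codons appear as "*" in the translated string.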

Examples

library(GencoDymo2)

# Load the example GENCODE annotation shipped with the package
file_v1 <- system.file("extdata", "gencode.v1.example.gtf.gz", package = "GencoDymo2")
gtf_v1 <- load_file(file_v1)

# Human CDS extraction against the hg38 reference
suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))
suppressPackageStartupMessages(library(GenomicRanges))
gtf_granges <- GRanges(gtf_v1)
cds_seqs <- extract_cds_sequences(gtf_granges, BSgenome.Hsapiens.UCSC.hg38, save_fasta = FALSE)


GencoDymo2 documentation built on June 8, 2025, 10:29 a.m.