extract_cds_sequences | R Documentation
Extracts CDS regions from a GTF annotation file or data frame and, using their genomic coordinates, retrieves the corresponding DNA sequences from a BSgenome reference.
extract_cds_sequences(input, genome, save_fasta, output_file, verbose)
| Argument | Description |
|---|---|
| input | A character string (path to a GTF file) or a data frame containing CDS annotations. |
| genome | A BSgenome object for the relevant genome. Defaults to human (hg38). |
| save_fasta | A logical indicating whether to save the sequences to a FASTA file. Defaults to |
| output_file | A character string specifying the FASTA output path. If |
| verbose | A logical indicating whether to print progress messages. Defaults to |
This function selects the CDS entries from the input annotation, extracts their sequences from the reference genome, and optionally saves them in FASTA format. The resulting sequences are useful for downstream analyses such as protein translation.
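The coordinate-based extraction described above can be sketched in base R without Bioconductor. This is a minimal illustration under stated assumptions, not the package's implementation: the toy genome is a named character vector standing in for a BSgenome object, and the column names (seqnames, start, end, strand, transcript_id) and the helpers rev_comp() and extract_cds_toy() are hypothetical. The sketch relies on GTF coordinates being 1-based and inclusive, which matches base R's substring(), and it reverse-complements minus-strand features. Each CDS row is extracted on its own; joining the CDS pieces of a multi-exon transcript is not shown.

```r
# Hypothetical toy genome: a named character vector standing in for a BSgenome object
toy_genome <- c(chr1 = "ATGAAACCCGGGTTTTAGCCTATTTGGGCAT")

# Hypothetical CDS annotations in GTF style (1-based, inclusive coordinates)
cds <- data.frame(
  seqnames      = c("chr1", "chr1"),
  start         = c(1, 20),
  end           = c(18, 31),
  strand        = c("+", "-"),
  transcript_id = c("tx1", "tx2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reverse complement of a DNA string (A<->T, C<->G, then reverse)
rev_comp <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: pull each CDS interval out of the genome, honouring strand
extract_cds_toy <- function(cds, genome) {
  seqs <- mapply(function(chr, s, e, strand) {
    dna <- substring(genome[[chr]], s, e)    # 1-based inclusive, like GTF
    if (strand == "-") rev_comp(dna) else dna
  }, cds$seqnames, cds$start, cds$end, cds$strand, USE.NAMES = FALSE)
  cds$sequence <- seqs                        # annotations plus a sequence column
  cds
}

res <- extract_cds_toy(cds, toy_genome)
res$sequence
```

With a real BSgenome object, this lookup is typically done by calling getSeq() on the genome with a GRanges of CDS intervals, which takes strand into account; whether extract_cds_sequences() does exactly that internally is not stated on this page.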
A data frame containing the CDS annotations together with their sequences. If save_fasta = TRUE, the function also writes the sequences to a FASTA file.
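The optional FASTA output amounts to one ">identifier" header line followed by the sequence for each CDS. The sketch below writes such a file in base R; it is an illustration only, and the input columns (transcript_id, sequence) and the helper write_fasta_toy() are hypothetical, not the documented layout of this function's return value or its actual writer (Bioconductor code would more commonly use Biostrings::writeXStringSet()).

```r
# Hypothetical result: annotations plus a sequence column (column names are assumptions)
cds_seqs <- data.frame(
  transcript_id = c("tx1", "tx2"),
  sequence      = c("ATGAAACCCGGGTTTTAG", "ATGCCCAAATAG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: write a minimal FASTA file, one ">id" line followed by its sequence
write_fasta_toy <- function(df, path, id_col = "transcript_id", seq_col = "sequence") {
  # rbind() stacks headers over sequences; as.vector() reads column-wise, interleaving them
  lines <- as.vector(rbind(paste0(">", df[[id_col]]), df[[seq_col]]))
  writeLines(lines, path)
  invisible(path)
}

fasta_path <- tempfile(fileext = ".fa")
write_fasta_toy(cds_seqs, fasta_path)
readLines(fasta_path)
```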
# Load the example GENCODE annotation shipped with GencoDymo2
library(GencoDymo2)
file_v1 <- system.file("extdata", "gencode.v1.example.gtf.gz", package = "GencoDymo2")
gtf_v1 <- load_file(file_v1)

# Human CDS extraction
suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))
suppressPackageStartupMessages(library(GenomicRanges))

# Convert the annotation to GRanges and extract CDS sequences without writing a FASTA file
gtf_granges <- GRanges(gtf_v1)
cds_seqs <- extract_cds_sequences(gtf_granges, BSgenome.Hsapiens.UCSC.hg38, save_fasta = FALSE)