extract_single_exon | R Documentation |
This function identifies single-exon genes or transcripts in a genomic annotation dataset, typically obtained from the GENCODE database. It accepts either a path to a GTF/GFF file or an already loaded data frame that includes the standard GENCODE annotation fields, and returns single-exon genes or single-exon transcripts depending on the level argument. This is especially useful for identifying genes or transcripts that do not undergo splicing.
extract_single_exon(input, level = "gene", output_file = NULL)
input |
Either a file path to a GTF/GFF3 file or a data frame representing genomic annotations. The input data should contain at least the columns |
level |
A character string specifying the level of analysis. It should be either "gene" or "transcript". Defaults to "gene". |
output_file |
Optional. A file path to save the results as a tab-separated file. If not provided, the results will not be saved. The file will include the IDs of the single-exon genes or transcripts along with their respective exon IDs. This feature is useful for exporting results for further analysis or reporting purposes. |
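As a rough illustration of the tab-separated export described for output_file, the sketch below writes a small result-shaped data frame with base R and reads it back. The data frame res, the path out_path, and the write.table settings (header row, no quoting) are assumptions made for this example; the function's actual file layout may differ.

```r
# Hypothetical result, shaped like the documented return value
res <- data.frame(
  gene_id    = c("gene2", "gene3"),
  exon_count = c(1L, 1L),
  exon_id    = c("e1", "e1")
)

# Write a tab-separated file (assumed layout: header row, no quotes, no row names)
out_path <- tempfile(fileext = ".tsv")
write.table(res, file = out_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm it round-trips
back <- read.delim(out_path, stringsAsFactors = FALSE)
print(back)
```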
Input Validation: Checks for required columns and valid level specification.
Exon Counting: Groups exons by gene/transcript and counts occurrences.
Single-Exon Filtering: Retains only entities with exactly one exon.
Output: Returns filtered results; optionally writes to file.
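The four steps above can be sketched in base R as follows. This is a minimal illustration, not the package's actual implementation: the helper name sketch_single_exon, the set of required columns, and the choice to count exon rows per ID (rather than distinct exon IDs) are all assumptions made for the example.

```r
# Hypothetical sketch of the documented steps; not the package's own code.
sketch_single_exon <- function(annot, level = "gene") {
  # 1. Input validation: check the level and the columns this sketch needs
  level <- match.arg(level, c("gene", "transcript"))
  id_col <- paste0(level, "_id")
  required <- c("type", id_col, "exon_id")  # assumed minimum column set
  missing_cols <- setdiff(required, names(annot))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # 2. Exon counting: keep exon rows and count them per gene/transcript ID
  exons <- annot[annot$type == "exon", , drop = FALSE]
  exon_count <- table(exons[[id_col]])

  # 3. Single-exon filtering: retain IDs that occur exactly once
  single_ids <- names(exon_count)[exon_count == 1]
  keep <- exons[exons[[id_col]] %in% single_ids, c(id_col, "exon_id"), drop = FALSE]

  # 4. Output: return the columns listed in the Value section
  keep$exon_count <- rep(1L, nrow(keep))
  keep <- keep[, c(id_col, "exon_count", "exon_id"), drop = FALSE]
  rownames(keep) <- NULL
  keep
}
```

Counting rows treats an exon that appears under several transcripts of the same gene as several exons; whether the real function deduplicates by exon_id at the gene level is not stated here.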
A data frame containing the IDs of single-exon genes or transcripts based on the selected level. The returned data frame contains the following columns:
gene_id or transcript_id: The IDs of the single-exon genes or transcripts.
exon_count: The count of exons for each entity (always 1 for single-exon entities).
exon_id: The ID of the exon associated with the single-exon gene or transcript.
The data frame can then be used for downstream analysis, such as identifying unique single-exon genes or transcripts, or for exporting to a file.
extract_introns, load_file
# Example input data frame
input <- data.frame(
type = c("exon", "exon", "exon", "exon", "exon"),
gene_id = c("gene1", "gene1", "gene2", "gene3", "gene4"),
transcript_id = c("tx1", "tx1", "tx2", "tx3", "tx4"),
exon_number = c(1, 2, 1, 1, 1),
exon_id = c("e1", "e2", "e1", "e1", "e1")
)
# Identify single-exon genes
single_exon_genes <- extract_single_exon(input, level = "gene")
print(single_exon_genes)
# Identify single-exon transcripts
single_exon_transcripts <- extract_single_exon(input, level = "transcript")
print(single_exon_transcripts)