Non-standard genetic code

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

cubar supports codon usage bias analysis of sequences that use non-standard genetic codes, such as mitochondrial or chloroplast protein-coding sequences. To illustrate, we calculate the effective number of codons (ENC) for human mitochondrial CDS sequences.

suppressPackageStartupMessages(library(Biostrings))
library(cubar)

Main analysis

First, load the sequences and get the corresponding codon table. Here gcid = '2' selects the vertebrate mitochondrial genetic code.

human_mt

ctab <- get_codon_table(gcid = '2')
head(ctab)
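To make the effect of choosing a non-standard table concrete, the sketch below lists the codons whose meaning differs between the standard code and the vertebrate mitochondrial code. It uses getGeneticCode() from Biostrings rather than cubar, on the assumption that Biostrings table "2" matches the table behind gcid = '2'; the variable names are illustrative only.

```r
suppressPackageStartupMessages(library(Biostrings))

# Named character vectors: codon -> one-letter amino acid ("*" = stop)
std_code <- getGeneticCode("1")  # standard genetic code
mt_code  <- getGeneticCode("2")  # vertebrate mitochondrial code

# Codons translated differently by the two tables
diff_codons <- names(std_code)[std_code != mt_code]

data.frame(
    codon     = diff_codons,
    standard  = unname(std_code[diff_codons]),
    vert_mito = unname(mt_code[diff_codons])
)
```

Because stop codons and amino acid families differ between the two tables, the same codon_table should be passed to every downstream step.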

We skip the CDS length and stop codon checks because incomplete stop codons are prevalent among mitochondrial CDSs.

human_mt_qc <- check_cds(
    human_mt,
    codon_table = ctab,
    check_stop = FALSE,
    rm_stop = FALSE,
    check_len = FALSE,
    start_codons = c('ATG', 'ATA', 'ATT'))

human_mt_qc

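Because stop codons (complete or incomplete) are still present, we now remove them manually. A trailing partial codon of one or two nucleotides is trimmed; a sequence whose length is a multiple of three loses its final codon.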

len_trim <- width(human_mt_qc) %% 3
len_trim <- ifelse(len_trim == 0, 3, len_trim)
human_mt_qc <- subseq(human_mt_qc, start = 1, end = width(human_mt_qc) - len_trim)

human_mt_qc
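The trimming arithmetic above can be checked on a few toy strings. This is a minimal base R sketch that uses nchar() and substr() in place of width() and subseq(); the sequences are made up for illustration and are not taken from human_mt.

```r
# Hypothetical toy sequences covering the three possible cases
toy <- c(
    complete   = "ATGCCCTAA",  # 9 nt, ends with a full stop codon (TAA)
    partial_t  = "ATGCCCT",    # 7 nt, incomplete stop codon "T"
    partial_ta = "ATGCCCTA"    # 8 nt, incomplete stop codon "TA"
)

# Number of trailing nucleotides to drop: the remainder modulo 3,
# or a whole codon (3) when the length is already a multiple of 3
len_trim <- nchar(toy) %% 3
len_trim <- ifelse(len_trim == 0, 3, len_trim)

trimmed <- substr(toy, 1, nchar(toy) - len_trim)
trimmed
```

In all three cases only the two sense codons remain. Note that this rule assumes every sequence whose length is a multiple of three actually ends with a stop codon; a CDS without one would lose its last sense codon.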

Finally, we calculate codon frequencies and ENC.

# calculate codon frequency
mt_cf <- count_codons(human_mt_qc)

# calculate ENC
get_enc(mt_cf, codon_table = ctab)
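For intuition about what ENC measures, the sketch below computes the codon homozygosity F of a single amino acid family as defined in Wright's original ENC formulation. It is only an illustration on made-up counts: the helper wright_f is hypothetical, and get_enc may use a different or refined estimator.

```r
# Hypothetical helper: Wright's homozygosity for one amino acid family.
# `counts` holds the observed counts of its synonymous codons.
wright_f <- function(counts) {
    n <- sum(counts)
    p <- counts / n
    (n * sum(p^2) - 1) / (n - 1)
}

f_even   <- wright_f(c(10, 10, 10, 10))  # four codons used equally
f_biased <- wright_f(c(40, 0, 0, 0))     # a single codon used exclusively

# 1 / F approximates the effective number of codons for that family
c(even = 1 / f_even, biased = 1 / f_biased)
```

Even usage gives a value close to the family size of four, while exclusive use of one codon gives one. Which codons belong to which family depends on the genetic code, which is why the codon table matters for ENC.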

It is important to note that the check_cds function and stop codon trimming are optional steps, and you can implement your own quality control procedures. However, it is crucial to ensure that your input sequences are suitable for codon usage bias analysis. Failure to do so may lead to ambiguous and misleading results from problematic sequences.


cubar documentation built on April 3, 2025, 8:58 p.m.