check_cds: Quality control and preprocessing of coding sequences

View source: R/sequences.R

check_cds    R Documentation

Quality control and preprocessing of coding sequences

Description

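check_cds performs quality control on coding sequences (CDSs). It drops sequences that are shorter than min_len, whose length is not a multiple of three, that lack a valid start or stop codon, or that contain an internal stop codon. It can also remove the start and stop codons from the sequences that pass. Each check and trimming step can be switched off individually, so the output can be tailored to the requirements of downstream codon usage analysis.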

Usage

check_cds(
  seqs,
  codon_table = get_codon_table(),
  min_len = 6,
  check_len = TRUE,
  check_start = TRUE,
  check_stop = TRUE,
  check_istop = TRUE,
  rm_start = TRUE,
  rm_stop = TRUE,
  start_codons = c("ATG")
)

Arguments

seqs

Input CDS sequences as a DNAStringSet or compatible object.

codon_table

Codon table matching the genetic code of the input sequences. Generated using get_codon_table() or create_codon_table().

min_len

Minimum CDS length in nucleotides (default: 6).

check_len

Logical. Check whether CDS length is divisible by 3 (default: TRUE).

check_start

Logical. Check whether CDSs begin with valid start codons (default: TRUE).

check_stop

Logical. Check whether CDSs end with valid stop codons (default: TRUE).

check_istop

Logical. Check for internal stop codons (default: TRUE).

rm_start

Logical. Remove start codons from the sequences (default: TRUE).

rm_stop

Logical. Remove stop codons from the sequences (default: TRUE).

start_codons

Character vector specifying valid start codons (default: "ATG").
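
The checks controlled by these arguments can be pictured with a small base-R sketch on plain character strings. This is an illustration only and not cubar's implementation: the function name sketch_check_cds and its stop_codons argument are hypothetical, it assumes the standard genetic code (stop codons TAA, TAG, TGA) instead of reading a codon table, and it always applies every check and both trimming steps.

```r
# Illustrative sketch only; NOT the code used by cubar::check_cds.
# Assumes the standard genetic code; `stop_codons` is a hypothetical argument.
sketch_check_cds <- function(seqs, min_len = 6, start_codons = c("ATG"),
                             stop_codons = c("TAA", "TAG", "TGA")) {
  seqs <- toupper(seqs)
  n <- nchar(seqs)

  # Length checks: at least min_len nucleotides and a multiple of three
  ok <- n >= min_len & n %% 3 == 0
  # First codon must be a valid start codon
  ok <- ok & substr(seqs, 1, 3) %in% start_codons
  # Last codon must be a stop codon
  ok <- ok & substr(seqs, n - 2, n) %in% stop_codons
  seqs <- seqs[ok]

  # Internal stop check: look at every codon except the last one
  has_istop <- vapply(seqs, function(s) {
    starts <- seq_len(nchar(s) %/% 3 - 1) * 3 - 2
    codons <- substring(s, starts, starts + 2)
    any(codons %in% stop_codons)
  }, logical(1), USE.NAMES = FALSE)
  seqs <- seqs[!has_istop]

  # Trim the start codon and the terminal stop codon
  substr(seqs, 4, nchar(seqs) - 3)
}

toy <- c(
  "ATGAAATAA",    # passes all checks
  "ATGAAATA",     # length not divisible by 3
  "CTGAAATAA",    # start codon not in start_codons
  "ATGTAAAAATAA", # internal stop codon
  "ATGAAACCC",    # no terminal stop codon
  "ATG"           # shorter than min_len
)
result <- sketch_check_cds(toy)
print(result)
```

Only the first toy sequence survives, and it is returned without its start and stop codons. The real function works on DNAStringSet input and takes its stop codons from codon_table, so its behavior on edge cases may differ from this sketch.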

Value

A DNAStringSet containing filtered and optionally trimmed CDS sequences that pass all quality control checks.

Examples

# Perform CDS sequence quality control for a sample of yeast genes
library(cubar)
s <- head(yeast_cds, 10)
print(s)
check_cds(s)
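
A further example, using only the arguments documented above, contrasts the default call with one that keeps the start and stop codons. It assumes that the cubar package is installed and that its bundled yeast_cds data set is available once the package is attached. The variable names trimmed and kept are chosen here for illustration.

```r
library(cubar)

s <- head(yeast_cds, 10)

# Default behavior: run all checks, then remove start and stop codons
trimmed <- check_cds(s)

# Same checks, but leave the start and stop codons in place
kept <- check_cds(s, rm_start = FALSE, rm_stop = FALSE)

# Compare sequence lengths (in nucleotides) between the two results
nchar(as.character(kept))
nchar(as.character(trimmed))
```

Because both calls apply the same filters, they should retain the same sequences, and only the sequence lengths should differ between the two results.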


cubar documentation built on Aug. 21, 2025, 5:40 p.m.