circularity_test: Test the circularity of a genetic sequence

Description

A rough function to assess whether a genetic sequence is circular, for example, an assembly of a bacterial chromosome or a eukaryotic organelle genome.

Usage

circularity_test(
  query_seq,
  word_size = 20,
  search_start = 1,
  step_size = 1,
  wiggle = 2
)

Arguments

query_seq

Character: The query sequence.

word_size

Integer: The size of the words to search for. Default = 20.

search_start

Integer: The starting base position. Default = 1.

step_size

Integer: The sliding window step size. Default = 1.

wiggle

Integer: The "wiggle room" allowed in the word size in case the sliding window overshoots the end of the sequence. Default = 2.

Details

A sliding window scans the query sequence for a character string that occurs more than once. This replicated pattern is then used to infer the circular start and end points.

The function circle_cutter can then be used to excise the single non-replicated linear sequence.
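
The sliding-window idea described above can be sketched in base R. This is a minimal illustration only, not the implementation used by circularity_test: the helper name find_repeated_words is hypothetical, it returns a plain data.frame rather than a data.table, and it does not model the wiggle argument. The column names are borrowed from the Value section purely for readability.

```r
# Hypothetical helper (not part of genomalicious): report every window whose
# word occurs more than once in the sequence.
find_repeated_words <- function(query_seq, word_size = 20,
                                search_start = 1, step_size = 1) {
  n <- nchar(query_seq)
  last_start <- n - word_size + 1
  empty <- data.frame(STEP = integer(0), WORD = character(0),
                      START = integer(0), END = integer(0),
                      SIZE = integer(0), stringsAsFactors = FALSE)
  # No full-length window fits between search_start and the sequence end
  if (last_start < search_start) return(empty)

  # Every word of length word_size in the sequence (step of 1),
  # used as the reference set for spotting duplicates
  all_starts <- seq_len(last_start)
  all_words <- substring(query_seq, all_starts, all_starts + word_size - 1)
  repeated <- unique(all_words[duplicated(all_words)])

  # Windows actually visited by the sliding search
  starts <- seq(search_start, last_start, by = step_size)
  words <- substring(query_seq, starts, starts + word_size - 1)
  keep <- words %in% repeated

  data.frame(STEP = which(keep), WORD = words[keep],
             START = starts[keep], END = starts[keep] + word_size - 1,
             SIZE = nchar(words[keep]), stringsAsFactors = FALSE)
}

# Toy sequence (made up for this sketch): the same 8-base word sits at both ends
toy_seq <- "ACGTACGGTTCAGATCACGTACGG"
hits <- find_repeated_words(toy_seq, word_size = 8)
print(hits)
```

In this toy sequence the word ACGTACGG occupies positions 1 to 8 and 17 to 24, so the sketch reports two rows, one for each copy. A repeat shared by both ends of a sequence in this way is the kind of pattern from which circular start and end points can be inferred.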

Value

If the function finds a replicated character string, it returns a data.table with the following columns:

  1. $STEP: Integer, the step number in the sliding window.

  2. $WORD: Character, the character string.

  3. $START: Integer, the start position.

  4. $END: Integer, the end position.

  5. $SIZE: Integer, the word size.

Examples

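library(genomalicious)

x <- 'AATTGGCCACTATCTGCTAGCTAGCATAGCATCGATCAGCATGACGCGCAAAATTGGCC'

# Find character motif that is repeated
motif_hits <- circularity_test(x, word_size = 8)

# Extract the first hit using its START and END positions
# (columns 1 and 2 of the result are STEP and WORD, not positions)
motif_seq <- substr(x, motif_hits$START[1], motif_hits$END[1])

circle_cutter(query_seq = x, motif = motif_seq)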


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.