View source: R/circularity_test.R
circularity_test: R Documentation
A rough function to assess the circularity of a genetic sequence, for example, an assembly of a bacterial chromosome or a eukaryotic organelle genome.
circularity_test(
query_seq,
word_size = 20,
search_start = 1,
step_size = 1,
wiggle = 2
)
query_seq: Character, the query sequence.
word_size: Integer, the size of the words to search for. Default = 20.
search_start: Integer, the starting base position. Default = 1.
step_size: Integer, the sliding window step size. Default = 1.
wiggle: Integer, the "wiggle room" for the word size in case the sliding window overshoots the end of the sequence. Default = 2.
A sliding window is used to search the query sequence for a replicated character string. A word that occurs at both ends of the sequence suggests that the assembly has wrapped around a circular molecule, so the replicated pattern is used to infer the circular start and end points.
The function circle_cutter can then be used to excise the single non-replicated linear sequence.
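The sliding-window idea can be sketched in base R as follows. This is a minimal illustration only, not the implementation of circularity_test or circle_cutter: the helper name find_repeated_words, its output columns FIRST_START and REPEAT_START, and the toy sequence are all assumptions made for the example.

```r
# Hypothetical helper (not part of the package): slide a fixed-size window
# along the sequence and report words that occur again further downstream.
find_repeated_words <- function(query_seq, word_size = 8, step_size = 1) {
  starts <- seq(1, nchar(query_seq) - word_size + 1, by = step_size)
  hits <- lapply(seq_along(starts), function(i) {
    word <- substr(query_seq, starts[i], starts[i] + word_size - 1)
    # Literal (non-regex) match positions of the word in the whole sequence
    pos <- gregexpr(word, query_seq, fixed = TRUE)[[1]]
    # Keep only occurrences that begin after the current window start
    later <- pos[pos > starts[i]]
    if (length(later) == 0) return(NULL)
    data.frame(
      STEP = i, WORD = word,
      FIRST_START = starts[i], REPEAT_START = later[1],
      stringsAsFactors = FALSE
    )
  })
  # Drop windows without a downstream repeat; result is NULL if none repeat
  do.call(rbind, Filter(Negate(is.null), hits))
}

# Toy "circular" assembly: one full copy of the unit plus its first 8 bases
unit <- "GATTACAGGCTTACGATCCA"
toy_seq <- paste0(unit, substr(unit, 1, 8))

hits <- find_repeated_words(toy_seq, word_size = 8)

# Excise a single linear copy: everything before the repeated word restarts
single_copy <- substr(toy_seq, hits$FIRST_START[1], hits$REPEAT_START[1] - 1)
identical(single_copy, unit)
```

In this sketch the first window's word reappears at the end of the toy sequence, and cutting just before the second occurrence recovers the original unit. A real assembly may contain internal repeats, so a longer word size reduces the chance of a spurious hit.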
If the function finds a replicated character string, it will return a data.table with the columns:
$STEP: Integer, the step number in the sliding window.
$WORD: Character, the character string.
$START: Integer, the start position.
$END: Integer, the end position.
$SIZE: Integer, the word size.
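x <- 'AATTGGCCACTATCTGCTAGCTAGCATAGCATCGATCAGCATGACGCGCAAAATTGGCC'
# Find a character motif that is repeated
motif_hits <- circularity_test(x, word_size = 8)
# Extract the motif using the START and END columns of the first hit
motif_seq <- substr(x, motif_hits$START[1], motif_hits$END[1])
# Excise the single non-replicated linear sequence
circle_cutter(query_seq = x, motif = motif_seq)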