permutate_seq: Create same length sequence permutations for each letter

View source: R/general_utils.R

permutate_seq    R Documentation

Create same length sequence permutations for each letter

Description

Given a set of letters in an alphabet, return a data frame with all the possible permutations of those letters.

Usage

permutate_seq(
  sequence_length,
  alphabet = c("A", "C", "G", "T"),
  k = 5,
  verbose = FALSE,
  scramble = FALSE,
  seed = NULL
)

Arguments

sequence_length

How long the sequences should be (keep this below 40)

alphabet

Letters to permutate in order to build different sequences

k

Pretty print option: show the first k rows per letter of the alphabet

verbose

Show a pretty print summary data frame. Default is FALSE.

scramble

Randomize the order of the rows.

seed

Integer passed to set.seed() just before data is scrambled to get reproducible results. Default is NULL.

Details

This is a handy wrapper around the gtools::permutations() function to generate saturated permutation sequences of DNA. The number of permutations increases rapidly with the length of the alphabet and with sequence_length! A pretty print function is available to slice the permutation sequences and check the results. To show the pretty print data frame, use verbose = TRUE.
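To get a feel for how quickly the output grows, the count can be sketched in base R. This sketch assumes that letters may repeat within a sequence (consistent with the N^4 count in the Examples) and uses expand.grid() as a stand-in for the enumeration; it is not the package's internal code, and the helper name count_seqs is hypothetical.

```r
# Hypothetical helper: number of sequences, assuming repeats are allowed
count_seqs <- function(sequence_length, alphabet = c("A", "C", "G", "T")) {
  length(alphabet)^sequence_length
}

count_seqs(3)   # 64
count_seqs(5)   # 1024
count_seqs(10)  # 1048576

# Base R stand-in that enumerates every sequence of length 3
alphabet <- c("A", "C", "G", "T")
grid <- expand.grid(rep(list(alphabet), 3), stringsAsFactors = FALSE)
seqs <- sort(do.call(paste0, grid))
length(seqs)   # 64
head(seqs, 3)  # "AAA" "AAC" "AAG"
```

Each extra position multiplies the row count by the alphabet size, which is why long sequence_length values quickly become impractical to hold in memory.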

By default the sequences are returned in alphabetical order. With scramble = TRUE one can reshuffle the sequences in a random order.
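The interplay of scramble and seed can be illustrated with a small base R sketch. It assumes, based only on the argument descriptions, that rows are shuffled with sample() right after set.seed() is called; the function scramble_rows is hypothetical and not part of the package.

```r
# Hypothetical helper mimicking scramble = TRUE with an optional seed
scramble_rows <- function(df, seed = NULL) {
  # Only fix the random state when a seed is supplied
  if (!is.null(seed)) set.seed(seed)
  # Reorder the rows by a random permutation of the row indices
  df[sample(nrow(df)), , drop = FALSE]
}

df <- data.frame(seq = c("AA", "AC", "AG", "AT"))
a <- scramble_rows(df, seed = 42)
b <- scramble_rows(df, seed = 42)
identical(a, b)  # TRUE: same seed, same row order
```

With seed = NULL the order would differ from call to call, which matches the documented default.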

Value

A data frame

See Also

print_seq_perm

Examples

dat <- permutate_seq(sequence_length = 5, k = 3, verbose = TRUE)
dat2 <- permutate_seq(sequence_length = 3, alphabet = c("W", "*", "X", "!", "%", "7"), k = 7, verbose = TRUE)

# To make many sequencing barcodes that follow this pattern:
# 'NNNN', 'AGCT', 'NNNN', 'TCAG', 'NNNN', 'TAGC', 'NNN', 'CAGT', 'NNN'

barcodes <- list()
for (i in 1:100) {
  tmp <- cbind( permutate_seq(sequence_length = 4, scramble = TRUE), 'AGCT',
                permutate_seq(sequence_length = 4, scramble = TRUE), 'TCAG',
                permutate_seq(sequence_length = 4, scramble = TRUE), 'TAGC',
                permutate_seq(sequence_length = 3, scramble = TRUE), 'CAGT',
                permutate_seq(sequence_length = 3, scramble = TRUE) )
  # concatenate into one single sequence
  tmp <- apply(tmp, 1, paste0, collapse = "") |> data.frame() |> setNames('BC')
  
  barcodes[[i]] <- tmp
}

do.call('rbind', barcodes) |> unique() |> nrow()
# 25600 ( (N^4) * 100, where N = 4 )


Ni-Ar/niar documentation built on Feb. 3, 2025, 9:25 a.m.