| gsynth.random | R Documentation |
Generates random DNA sequences based on nucleotide probabilities without using a trained Markov model. Each nucleotide is sampled independently according to the specified probabilities.
gsynth.random(
  intervals = NULL,
  output_path = NULL,
  output_format = c("misha", "fasta", "vector"),
  nuc_probs = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25),
  mask_copy = NULL,
  seed = NULL,
  n_samples = 1,
  iterator = 1
)
intervals |
Genomic intervals to sample. If NULL, uses all chromosomes. |
output_path |
Path to the output file (ignored when output_format = "vector"). |
output_format |
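Output format: one of "misha", "fasta", or "vector". With "misha" or "fasta" the sequences are written to output_path; with "vector" they are returned as a character vector. |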
nuc_probs |
Nucleotide probabilities. Can be specified as a named vector (names A, C, G, T, in any order) or as an unnamed vector in the order A, C, G, T.
Probabilities are automatically normalized to sum to 1. Default is uniform (0.25 each). |
mask_copy |
Optional intervals to copy from the original genome instead of random sampling. Use this to preserve specific regions exactly as they appear in the reference. |
seed |
Random seed for reproducibility. If NULL, uses current random state. |
n_samples |
Number of samples to generate per interval. Default is 1. |
iterator |
Iterator for position resolution. Default is 1 (base-pair resolution). Larger values may speed up processing but are typically not needed for random sampling. |
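The effect of mask_copy can be pictured with a small base R sketch. This is only an illustration of the idea (random sampling everywhere, reference bases copied verbatim inside the masked regions), not the package's implementation. The helper name sketch_mask_copy, the toy reference string, and the 0-based, half-open start/end columns are all assumptions made for this example.

```r
# Hypothetical helper, not part of the package.
# `reference` is a single DNA string; `mask` is a data.frame whose
# `start`/`end` columns are assumed to be 0-based and half-open.
sketch_mask_copy <- function(reference, mask, probs = rep(0.25, 4)) {
  ref <- strsplit(reference, "")[[1]]
  # Start from a fully random sequence of the same length
  out <- sample(c("A", "C", "G", "T"), length(ref), replace = TRUE, prob = probs)
  for (i in seq_len(nrow(mask))) {
    # Convert the 0-based half-open interval to 1-based positions
    idx <- mask$start[i] + seq_len(mask$end[i] - mask$start[i])
    # Copy the reference bases verbatim inside the masked region
    out[idx] <- ref[idx]
  }
  paste(out, collapse = "")
}

set.seed(1)
reference <- "ACGTACGTACGTACGTACGT"
mask <- data.frame(start = 4, end = 12)
synthetic <- sketch_mask_copy(reference, mask)

# Positions 5..12 (1-based) match the reference exactly
substr(synthetic, 5, 12) == substr(reference, 5, 12)
```

Outside the masked interval the bases are random, so they may or may not coincide with the reference.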
Unlike gsynth.sample, which uses a trained Markov model to generate
sequences that preserve k-mer statistics, gsynth.random generates purely
random sequences in which each nucleotide is sampled independently. This is useful
for generating baseline random sequences or sequences with a specific GC content.
Nucleotide ordering: When using an unnamed vector for nuc_probs,
the order is A, C, G, T. Named vectors can be in any order.
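The sampling model and the ordering rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual code: sketch_random_seq is a hypothetical helper that normalizes the probabilities, treats unnamed vectors as A, C, G, T, and draws every base independently.

```r
# Hypothetical helper, not part of the package: draws one sequence of
# length n with each base sampled independently from nuc_probs.
sketch_random_seq <- function(n, nuc_probs = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)) {
  bases <- c("A", "C", "G", "T")
  if (is.null(names(nuc_probs))) {
    # Unnamed vectors are read in A, C, G, T order
    names(nuc_probs) <- bases
  }
  # Named vectors may be in any order; reorder them to A, C, G, T
  probs <- nuc_probs[bases]
  # Normalize so the probabilities sum to 1
  probs <- probs / sum(probs)
  paste(sample(bases, size = n, replace = TRUE, prob = probs), collapse = "")
}

set.seed(42)
# Unnormalized weights given in arbitrary order (60% GC after normalization)
s <- sketch_random_seq(1000, c(T = 2, G = 3, C = 3, A = 2))
nchar(s)
```

Because every position is drawn independently, the sequence carries no k-mer structure beyond what the base frequencies imply.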
When output_format is "misha" or "fasta", returns invisible NULL and writes the random sequences to output_path. When output_format is "vector", returns a character vector of sequences (length = n_intervals * n_samples).
gsynth.sample, gsynth.train
gdb.init_examples()

# Generate random sequences with uniform nucleotide probabilities
seqs <- gsynth.random(
  intervals = gintervals(1, 0, 1000),
  output_format = "vector",
  seed = 42
)

# Generate GC-rich sequences (60% GC)
gc_rich <- gsynth.random(
  intervals = gintervals(1, 0, 1000),
  output_format = "vector",
  nuc_probs = c(A = 0.2, C = 0.3, G = 0.3, T = 0.2),
  seed = 42
)

# Generate AT-rich sequences (70% AT)
at_rich <- gsynth.random(
  intervals = gintervals(1, 0, 1000),
  output_format = "vector",
  nuc_probs = c(A = 0.35, C = 0.15, G = 0.15, T = 0.35),
  seed = 42
)
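To check that generated sequences have roughly the requested composition, a small base R helper can compute the GC fraction of each sequence in a character vector. The name gc_fraction is a hypothetical helper introduced here for illustration; it is not part of the package, and the input below is a hand-written toy vector rather than real output.

```r
# Hypothetical helper, not part of the package:
# fraction of G or C bases in each sequence of a character vector.
gc_fraction <- function(seqs) {
  vapply(
    strsplit(toupper(seqs), ""),
    function(b) mean(b %in% c("G", "C")),
    numeric(1)
  )
}

gc_fraction(c("GGCC", "ATGC", "ATAT"))
# returns 1.0 0.5 0.0
```

Applied to a vector such as gc_rich from the example above, the result should be close to 0.6 but will generally not equal it exactly, since each base is sampled independently.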