n_gram_dist | R Documentation |
Get distribution of next character given previous n nucleotides.
n_gram_dist(
  path_input,
  n = 2,
  vocabulary = c("A", "C", "G", "T"),
  format = "fasta",
  file_sample = NULL,
  step = 1,
  nuc_dist = FALSE
)
path_input |
Path to folder containing fasta files or single fasta file. |
n |
Size of the n-gram, i.e. the number of preceding nucleotides used as context. |
vocabulary |
Vector of allowed characters; samples containing characters outside the vocabulary are discarded. |
format |
File format, either |
file_sample |
If integer, size of random sample of files in |
step |
How often to take a sample. |
nuc_dist |
Nucleotide distribution. |
Returns a matrix with distributions of nucleotides given the previous n nucleotides.
A data frame of n-gram predictions.
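# Create a temporary folder and fill it with dummy fasta files
temp_dir <- tempfile()
dir.create(temp_dir)
create_dummy_data(file_path = temp_dir,
                  num_files = 3,
                  seq_length = 80,
                  vocabulary = c("A", "C", "G", "T"),
                  num_seq = 2)

# Distribution of the next nucleotide given the previous 3 nucleotides
m <- n_gram_dist(path_input = temp_dir,
                 n = 3,
                 step = 1,
                 nuc_dist = FALSE)

# Show the first rows, rounded to two decimals
head(round(m, 2))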
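To make the idea of a "distribution of the next character given the previous n nucleotides" concrete, the following is a minimal base R sketch for a single in-memory sequence. It is not the package's implementation: next_char_dist is a hypothetical helper introduced only for illustration, it ignores file handling, file_sample, step and nuc_dist, and it assumes that any window containing a character outside the vocabulary is skipped.

```r
# Hypothetical helper, not part of the package: counts which character
# follows each n-gram in one sequence and normalizes the counts per row.
next_char_dist <- function(sequence, n = 2,
                           vocabulary = c("A", "C", "G", "T")) {
  chars <- strsplit(sequence, "")[[1]]

  # Enumerate every possible context of length n over the vocabulary
  grid <- expand.grid(rep(list(vocabulary), n), stringsAsFactors = FALSE)
  contexts <- apply(grid, 1, paste, collapse = "")

  counts <- matrix(0,
                   nrow = length(contexts),
                   ncol = length(vocabulary),
                   dimnames = list(contexts, vocabulary))

  if (length(chars) > n) {
    for (i in seq_len(length(chars) - n)) {
      window <- chars[i:(i + n)]
      # Assumption: skip windows with characters outside the vocabulary
      if (!all(window %in% vocabulary)) next
      context <- paste(window[1:n], collapse = "")
      target <- window[n + 1]
      counts[context, target] <- counts[context, target] + 1
    }
  }

  # Normalize each row to sum to 1; contexts never observed stay at zero
  totals <- rowSums(counts)
  counts / ifelse(totals == 0, 1, totals)
}

d <- next_char_dist("ACGTACGAACGT", n = 2)
d[c("AC", "CG"), ]
```

Each row of d corresponds to one context of length n and each column to a possible next character, so every row for an observed context sums to 1. In the toy sequence above, "AC" is always followed by "G", while "CG" is followed by "T" twice and by "A" once.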