fastaLabelGenerator
Iterates over a folder containing .fasta files and produces one-hot encodings of the predictor sequences
and target variables. Targets are read from the fasta headers.
fastaLabelGenerator(
corpus.dir,
format = "fasta",
batch.size = 256,
maxlen = 250,
max_iter = 10000,
vocabulary = c("a", "c", "g", "t"),
verbose = FALSE,
randomFiles = FALSE,
step = 1,
showWarnings = FALSE,
seed = 1234,
shuffleFastaEntries = FALSE,
numberOfFiles = NULL,
fileLog = NULL,
labelVocabulary = c("x", "y", "z"),
reverseComplements = TRUE
)
corpus.dir |
Input directory where the .fasta files are located, or path to a single file ending with .fasta or .fastq (as specified in the format argument). |
format |
File format, either fasta or fastq. |
batch.size |
Number of samples in one batch. |
maxlen |
Length of predictor sequence. |
max_iter |
Stop after max_iter iterations have failed to produce a new batch. |
vocabulary |
Vector of allowed characters. Characters outside the vocabulary are encoded as a zero vector. |
verbose |
Logical, whether to show messages. |
randomFiles |
Logical, whether to go through the files in random order or sequentially. |
step |
How often to take a sample, i.e. the distance between the start positions of consecutive samples. |
showWarnings |
Logical, whether to give a warning if a character outside the vocabulary appears. |
seed |
Sets seed for set.seed function, for reproducible results when using |
shuffleFastaEntries |
Logical, shuffle fasta entries. |
numberOfFiles |
Use only the specified number of files; ignored if greater than the number of files in corpus.dir. |
fileLog |
Write the names of the files to a csv file if a path is specified. |
labelVocabulary |
Character vector of possible targets. Targets outside |
reverseComplements |
Logical, if TRUE half of each batch contains sequences and the other half their reverse complements. The reverse complement
is obtained by reversing the order of the sequence and swapping A with T and C with G. |
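The behaviour described for vocabulary, step and reverseComplements can be illustrated with a minimal base-R sketch. This is not the package's implementation; the helper names one_hot, window_starts and reverse_complement are hypothetical and exist only for this illustration.

```r
# Hypothetical helpers for illustration; not part of the package.

# One-hot encode a sequence. Characters outside `vocabulary` become zero rows.
one_hot <- function(sequence, vocabulary = c("a", "c", "g", "t")) {
  chars <- strsplit(tolower(sequence), "")[[1]]
  idx <- match(chars, vocabulary)  # NA for out-of-vocabulary characters
  mat <- matrix(0, nrow = length(chars), ncol = length(vocabulary),
                dimnames = list(NULL, vocabulary))
  keep <- !is.na(idx)
  mat[cbind(which(keep), idx[keep])] <- 1
  mat
}

# Start positions of windows of length `maxlen`, taken every `step` characters.
window_starts <- function(seq_length, maxlen, step = 1) {
  if (seq_length < maxlen) return(integer(0))
  seq(1, seq_length - maxlen + 1, by = step)
}

# Reverse the sequence, then swap a/t and c/g.
reverse_complement <- function(sequence) {
  chars <- rev(strsplit(sequence, "")[[1]])
  chartr("acgtACGT", "tgcaTGCA", paste(chars, collapse = ""))
}

x <- one_hot("acgn")             # last row is all zeros ("n" is not in the vocabulary)
starts <- window_starts(seq_length = 10, maxlen = 4, step = 3)  # 1, 4, 7
rc <- reverse_complement("aacg") # "cgtt"
```

With step = 1 every possible window is used; larger values of step skip start positions and so produce fewer, less overlapping samples per sequence.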
A list of length 2. The first element is a 3-dimensional tensor with dimensions (batch.size, maxlen, length(vocabulary)), encoding the predictor sequences. The second element is a matrix with dimensions (batch.size, length(labelVocabulary)), encoding the targets.
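To make the shapes of the returned list concrete, the following base-R sketch builds a small dummy batch by hand. It does not call fastaLabelGenerator; the sequences, labels and variable names are made up, and it assumes the targets are one-hot encoded with one column per entry of labelVocabulary.

```r
# Dummy data, made up for illustration only.
vocabulary <- c("a", "c", "g", "t")
label_vocabulary <- c("x", "y", "z")
sequences <- c("acgt", "ttga")  # a batch of 2 sequences, each of length maxlen = 4
labels <- c("y", "x")           # e.g. as they might be read from fasta headers

batch_size <- length(sequences)
maxlen <- nchar(sequences[1])

# Predictor tensor with dimensions (batch.size, maxlen, length(vocabulary)).
x <- array(0, dim = c(batch_size, maxlen, length(vocabulary)))
for (i in seq_len(batch_size)) {
  idx <- match(strsplit(sequences[i], "")[[1]], vocabulary)
  for (pos in which(!is.na(idx))) x[i, pos, idx[pos]] <- 1
}

# Target matrix: one row per sample, one column per possible label (assumption).
y <- matrix(0, nrow = batch_size, ncol = length(label_vocabulary))
y[cbind(seq_len(batch_size), match(labels, label_vocabulary))] <- 1

batch <- list(x, y)
dim(batch[[1]])  # 2 4 4
dim(batch[[2]])  # 2 3
```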