labelByFolderGenerator
Iterates over a folder containing .fasta files and produces one-hot encodings of the predictor sequences
together with the target variables. All files in corpus.dir
should belong to one class.
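The one-hot encoding mentioned above can be illustrated with a minimal base R sketch. The helper `one_hot_encode` is hypothetical and is not part of the package; it only assumes the behaviour stated in this page, namely one column per vocabulary entry and a zero vector for any character outside the vocabulary.

```r
# Hypothetical helper for illustration only, not the package's implementation.
one_hot_encode <- function(sequence, vocabulary = c("a", "c", "g", "t")) {
  chars <- strsplit(tolower(sequence), "")[[1]]
  # match() returns NA for characters that are not in the vocabulary
  idx <- match(chars, vocabulary)
  encoded <- matrix(0, nrow = length(chars), ncol = length(vocabulary),
                    dimnames = list(NULL, vocabulary))
  known <- which(!is.na(idx))
  # Set a single 1 per known character; unknown characters keep an all-zero row
  encoded[cbind(known, idx[known])] <- 1
  encoded
}

x <- one_hot_encode("ACGTN")
print(x)
```

Each row corresponds to one position in the sequence. The final row, for the character N, stays all zero because N is not in the default vocabulary.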
labelByFolderGenerator(
  corpus.dir,
  format = "fasta",
  batch.size = 256,
  maxlen = 250,
  max_iter = 10000,
  vocabulary = c("a", "c", "g", "t"),
  verbose = FALSE,
  randomFiles = FALSE,
  step = 1,
  showWarnings = FALSE,
  seed = 1234,
  shuffleFastaEntries = FALSE,
  numberOfFiles = NULL,
  fileLog = NULL,
  reverseComplements = TRUE,
  numTargets,
  onesColumn
)
corpus.dir |
Input directory containing the .fasta files, or path to a single file ending in .fasta or .fastq (as specified by the format argument). |
format |
File format, either "fasta" or "fastq". |
batch.size |
Number of samples in one batch. |
maxlen |
Length of predictor sequence. |
max_iter |
Stop after max_iter iterations have failed to produce a new batch. |
vocabulary |
Vector of allowed characters; characters outside the vocabulary are encoded as a zero vector. |
verbose |
Logical, whether to show messages. |
randomFiles |
Logical, whether to go through the files in random order or sequentially. |
step |
How often to take a sample, i.e. the number of positions between the starts of consecutive samples. |
showWarnings |
Logical, whether to give a warning if a character outside the vocabulary appears. |
seed |
Seed passed to the set.seed function, for reproducible results. |
shuffleFastaEntries |
Logical, whether to shuffle fasta entries. |
numberOfFiles |
Use only the specified number of files; ignored if greater than the number of files in corpus.dir. |
fileLog |
Write the names of the files to a csv file if a path is specified. |
reverseComplements |
Logical, if TRUE half of each batch contains sequences and the other half their reverse complements. The reverse complement
is obtained by reversing the order of the sequence and swapping A with T and C with G. |
numTargets |
Number of columns of target matrix. |
onesColumn |
Index of the column of the target matrix that contains ones. |
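The reverse complement described for the reverseComplements argument can be sketched in base R as follows. The helper `reverse_complement` is hypothetical and only illustrates the definition given above (reverse the sequence, swap A with T and C with G); it is not necessarily how the package computes it.

```r
# Hypothetical helper for illustration only.
reverse_complement <- function(sequence) {
  # chartr() swaps A<->T and C<->G in one pass, for upper and lower case
  complemented <- chartr("ACGTacgt", "TGCAtgca", sequence)
  # Reverse the order of the complemented characters
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

rc <- reverse_complement("aacg")
print(rc)
```

Characters other than A, C, G and T are left unchanged by `chartr()` in this sketch and are only moved by the reversal.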
A list of length 2. The first element is a 3-dimensional tensor with dimensions (batch.size, maxlen, length(vocabulary)), encoding the predictor sequences. The second element is a matrix with dimensions (batch.size, numTargets), encoding the targets.
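To make the shapes concrete, here is a small base R sketch that builds a placeholder batch by hand. It assumes, following the numTargets and onesColumn argument descriptions, that the target matrix has numTargets columns with ones only in column onesColumn. The values below are arbitrary, and the predictor tensor is filled with zeros rather than real encoded sequences.

```r
# Arbitrary illustrative values, not defaults of the package
batch.size <- 4
maxlen <- 6
vocabulary <- c("a", "c", "g", "t")
numTargets <- 3
onesColumn <- 2

# Placeholder predictor tensor: (batch.size, maxlen, length(vocabulary))
x <- array(0, dim = c(batch.size, maxlen, length(vocabulary)))

# Target matrix: every sample gets the same label, since all files share one class
y <- matrix(0, nrow = batch.size, ncol = numTargets)
y[, onesColumn] <- 1

batch <- list(x, y)
print(dim(batch[[1]]))
print(dim(batch[[2]]))
```

Because every file in corpus.dir belongs to the same class, each row of the target matrix in this sketch is identical.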