The depth of the network and the number of neurons per layer can be specified. The first layer can be a Convolutional Neural Network (CNN) layer that is designed to capture codons.
If a path to a folder containing FASTA files is provided, batches are generated by an external generator, which
is recommended for large training sets. Alternatively, a dataset that holds the preprocessed batches (generated by preprocessSemiRedundant())
and keeps them in RAM can be supplied. Training on instances with multiple GPUs is also supported and scales linearly with the number of GPUs present.
trainNetwork(
train_type = "lm",
model_path = NULL,
model = NULL,
path = NULL,
path.val = NULL,
dataset = NULL,
checkpoint_path,
validation.split = 0.2,
run.name = "run",
batch.size = 64,
epochs = 10,
max.queue.size = 100,
lr.plateau.factor = 0.9,
patience = 5,
cooldown = 5,
steps.per.epoch = 1000,
step = 1,
randomFiles = FALSE,
initial_epoch = NULL,
vocabulary = c("a", "c", "g", "t"),
tensorboard.log,
save_best_only = TRUE,
compile = TRUE,
learning.rate = NULL,
solver = NULL,
max_iter = 1000,
seed = c(1234, 4321),
shuffleFastaEntries = FALSE,
output = list(none = FALSE, checkpoints = TRUE, tensorboard = TRUE, log = TRUE,
serialize_model = TRUE, full_model = TRUE),
format = "fasta",
fileLog = NULL,
labelVocabulary = NULL,
numberOfFiles = NULL,
reverseComplements = FALSE
)
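As a rough illustration of what `train_type = "lm"`, `vocabulary` and `step` imply for the training data, the following base R sketch cuts a sequence into input windows with a next-character target and encodes them as one-hot matrices. The helper names `one_hot` and `lm_samples`, the window length `maxlen` and the use of one-hot encoding are assumptions made for this example; they are not part of the package, and the actual generator may differ in detail.

```r
# Hypothetical helpers for illustration only; not functions of the package.

# One-hot encode a character vector; characters outside the vocabulary
# become all-zero rows.
one_hot <- function(chars, vocabulary = c("a", "c", "g", "t")) {
  idx <- match(tolower(chars), vocabulary)  # NA for unknown characters
  m <- matrix(0, nrow = length(chars), ncol = length(vocabulary),
              dimnames = list(NULL, vocabulary))
  known <- which(!is.na(idx))
  m[cbind(known, idx[known])] <- 1
  m
}

# Cut a sequence into (input window, next character) pairs.
# `maxlen` is an assumed window length; `step` is the distance between
# the start positions of consecutive windows.
lm_samples <- function(sequence, maxlen = 4, step = 1) {
  chars <- strsplit(sequence, "")[[1]]
  stopifnot(length(chars) > maxlen)
  starts <- seq(1, length(chars) - maxlen, by = step)
  lapply(starts, function(s) {
    list(x = one_hot(chars[s:(s + maxlen - 1)]),
         y = one_hot(chars[s + maxlen]))
  })
}

samples <- lm_samples("acgtnacg", maxlen = 4, step = 2)
length(samples)  # number of windows
samples[[1]]$x   # one-hot matrix for the window "acgt"
samples[[1]]$y   # target "n" is outside the vocabulary, so the row is all zeros
```

A larger `step` produces fewer, less overlapping windows from the same sequence, which is the trade-off the `step` argument controls in this sketch.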
train_type |
Either "lm" for language model, "label_header" or "label_folder". A language model is trained to predict the next character in a sequence. label_header/label_folder models are trained to predict the class of a given input sequence. If "label_header", the class is read from the FASTA headers. If "label_folder", the class is read from the folder, i.e. all FASTA files in one folder must belong to the same class. |
model_path |
Path to a pretrained model. |
model |
A keras model. |
path |
Path to folder where individual or multiple FASTA files are located for training. If |
path.val |
Path to folder where individual or multiple FASTA files are located for validation. If |
dataset |
Data frame holding training samples in RAM instead of using a generator. |
checkpoint_path |
Path to checkpoints folder. |
validation.split |
Defines the fraction of the batches that will be used for validation (compared to size of training data). |
run.name |
Name of the run (without file ending). Name will be used to identify output from callbacks. |
batch.size |
Number of samples that are used for one network update. |
epochs |
Number of training epochs. |
max.queue.size |
Maximum size of the generator queue passed to fit_generator(). |
lr.plateau.factor |
Factor by which the learning rate is reduced when a plateau is reached. |
patience |
Number of epochs waiting for decrease in loss before reducing learning rate. |
cooldown |
Number of epochs without changing learning rate. |
steps.per.epoch |
Number of batches to finish one epoch. |
step |
Frequency of sampling steps. |
randomFiles |
Logical, whether to shuffle the files beforehand (TRUE) or go through them sequentially (FALSE). |
initial_epoch |
Epoch at which to start training, set to 0 if no |
vocabulary |
Vector of allowed characters. Characters outside the vocabulary are encoded as a 0-vector. |
tensorboard.log |
Path to tensorboard log directory. |
save_best_only |
Only save a model if it improves on the best val_loss score. |
compile |
Whether to compile the model after loading. |
learning.rate |
Learning rate for optimizer. Only used when pretrained model is given ( |
solver |
Optimization method, options are "adam", "adagrad", "rmsprop" or "sgd". Only used when pretrained model is given ( |
max_iter |
Stop after max_iter iterations have failed to produce a new sample. |
seed |
Sets seed for set.seed function, for reproducible results when using |
shuffleFastaEntries |
Logical, shuffle entries in file. |
output |
List of optional outputs, no output if none is TRUE. |
format |
File format, "fasta" or "fastq". |
fileLog |
Write the names of the files to a CSV file if a path is specified. |
labelVocabulary |
Character vector of possible targets. Targets outside |
numberOfFiles |
Use only specified number of files, ignored if greater than number of files in corpus.dir. |
reverseComplements |
Logical, if TRUE half of each batch contains sequences and the other half their reverse complements. The reverse complement
is obtained by reversing the order of the sequence and swapping A/T and C/G. |
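The reverse complement described for `reverseComplements` can be sketched in base R as follows. The helper `reverse_complement` is a hypothetical name used only for this example, and placing the reverse complements in the second half of the batch is an assumption about layout, not a statement about how the package arranges its batches.

```r
# Hypothetical helper for illustration; not a function of the package.
reverse_complement <- function(sequence) {
  # Swap a<->t and c<->g, then reverse the order of the characters.
  complemented <- chartr("acgtACGT", "tgcaTGCA", sequence)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# Build a batch whose second half holds the reverse complements
# of the sequences in the first half (assumed layout).
sequences <- c("aacg", "ttga")
batch <- c(sequences,
           vapply(sequences, reverse_complement, character(1),
                  USE.NAMES = FALSE))
batch
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check for any implementation of this transformation.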