deepG is an R package for building LSTM models from genomic text. It provides scripts for common tasks such as extracting cell responses (the internal states of a trained network), and it ships with example datasets of genomic and human-readable languages for testing.
Please see our Wiki for further installation instructions. It also covers usage on multi-GPU machines.
See the help files (?deepG) to get started, and consult the FAQ if you have questions.
The library comes with multiple datasets for testing:
data(parenthesis)
contains 100k characters of the parenthesis language, a synthetic and very simple counting language over the alphabet Σ = {( ) 0 1 2 3 4}. Parentheses must be matched, and nesting is limited to at most 4 levels: each opening parenthesis increases the nesting level by one and each closing parenthesis decreases it by one. Digits are generated randomly but are constrained to indicate the nesting level at their position.
data(crispr_full)
contains all CRISPR loci found in NCBI representative genomes, together with neighboring nucleotides upstream and downstream.
data(crispr_sample)
contains a subset of data(crispr_full).
data(ecoli)
contains the E. coli genome (see the genome sequence of Escherichia coli K-12).
data(ecoli_small)
contains a subset of data(ecoli)
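The parenthesis language behind data(parenthesis) is simple enough to generate directly. The base-R sketch below is based on our own reading of the rules above and is not deepG code. At each position it chooses randomly among the moves that are still legal: open, close, or emit the digit equal to the current depth. It forces closing early enough that the string always ends balanced. The helpers generate_parenthesis and is_valid_parenthesis are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not part of deepG): random string in a parenthesis
# language with digits equal to the current nesting level.
generate_parenthesis <- function(n, max_depth = 4) {
  tokens <- character(n)
  depth <- 0
  for (i in seq_len(n)) {
    remaining <- n - i  # positions left after this one
    moves <- character(0)
    # a digit keeps the depth; only allowed if we can still close everything
    if (remaining >= depth) moves <- c(moves, "digit")
    # opening needs room below max_depth and room to close again later
    if (depth < max_depth && remaining >= depth + 1) moves <- c(moves, "open")
    if (depth > 0) moves <- c(moves, "close")
    move <- sample(moves, 1)
    if (move == "open") {
      depth <- depth + 1
      tokens[i] <- "("
    } else if (move == "close") {
      depth <- depth - 1
      tokens[i] <- ")"
    } else {
      tokens[i] <- as.character(depth)  # digit indicates the nesting level
    }
  }
  paste(tokens, collapse = "")
}

# Hypothetical checker: balanced, depth never above max_depth,
# and every digit equals the nesting level at its position.
is_valid_parenthesis <- function(s, max_depth = 4) {
  depth <- 0
  for (ch in strsplit(s, "")[[1]]) {
    if (ch == "(") {
      depth <- depth + 1
    } else if (ch == ")") {
      depth <- depth - 1
    } else if (ch != as.character(depth)) {
      return(FALSE)  # digit does not match the current nesting level
    }
    if (depth < 0 || depth > max_depth) return(FALSE)
  }
  depth == 0
}

set.seed(1)
s <- generate_parenthesis(60)
is_valid_parenthesis(s)
```

Because the generator only ever picks legal moves, every string it produces should pass the checker. The checker can also be applied to windows of real training text, for example to inspect what the model has to learn.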
The following example trains a small LSTM model on a fragment of the E. coli genome:
library(deepG)
data("ecoli") # loads the nucleotide sequence of E. coli
preprocessed <- preprocessSemiRedundant(substr(ecoli, 2, 5000), maxlen = 250) # prepares the batches (one-hot encoding)
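preprocessSemiRedundant turns the sequence into one-hot encoded training batches. The base-R sketch below shows roughly what such preprocessing can look like. It rests on assumptions that are not taken from deepG's source. Each sample is assumed to be a window of maxlen characters, and consecutive windows start step characters apart, so they overlap ("semi-redundant"). The target is the character immediately after the window. The helper one_hot_windows is hypothetical. Its default vocabulary follows the one passed to writeStates in this README.

```r
# Hypothetical illustration, NOT deepG's preprocessSemiRedundant:
# overlapping windows of length maxlen, one-hot encoded, with the
# following character as a one-hot target.
one_hot_windows <- function(dna, maxlen, step = 1,
                            vocabulary = c("a", "g", "c", "t")) {
  chars <- strsplit(tolower(dna), "")[[1]]
  stopifnot(length(chars) > maxlen)  # need at least one window plus a target
  idx <- match(chars, vocabulary)    # NA for characters outside the vocabulary
  starts <- seq(1, length(chars) - maxlen, by = step)
  n <- length(starts)
  v <- length(vocabulary)
  x <- array(0L, dim = c(n, maxlen, v))  # samples x positions x letters
  y <- matrix(0L, nrow = n, ncol = v)    # one-hot next character per sample
  for (s in seq_len(n)) {
    for (p in seq_len(maxlen)) {
      k <- idx[starts[s] + p - 1]
      if (!is.na(k)) x[s, p, k] <- 1L    # unknown letters stay all-zero
    }
    k <- idx[starts[s] + maxlen]
    if (!is.na(k)) y[s, k] <- 1L
  }
  list(x = x, y = y)
}

str(one_hot_windows("acgtacgtaa", maxlen = 4, step = 2))
```

Leaving unknown characters (such as N) as all-zero rows is one common choice. Other pipelines add an extra vocabulary letter for them instead.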
The following call trains the model and generates the binary file example_full_model.hdf5. For more options, see the Wiki page Training of GenomeNet.
trainNetwork(dataset = preprocessed, batch.size = 500, epochs = 5, maxlen = 250,
             layers.lstm = 2, layer.size = 25, use.cudnn = FALSE,
             run.name = "example", tensorboard.log = "log", path.val = "",
             output = list(none = FALSE, checkpoints = FALSE, tensorboard = FALSE,
                           log = FALSE, serialize_model = FALSE, full_model = TRUE))
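To get a feel for the model size implied by layers.lstm = 2 and layer.size = 25, the trainable parameters can be counted by hand. The sketch below assumes a plain stack of standard (non-cuDNN) LSTM layers. It also assumes a dense softmax layer over a 4-letter vocabulary at the end. deepG's actual architecture may differ. Each LSTM gate has input weights, recurrent weights and a bias, which gives 4 · (units · (input_dim + units) + units) parameters per layer. The helpers lstm_param_count and model_param_count are hypothetical.

```r
# Hypothetical helper (not part of deepG): trainable parameters of one
# standard LSTM layer with 4 gates (input weights, recurrent weights, bias).
lstm_param_count <- function(input_dim, units) {
  4 * (units * (input_dim + units) + units)
}

# Assumed architecture: `layers` stacked LSTM layers with `units` cells each,
# followed by a dense softmax layer predicting the next character.
model_param_count <- function(vocab_size = 4, layers = 2, units = 25) {
  total <- 0
  input_dim <- vocab_size
  for (l in seq_len(layers)) {
    total <- total + lstm_param_count(input_dim, units)
    input_dim <- units  # deeper layers read the previous layer's output
  }
  total + units * vocab_size + vocab_size  # dense weights + biases
}

model_param_count()  # 8204 under these assumptions (3000 + 5100 + 104)
```

Under these assumptions the second LSTM layer holds most of the parameters, because its input dimension (25) is much larger than the one-hot input (4).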
We can now use the trained model to generate neuron responses (states) for a subset of the E. coli genome. This will generate a binary file named states.h5.
writeStates(model.path = "example_full_model.hdf5", sequence = substr(ecoli, 2, 5000),
            batch.size = 256, layer.depth = 1, filename = "states",
            vocabulary = c("a", "g", "c", "t"), step = 1, padding = TRUE)
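writeStates records the responses (states) of the LSTM cells as the model reads the sequence. The base-R sketch below runs the standard LSTM recurrence with random weights, to make clear what such a state is. It yields one hidden-state vector per input position. It does not reproduce deepG's gate ordering, trained weights, padding or batching. The helpers lstm_step and lstm_states are hypothetical.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# One step of a standard LSTM cell (input, forget, candidate, output gates).
lstm_step <- function(x, h, cell, W, U, b) {
  units <- length(h)
  z <- W %*% x + U %*% h + b  # 4 * units pre-activations
  i <- sigmoid(z[1:units])
  f <- sigmoid(z[(units + 1):(2 * units)])
  g <- tanh(z[(2 * units + 1):(3 * units)])
  o <- sigmoid(z[(3 * units + 1):(4 * units)])
  cell_new <- f * cell + i * g   # updated cell state
  h_new <- o * tanh(cell_new)    # hidden state ("neuron response")
  list(h = as.vector(h_new), cell = as.vector(cell_new))
}

# Hidden states for a one-hot sequence (rows = positions, cols = letters).
# Random weights only; note that set.seed() changes the global RNG state.
lstm_states <- function(x_seq, units, seed = 1) {
  set.seed(seed)
  input_dim <- ncol(x_seq)
  W <- matrix(rnorm(4 * units * input_dim, sd = 0.1), 4 * units, input_dim)
  U <- matrix(rnorm(4 * units * units, sd = 0.1), 4 * units, units)
  b <- rep(0, 4 * units)
  h <- rep(0, units)
  cell <- rep(0, units)
  states <- matrix(NA_real_, nrow(x_seq), units)
  for (p in seq_len(nrow(x_seq))) {
    s <- lstm_step(x_seq[p, ], h, cell, W, U, b)
    h <- s$h
    cell <- s$cell
    states[p, ] <- h  # one row of responses per sequence position
  }
  states
}

dim(lstm_states(diag(4), units = 25))
```

With the vocabulary order a, g, c, t, diag(4) is the one-hot encoding of "agct", so the call returns a 4 × 25 matrix: one response per unit at each of the four positions. Every hidden value lies strictly between -1 and 1, because it is a sigmoid output times a tanh.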
Copyright 2019 Philipp Münch