predict_model | R Documentation |
Optionally removes layers from a pretrained model and computes states for a fasta/fastq file or a nucleotide sequence.
Writes the states to an h5 or csv file; the content of h5 output can be read with the load_prediction function.
The output_format argument controls how an input file is processed:
If "one_seq", computes predictions for the sequence argument or for a fasta/fastq file. Fasta entries in a file are combined into one sequence, so predictor sequences can contain elements from more than one fasta entry.
If "by_entry", writes a separate output file for each fasta/fastq entry. Output file names are output_dir + "Nr" + i + filename + output_type, where i is the number of the fasta entry.
If "by_entry_one_file", stores the predictions for all fasta entries in one h5 file.
If "one_pred_per_entry", makes one prediction per entry, either by picking a random sample from long sequences or by padding short sequences.
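The "one_pred_per_entry" behaviour can be pictured with a small base R sketch. It is not the package's implementation: the helper name fit_to_maxlen, the pad character, and padding on the left are all assumptions made for illustration.

```r
# Hypothetical helper, not part of the package: reduce one entry to exactly
# `maxlen` characters, following the "one_pred_per_entry" description.
fit_to_maxlen <- function(entry, maxlen, pad_char = "n") {
  len <- nchar(entry)
  if (len > maxlen) {
    # Long entry: pick a random window of length maxlen.
    start <- sample.int(len - maxlen + 1, 1)
    return(substr(entry, start, start + maxlen - 1))
  }
  # Short entry: pad until it reaches maxlen.
  # (Padding side and pad character are assumptions.)
  paste0(strrep(pad_char, maxlen - len), entry)
}

set.seed(1)
long_entry  <- paste(sample(c("a", "c", "g", "t"), 50, replace = TRUE), collapse = "")
short_entry <- "acgt"

fit_long  <- fit_to_maxlen(long_entry, maxlen = 20)
fit_short <- fit_to_maxlen(short_entry, maxlen = 20)
nchar(fit_long)   # both results have length maxlen
nchar(fit_short)
```

Either way, each entry yields a single input of length maxlen, which is why this mode produces exactly one prediction per entry.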
predict_model(
  model,
  output_format = "one_seq",
  layer_name = NULL,
  sequence = NULL,
  path_input = NULL,
  round_digits = NULL,
  filename = "states.h5",
  step = 1,
  vocabulary = c("a", "c", "g", "t"),
  batch_size = 256,
  verbose = TRUE,
  return_states = FALSE,
  output_type = "h5",
  padding = "none",
  use_quality = FALSE,
  quality_string = NULL,
  mode = "label",
  lm_format = "target_right",
  output_dir = NULL,
  format = "fasta",
  include_seq = FALSE,
  reverse_complement_encoding = FALSE,
  ambiguous_nuc = "zero",
  ...
)
model | A keras model. |
output_format | Either |
layer_name | Name of layer to get output from. If |
sequence | Character string, ignores path_input if argument given. |
path_input | Path to fasta file. |
round_digits | Number of decimal places. |
filename | Filename to store states in. No file output if argument is |
step | Frequency of sampling steps. |
vocabulary | Vector of allowed characters. Characters outside vocabulary get encoded as specified in |
batch_size | Number of samples used for one network update. |
verbose | Boolean. |
return_states | Return predictions as data frame. Only supported for output_format |
output_type | |
padding | Either |
use_quality | Whether to use quality scores. |
quality_string | String for encoding with quality scores (as used in fastq format). |
mode | Either |
lm_format | Either |
output_dir | Directory for file output. |
format | File format, |
include_seq | Whether to include input sequence in h5 file. |
reverse_complement_encoding | Whether to use both original sequence and reverse complement as two input sequences. |
ambiguous_nuc | How to handle nucleotides outside vocabulary, either |
... | Further arguments for sequence encoding with |
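To make the interaction of vocabulary and ambiguous_nuc = "zero" concrete, here is a minimal base R sketch of one-hot encoding in which characters outside the vocabulary become all-zero rows. This reading of "zero" is an assumption, and the helper one_hot_zero is hypothetical; it is not the encoder the package uses.

```r
# Hypothetical helper: one-hot encode a sequence, mapping characters outside
# `vocabulary` to an all-zero row (the assumed meaning of "zero").
one_hot_zero <- function(sequence, vocabulary = c("a", "c", "g", "t")) {
  chars <- strsplit(tolower(sequence), "")[[1]]
  idx <- match(chars, vocabulary)  # NA for characters outside vocabulary
  encoded <- matrix(0, nrow = length(chars), ncol = length(vocabulary),
                    dimnames = list(NULL, vocabulary))
  known <- which(!is.na(idx))
  # Set a single 1 per known character, at (position, vocabulary index).
  encoded[cbind(known, idx[known])] <- 1
  encoded
}

x <- one_hot_zero("acngt")
x           # one row per position, one column per vocabulary character
rowSums(x)  # the row for "n" sums to 0, all other rows sum to 1
```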
If return_states = TRUE, returns a list of model predictions and the positions of the corresponding sequences.
If additionally include_seq = TRUE, the list contains the sequence strings.
If return_states = FALSE, returns nothing and only writes output to file(s).
# make prediction for single sequence and write to h5 file
model <- create_model_lstm_cnn(maxlen = 20, layer_lstm = 8, layer_dense = 2, verbose = FALSE)
vocabulary <- c("a", "c", "g", "t")
sequence <- paste(sample(vocabulary, 200, replace = TRUE), collapse = "")
output_file <- tempfile(fileext = ".h5")
predict_model(output_format = "one_seq", model = model, step = 10,
              sequence = sequence, filename = output_file, mode = "label")

# make prediction for fasta file with multiple entries, write output to separate h5 files
fasta_path <- tempfile(fileext = ".fasta")
create_dummy_data(file_path = fasta_path, num_files = 1,
                  num_seq = 5, seq_length = 100,
                  write_to_file_path = TRUE)
model <- create_model_lstm_cnn(maxlen = 20, layer_lstm = 8, layer_dense = 2, verbose = FALSE)
output_dir <- tempfile()
dir.create(output_dir)
predict_model(output_format = "by_entry", model = model, step = 10, verbose = FALSE,
              output_dir = output_dir, mode = "label", path_input = fasta_path)
list.files(output_dir)
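
As a rough guide to how step relates to the number of predictions in the first example, the base R sketch below counts window start positions for a 200-character sequence with maxlen = 20 and step = 10. It assumes that a window of maxlen characters starts every step positions and that only complete windows are used; the package may place windows differently.

```r
# Illustrative only: assumed window placement, not taken from the package.
maxlen <- 20          # input length of the model in the example
step <- 10            # distance between consecutive window starts
seq_len_total <- 200  # length of the example sequence

starts <- seq(1, seq_len_total - maxlen + 1, by = step)
length(starts)  # number of windows under these assumptions
head(starts)    # first few start positions
```

Under these assumptions, a larger step gives fewer windows, and step = 1 would give one window per possible start position.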