statesByFastaOneFile: Write states to h5 file
In hiddengenome/altum: deepG

statesByFastaOneFile removes layers (optional) from a pretrained model and calculates the states for a fasta file. It writes a separate state matrix for every fasta entry into one .h5 file. The h5 file also contains the nucleotide sequences and the positions of the targets corresponding to the states.

To access the content of the h5 file:

h5_path <- "/path/to/file"
h5_file <- hdf5r::H5File$new(h5_path, mode = "r")
a <- h5_file[["states"]]
# shows header names
# names(a)
b <- a[["someHeaderName"]]
# shows state matrix
# b[,]
h5_file$close_all()
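To collect the state matrices of all fasta entries at once, the access pattern above can be wrapped in a small helper. This is a minimal sketch: `read_all_states` is a hypothetical name, not part of deepG, and it assumes the file has the layout described above (a "states" group with one two-dimensional dataset per fasta header).

```r
# Hypothetical helper (not part of deepG): read every state matrix
# stored under the "states" group into a named list.
read_all_states <- function(h5_path) {
  h5_file <- hdf5r::H5File$new(h5_path, mode = "r")
  # Close the file even if reading fails part-way through.
  on.exit(h5_file$close_all(), add = TRUE)

  states_group <- h5_file[["states"]]
  headers <- names(states_group)

  # Assumes each dataset is a two-dimensional state matrix.
  state_list <- lapply(headers, function(header) states_group[[header]][, ])
  names(state_list) <- headers
  state_list
}
```

Calling `read_all_states("/path/to/file")` would then return one matrix per fasta header, provided the hdf5r package is installed.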

statesByFastaOneFile(
  model.path,
  layer.depth = NULL,
  fasta.path,
  round_digits = 2,
  h5.filename = "states.h5",
  step = 1,
  vocabulary = c("a", "c", "g", "t"),
  batch.size = 256,
  padding = FALSE,
  verbose = TRUE,
  model = NULL,
  mode = "lm"
)
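
A call with the default settings might look like the sketch below. Both paths are hypothetical placeholders, and the example assumes the function is exported from the deepG package. The call is guarded so that it only runs when the package and both input files are available.

```r
# Hypothetical paths; replace them with your own model and fasta file.
model_path <- "/path/to/model.h5"
fasta_path <- "/path/to/sequences.fasta"

# Only attempt the call if deepG is installed and both inputs exist.
can_run <- requireNamespace("deepG", quietly = TRUE) &&
  file.exists(model_path) && file.exists(fasta_path)

if (can_run) {
  deepG::statesByFastaOneFile(
    model.path = model_path,
    layer.depth = NULL,                  # NULL: evaluate the last layer
    fasta.path = fasta_path,
    round_digits = 2,
    h5.filename = "states.h5",
    step = 1,
    vocabulary = c("a", "c", "g", "t"),
    batch.size = 256,
    padding = FALSE,
    verbose = FALSE,
    mode = "lm"
  )
}
```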

`model.path`	Path to a pretrained model.
`layer.depth`	Depth of layer to evaluate. If NULL, the last layer is used.
`fasta.path`	Path to fasta file.
`round_digits`	Number of decimal places to which the states are rounded.
`h5.filename`	Filename of h5 file to store states.
`step`	Frequency of sampling steps.
`vocabulary`	Vector of allowed characters. Characters outside the vocabulary are encoded as 0-vectors.
`batch.size`	Number of samples to evaluate at once. Does not change output, only relevant for speed and memory.
`padding`	Logical scalar. If TRUE, states are also generated for the first maxlen nucleotides by padding the beginning of the sequence with 0-vectors.
`verbose`	Whether to print model before and after removing layers.
`model`	A keras model. If both model and model.path are not NULL, model is used for inference.
`mode`	Either "lm" for language model or "label" for label classification.
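
The effect of `vocabulary` and `padding` on the model input can be illustrated in base R. This is only a sketch of the idea, not deepG's internal encoder: `one_hot_encode` and `pad_start` are hypothetical helpers, and converting the sequence to lower case before matching is an assumption.

```r
# Hypothetical helper: one-hot encode a sequence against a vocabulary.
# Characters outside the vocabulary stay all-zero rows (0-vectors).
one_hot_encode <- function(sequence, vocabulary = c("a", "c", "g", "t")) {
  chars <- strsplit(tolower(sequence), "")[[1]]  # assumption: case-insensitive
  idx <- match(chars, vocabulary)                # NA if not in vocabulary
  encoded <- matrix(0, nrow = length(chars), ncol = length(vocabulary),
                    dimnames = list(NULL, vocabulary))
  in_vocab <- which(!is.na(idx))
  # Set one entry per in-vocabulary character via (row, column) pairs.
  encoded[cbind(in_vocab, idx[in_vocab])] <- 1
  encoded
}

# Hypothetical helper: prepend n 0-vectors to the start of the sequence.
pad_start <- function(encoded, n) {
  rbind(matrix(0, nrow = n, ncol = ncol(encoded)), encoded)
}

x <- one_hot_encode("ACNT")   # "N" is not in the vocabulary
x_padded <- pad_start(x, 2)   # two all-zero rows in front
```

Here the third row of `x` is all zeros because "N" is outside the vocabulary, and `x_padded` has two additional all-zero rows at the start.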