tseq2feature_seq2seq: Feature Extraction by time sequence autoencoder
In ProcData: Process Data Analysis


tseq2feature_seq2seq extracts features from the timestamps of action sequences using a sequence autoencoder.

tseq2feature_seq2seq(tseqs, K, cumulative = FALSE, log = TRUE,
  rnn_type = "lstm", n_epoch = 50, method = "last",
  step_size = 1e-04, optimizer_name = "rmsprop", samples_train,
  samples_valid, samples_test = NULL, pca = TRUE, verbose = TRUE,
  return_theta = TRUE)

`tseqs`	a list of `n` timestamp sequences. Each element is a numeric vector of the timestamps associated with the actions in one sequence. The timestamp of the first event (e.g., "start") is 0.
`K`	the number of features to be extracted.
`cumulative`	logical. If TRUE, the sequence of cumulative time up to each event is used as input to the neural network. If FALSE, the sequence of inter-arrival time (gap time between an event and the previous event) will be used as input to the neural network. Default is FALSE.
`log`	logical. If TRUE, for the timestamp sequences, input of the neural net is the base-10 log of the original sequence of times plus 1 (i.e., log10(t+1)). If FALSE, the original sequence of times is used.
`rnn_type`	the type of recurrent unit to be used for modeling response processes. `"lstm"` for the long short-term memory unit. `"gru"` for the gated recurrent unit.
`n_epoch`	the number of training epochs for the autoencoder.
`method`	the method for computing features from the output of a recurrent neural network in the encoder. Available options are `"last"` and `"avg"`.
`step_size`	the learning rate of the optimizer.
`optimizer_name`	a character string specifying the optimizer to be used for training. Available options are `"sgd"`, `"rmsprop"`, `"adadelta"`, and `"adam"`.
`samples_train`	a vector of indices specifying the training set used to train the autoencoder.
`samples_valid`	a vector of indices specifying the validation set used when training the autoencoder.
`samples_test`	a vector of indices specifying the test set. Default is NULL, in which case no test loss is recorded.
`pca`	logical. If TRUE, the principal components of features are returned. Default is TRUE.
`verbose`	logical. If TRUE, training progress is printed.
`return_theta`	logical. If TRUE, extracted features are returned.
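
The `cumulative` and `log` arguments together determine what the neural network receives as input. The base-R sketch below illustrates the transformation as described in the argument list. The helper `prepare_tseq` is hypothetical and not part of ProcData, and the treatment of the first event (a gap of 0) is an assumption of this sketch; the package's internal implementation may differ.

```r
# Hypothetical helper illustrating the input transformation described by
# the `cumulative` and `log` arguments. Not part of ProcData.
prepare_tseq <- function(tseq, cumulative = FALSE, log = TRUE) {
  # cumulative = TRUE : use the timestamps as they are.
  # cumulative = FALSE: use inter-arrival times (gap to the previous event).
  # The first event has no predecessor, so its gap is set to 0 here
  # (an assumption of this sketch).
  x <- if (cumulative) tseq else c(0, diff(tseq))
  # log = TRUE: apply log10(t + 1); the +1 keeps zero times finite.
  if (log) x <- log10(x + 1)
  x
}

tseq <- c(0, 9, 108, 1107)   # toy timestamps; the first event is at 0

gaps    <- prepare_tseq(tseq)                                   # default settings
cum_raw <- prepare_tseq(tseq, cumulative = TRUE, log = FALSE)   # untransformed
```

With the defaults, the gaps 0, 9, 99 and 999 are compressed onto a log scale, which keeps a few very long pauses from dominating the input.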

This function trains a sequence-to-sequence autoencoder using keras. The encoder of the autoencoder consists of a recurrent neural network. The decoder consists of another recurrent neural network and a fully connected layer with ReLU activation. The outputs of the encoder are the extracted features.

The output of the encoder is a function of the encoder recurrent neural network. It is the last latent state of the encoder recurrent neural network if method="last" and the average of the encoder recurrent neural network latent states if method="avg".
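
The difference between the two `method` options can be illustrated on a toy matrix of latent states. This is only a sketch: the matrix `H` is made up for illustration and stands in for the states that the encoder's LSTM or GRU would produce for a single sequence. How the package handles sequences of different lengths is not shown here.

```r
# Toy stand-in for the encoder's latent states of ONE sequence:
# one row per time step, one column per latent dimension (K = 2 here).
# In the actual model these values come from the recurrent network.
H <- rbind(c(0.1, 0.4),
           c(0.3, 0.2),
           c(0.5, 0.6))

feature_last <- H[nrow(H), ]   # method = "last": the final latent state
feature_avg  <- colMeans(H)    # method = "avg": mean over all time steps
```

Either way, each sequence is summarized by a vector of length `K`, which becomes one row of the feature matrix.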

tseq2feature_seq2seq returns a list containing

`theta`	a matrix containing `K` features or principal features. Each column is a feature.
`train_loss`	a vector of length `n_epoch` recording the trace of training losses.
`valid_loss`	a vector of length `n_epoch` recording the trace of validation losses.
`test_loss`	a vector of length `n_epoch` recording the trace of test losses. Exists only if `samples_test` is not `NULL`.
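
When `pca = TRUE`, `theta` holds principal components of the encoder outputs rather than the raw outputs. The sketch below shows what such a rotation looks like with base R's `prcomp`. The simulated matrix `raw_features` is a stand-in for encoder outputs, and centering without scaling is an assumption; the package's exact PCA settings may differ.

```r
set.seed(1)

# Stand-in for raw encoder outputs: 50 sequences, K = 5 features.
raw_features <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5)

# Principal component scores. Centering without scaling is an assumption
# of this sketch, not a documented setting of tseq2feature_seq2seq.
pc <- prcomp(raw_features, center = TRUE, scale. = FALSE)

# Same shape as the input; columns are uncorrelated and ordered by
# decreasing variance.
theta_pca <- pc$x
```

The rotated features carry the same information as the raw ones but are uncorrelated, which can make them easier to use in downstream regression models.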

chooseK_seq2seq for choosing K through cross-validation.

Other feature extraction methods: aseq2feature_seq2seq, atseq2feature_seq2seq, seq2feature_mds_large, seq2feature_mds, seq2feature_ngram, seq2feature_seq2seq

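library(ProcData)

# Run only if TensorFlow can be imported (system() returns 0 on success)
if (!system("python -c 'import tensorflow as tf'", ignore.stdout = TRUE, ignore.stderr = TRUE)) {
  n <- 50
  data(cc_data)
  # Draw n timestamp sequences at random
  samples <- sample(seq_along(cc_data$seqs$time_seqs), n)
  tseqs <- cc_data$seqs$time_seqs[samples]
  # Extract K = 5 features; sequences 1-40 are for training, 41-50 for validation
  time_seq2seq_res <- tseq2feature_seq2seq(tseqs, 5, rnn_type = "lstm", n_epoch = 5,
                                           samples_train = 1:40, samples_valid = 41:50)
  features <- time_seq2seq_res$theta
  # Plot the training (blue) and validation (red) loss traces
  plot(time_seq2seq_res$train_loss, col = "blue", type = "l",
       ylim = range(c(time_seq2seq_res$train_loss, time_seq2seq_res$valid_loss)))
  lines(time_seq2seq_res$valid_loss, col = "red", type = "l")
}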