tseq2feature_seq2seq: Feature Extraction by time sequence autoencoder

Description Usage Arguments Details Value See Also Examples

View source: R/autoencoder_functions.R

Description

tseq2feature_seq2seq extract features from timestamps of action sequences by a sequence autoencoder.

Usage

1
2
3
4
5
tseq2feature_seq2seq(tseqs, K, cumulative = FALSE, log = TRUE,
  rnn_type = "lstm", n_epoch = 50, method = "last",
  step_size = 1e-04, optimizer_name = "rmsprop", samples_train,
  samples_valid, samples_test = NULL, pca = TRUE, verbose = TRUE,
  return_theta = TRUE)

Arguments

tseqs

a list of n timestamp sequences. Each element is a numeric sequence in the form of a vector of timestamps associated with actions, with the timestamp of the first event (e.g., "start") of 0.

K

the number of features to be extracted.

cumulative

logical. If TRUE, the sequence of cumulative time up to each event is used as input to the neural network. If FALSE, the sequence of inter-arrival time (gap time between an event and the previous event) will be used as input to the neural network. Default is FALSE.

log

logical. If TRUE, for the timestamp sequences, input of the neural net is the base-10 log of the original sequence of times plus 1 (i.e., log10(t+1)). If FALSE, the original sequence of times is used.

rnn_type

the type of recurrent unit to be used for modeling response processes. "lstm" for the long-short term memory unit. "gru" for the gated recurrent unit.

n_epoch

the number of training epochs for the autoencoder.

method

the method for computing features from the output of an recurrent neural network in the encoder. Available options are "last" and "avg".

step_size

the learning rate of optimizer.

optimizer_name

a character string specifying the optimizer to be used for training. Availabel options are "sgd", "rmsprop", "adadelta", and "adam".

samples_train

vectors of indices specifying the training, validation and test sets for training autoencoder.

samples_valid

vectors of indices specifying the training, validation and test sets for training autoencoder.

samples_test

vectors of indices specifying the training, validation and test sets for training autoencoder.

pca

logical. If TRUE, the principal components of features are returned. Default is TRUE.

verbose

logical. If TRUE, training progress is printed.

return_theta

logical. If TRUE, extracted features are returned.

Details

This function trains a sequence-to-sequence autoencoder using keras. The encoder of the autoencoder consists of a recurrent neural network. The decoder consists of another recurrent neural network and a fully connected layer with ReLU activation. The outputs of the encoder are the extracted features.

The output of the encoder is a function of the encoder recurrent neural network. It is the last latent state of the encoder recurrent neural network if method="last" and the average of the encoder recurrent neural network latent states if method="avg".

Value

tseq2feature_seq2seq returns a list containing

theta

a matrix containing K features or principal features. Each column is a feature.

train_loss

a vector of length n_epoch recording the trace of training losses.

valid_loss

a vector of length n_epoch recording the trace of validation losses.

test_loss

a vector of length n_epoch recording the trace of test losses. Exists only if samples_test is not NULL.

See Also

chooseK_seq2seq for choosing K through cross-validation.

Other feature extraction methods: aseq2feature_seq2seq, atseq2feature_seq2seq, seq2feature_mds_large, seq2feature_mds, seq2feature_ngram, seq2feature_seq2seq

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
if (!system("python -c 'import tensorflow as tf'", ignore.stdout = TRUE, ignore.stderr= TRUE)) {
  n <- 50
  data(cc_data)
  samples <- sample(1:length(cc_data$seqs$time_seqs), n)
  tseqs <- cc_data$seqs$time_seqs[samples]
  time_seq2seq_res <- tseq2feature_seq2seq(tseqs, 5, rnn_type="lstm", n_epoch=5, 
                                   samples_train=1:40, samples_valid=41:50)
  features <- time_seq2seq_res$theta
  plot(time_seq2seq_res$train_loss, col="blue", type="l",
     ylim = range(c(time_seq2seq_res$train_loss, time_seq2seq_res$valid_loss)))
  lines(time_seq2seq_res$valid_loss, col="red", type = 'l')
}

ProcData documentation built on April 1, 2021, 5:07 p.m.