# seq2feature_mds: Feature extraction via multidimensional scaling

In ProcData: Process Data Analysis

## Description

seq2feature_mds extracts K features from response processes by multidimensional scaling (MDS).

## Usage

    seq2feature_mds(seqs = NULL, K = 2, method = "auto", dist_type = "oss_action",
      pca = TRUE, subset_size = 100, subset_method = "random", n_cand = 10,
      return_dist = FALSE, L_set = 1:3)

## Arguments

| Argument | Description |
|---|---|
| seqs | a "proc" object or a square matrix. If a square matrix is provided, it is treated as the dissimilarity matrix of a group of response processes. |
| K | the number of features to be extracted. |
| method | a character string specifying the algorithm used for performing MDS. See 'Details'. |
| dist_type | a character string specifying the dissimilarity measure for two response processes. See 'Details'. |
| pca | logical. If TRUE (default), the principal components of the extracted features are returned. |
| subset_size, n_cand | two parameters used in the large data algorithm. See 'Details' and seq2feature_mds_large. |
| subset_method | a character string specifying the method for choosing the subset in the large data algorithm. See 'Details' and seq2feature_mds_large. |
| return_dist | logical. If TRUE, the dissimilarity matrix will be returned. Default is FALSE. |
| L_set | the lengths of n-grams considered. |

## Details

Classical MDS has a computational complexity of order n^3, where n is the number of response processes, so it is computationally expensive when a large number of response processes is considered. In addition, storing an n × n dissimilarity matrix requires a large amount of memory when n is large. To obtain MDS features for large datasets, seq2feature_mds implements the algorithm proposed in Paradis (2018).

method specifies the algorithm used for obtaining MDS features. If method = "small", classical MDS is performed by calling cmdscale. If method = "large", the algorithm for large datasets is used. If method = "auto" (default), seq2feature_mds selects the algorithm automatically based on the sample size.
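As a rough illustration of the method = "small" path, the sketch below runs classical MDS with base R's cmdscale on a toy Euclidean distance matrix and then rotates the result with prcomp. The toy data, the variable names, and the use of prcomp for the principal-component step are assumptions made for illustration; they are not taken from the seq2feature_mds source.

```r
# Toy stand-in for a dissimilarity matrix between n response processes.
# (Assumption: Euclidean distances between random points, not OSS.)
set.seed(1)
n <- 20
x <- matrix(rnorm(n * 3), nrow = n)
dist_mat <- as.matrix(dist(x))   # n x n symmetric matrix, zero diagonal

# Classical MDS: embed the n objects in K dimensions.
K <- 2
theta <- cmdscale(dist_mat, k = K)

# Optional rotation to principal components (assumed analogue of pca = TRUE).
theta_pc <- prcomp(theta, center = TRUE, scale. = FALSE)$x

dim(theta)     # n rows, K columns
dim(theta_pc)  # same shape after rotation
```

Because the rotation is orthogonal, pairwise distances between the rows of theta_pc match those between the rows of theta; only the coordinate axes change.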

dist_type specifies the dissimilarity to be used for measuring the discrepancy between two response processes. If dist_type = "oss_action", the order-based sequence similarity (oss) proposed in Gomez-Alonso and Valls (2008) is used for action sequences. If dist_type = "oss_both", both action sequences and timestamp sequences are used to compute a time-weighted oss.

The number of features to be extracted K can be selected by cross-validation using chooseK_mds.

## Value

seq2feature_mds returns a list containing

| Element | Description |
|---|---|
| theta | a numeric matrix giving the K extracted features or principal features. Each column is a feature. |
| dist_mat | the dissimilarity matrix. This element exists only if return_dist = TRUE. |
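The following sketch shows one way the returned list might be used, relying only on the arguments documented above. It assumes that ProcData is installed and that seq_gen (used in the Examples section) produces a "proc" object; the sample size of 30 and the names res and res2 are arbitrary choices. The whole snippet is guarded so that it does nothing when the package is unavailable.

```r
# Guarded so the snippet is a no-op when ProcData is not installed.
if (requireNamespace("ProcData", quietly = TRUE)) {
  set.seed(12345)
  seqs <- ProcData::seq_gen(30)

  # Ask for the dissimilarity matrix alongside the features.
  res <- ProcData::seq2feature_mds(seqs, K = 3, return_dist = TRUE)
  dim(res$theta)     # each column is one extracted feature
  dim(res$dist_mat)  # dissimilarity matrix between the response processes

  # The documented seqs argument also accepts a square dissimilarity
  # matrix, so the stored matrix can be passed in place of the sequences.
  res2 <- ProcData::seq2feature_mds(res$dist_mat, K = 3)
}
```

Keeping dist_mat avoids recomputing pairwise dissimilarities when features are extracted again, for example with a different K.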

## References

Gomez-Alonso, C. and Valls, A. (2008). A similarity measure for sequences of categorical data based on the ordering of common elements. In V. Torra & Y. Narukawa (Eds.) Modeling Decisions for Artificial Intelligence, (pp. 134-145). Springer Berlin Heidelberg.

Paradis, E. (2018). Multidimensional scaling with very large datasets. Journal of Computational and Graphical Statistics, 27(4), 935-939.

Tang, X., Wang, Z., He, Q., Liu, J., and Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85, 378-397.

## See Also

chooseK_mds for choosing K.

Other feature extraction methods: aseq2feature_seq2seq, atseq2feature_seq2seq, seq2feature_mds_large, seq2feature_ngram, seq2feature_seq2seq, tseq2feature_seq2seq

## Examples

    n <- 50
    set.seed(12345)
    seqs <- seq_gen(n)
    theta <- seq2feature_mds(seqs, 5)$theta

ProcData documentation built on April 1, 2021, 5:07 p.m.