README.md
In maraab23/seqquality: Sequence Quality Index

Sequence Quality Index

Marcel Raab

seqquality is a R function which computes a generalized version of the sequence quality index proposed by Manzoni and Mooi-Reci (2018). The index is defined as

$SQ$

where $i$ indicates the position within the sequence and $k$ the total length of the sequence. $w$ is a weighting factor simultaneously affecting how strong the index reacts to (and recovers from) a change in state quality. $q_{i}$ is a weighting factor denoting the quality of a state at position $i$ . The function normalizes $q_{i}$ to have values between 0 and 1. Therefore, $q_{max}=1$ . If no quality vector is specified, the first state of the alphabet is coded 0, whereas the last state is coded 1. For the states in-between each step up the hierarchy increases the value of the vector by ${1}/{(l(A)-1)}$ , with $l(A)$ indicating the length of the alphabet. This procedure was borrowed from seqprecstart, a helper function used for the implementation of the sequence precarity index proposed by Ritschard et al. (2018).

The package can be installed using install_github from the devtools package:

install.packages("devtools")
library(devtools)
install_github("maraab23/seqquality")
library(seqquality)

First, you need to load additional libraries and data to run the examples.

library(tidyverse)
library(TraMineR)

data(actcal)
# Define state sequence object
actcal.seq <- seqdef(actcal[,13:24])

We use the actcal example data that come with the TraMineR package. This dataset comprises 2000 individual sequences of monthly activity statuses from January to December 2000 (type ?actcal for getting more details). The sequence alphabet is defined as:

A = Full-time paid job (> 37 hours)
B = Long part-time paid job (19-36 hours)
C = Short part-time paid job (1-18 hours)
D = Unemployed (no work)

For illustration purposes we impose the following state quality hierarchy: $D\<C\<B\<A$

The default version of the quality index with $w=1$ can be obtained by typing:

seqquality(actcal.seq, stqual = 4:1)

## # A tibble: 2,000 x 1
##    `w=1`
##    <dbl>
##  1 0.667
##  2 0.718
##  3 0.667
##  4 0.474
##  5 1    
##  6 0.658
##  7 0    
##  8 1    
##  9 1    
## 10 1    
## # ... with 1,990 more rows

When time.varying=TRUE the index is computed for every position $i$ by incrementing the length of the sequences by 1 until the full sequence length is reached. The following command computes the time-varying quality index using three different weighting factors $w=(.5,1,2)$ .

seqquality(actcal.seq, 
           stqual = c(4:1), 
           weight = c(.5,1,2), 
           time.varying = TRUE)

## $`w=0.5`
## # A tibble: 2,000 x 13
##    weight `t=1` `t=2` `t=3` `t=4` `t=5` `t=6` `t=7` `t=8` `t=9` `t=10` `t=11`
##     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
##  1    0.5 0.816 0.816 0.816 0.816 0.816 0.816 0.816 0.816 0.816  0.816  0.816
##  2    0.5 0     0     0     0     0.267 0.433 0.544 0.623 0.682  0.726  0.762
##  3    0.5 0.816 0.816 0.816 0.816 0.816 0.816 0.816 0.816 0.816  0.816  0.816
##  4    0.5 0.577 0.577 0.577 0.577 0.577 0.577 0.577 0.577 0.577  0.611  0.637
##  5    0.5 1     1     1     1     1     1     1     1     1      1      1    
##  6    0.5 0     0.478 0.62  0.684 0.719 0.741 0.756 0.766 0.774  0.78   0.785
##  7    0.5 0     0     0     0     0     0     0     0     0      0      0    
##  8    0.5 1     1     1     1     1     1     1     1     1      1      1    
##  9    0.5 1     1     1     1     1     1     1     1     1      1      1    
## 10    0.5 1     1     1     1     1     1     1     1     1      1      1    
## # ... with 1,990 more rows, and 1 more variable: `t=12` <dbl>
## 
## $`w=1`
## # A tibble: 2,000 x 13
##    weight `t=1` `t=2` `t=3` `t=4` `t=5` `t=6` `t=7` `t=8` `t=9` `t=10` `t=11`
##     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
##  1      1 0.667 0.667 0.667 0.667 0.667 0.667 0.667 0.667 0.667  0.667  0.667
##  2      1 0     0     0     0     0.333 0.524 0.643 0.722 0.778  0.818  0.848
##  3      1 0.667 0.667 0.667 0.667 0.667 0.667 0.667 0.667 0.667  0.667  0.667
##  4      1 0.333 0.333 0.333 0.333 0.333 0.333 0.333 0.333 0.333  0.394  0.439
##  5      1 1     1     1     1     1     1     1     1     1      1      1    
##  6      1 0     0.444 0.556 0.6   0.622 0.635 0.643 0.648 0.652  0.655  0.657
##  7      1 0     0     0     0     0     0     0     0     0      0      0    
##  8      1 1     1     1     1     1     1     1     1     1      1      1    
##  9      1 1     1     1     1     1     1     1     1     1      1      1    
## 10      1 1     1     1     1     1     1     1     1     1      1      1    
## # ... with 1,990 more rows, and 1 more variable: `t=12` <dbl>
## 
## $`w=2`
## # A tibble: 2,000 x 13
##    weight `t=1` `t=2` `t=3` `t=4` `t=5` `t=6` `t=7` `t=8` `t=9` `t=10` `t=11`
##     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
##  1      2 0.444 0.444 0.444 0.444 0.444 0.444 0.444 0.444 0.444  0.444  0.444
##  2      2 0     0     0     0     0.455 0.67  0.786 0.853 0.895  0.922  0.941
##  3      2 0.444 0.444 0.444 0.444 0.444 0.444 0.444 0.444 0.444  0.444  0.444
##  4      2 0.111 0.111 0.111 0.111 0.111 0.111 0.111 0.111 0.111  0.198  0.257
##  5      2 1     1     1     1     1     1     1     1     1      1      1    
##  6      2 0     0.356 0.413 0.43  0.436 0.44  0.441 0.442 0.443  0.443  0.444
##  7      2 0     0     0     0     0     0     0     0     0      0      0    
##  8      2 1     1     1     1     1     1     1     1     1      1      1    
##  9      2 1     1     1     1     1     1     1     1     1      1      1    
## 10      2 1     1     1     1     1     1     1     1     1      1      1    
## # ... with 1,990 more rows, and 1 more variable: `t=12` <dbl>

Finally, we present an example illustrating how to implement the original binary sequence quality index. For this purpose we create example data containing four sequences and four states. Only the last two states of the alphabet (A & B) are considered as success (stqual = c(0,0,1,1)).

# Generate example data
data <- matrix(c(c(rep("D", 3), rep("C", 1), rep("B", 6), rep("A", 10)),
                 c(rep("A", 6), rep("D", 4), rep("C", 4), rep("B", 6)),
                 c(rep("D", 2), rep("C", 5), rep("D", 3), rep("B", 5), rep("A", 5)),
                 c(rep("D", 2), rep("C", 5), rep("B", 5), rep("A", 5), rep("D", 3))), nrow = 4, byrow = TRUE)

# Generate state sequence object
example.seq <- seqdef(data, alphabet = c("A","B","C","D"))

# Save print-friendly version of sequences for graph legend
example.sps <- print(example.seq, format = "SPS")

##     Sequence                     
## [1] (D,3)-(C,1)-(B,6)-(A,10)     
## [2] (A,6)-(D,4)-(C,4)-(B,6)      
## [3] (D,2)-(C,5)-(D,3)-(B,5)-(A,5)
## [4] (D,2)-(C,5)-(B,5)-(A,5)-(D,3)

# Compute time-varying quality index using a binary definition of  quality
qual.binary.tvar <- seqquality(example.seq,
                               stqual = c(1,1,0,0),
                               time.varying = TRUE)

Following the example of Manzoni and Mooi-Reci (2018), we proceed by visualizing how the sequence quality index develops across the positions of the sequences:

# Preparing the data for ggplot (-> long format)
fig.data <- qual.binary.tvar %>%
  mutate(Sequence = example.sps) %>%
  select(-weight) %>%
  pivot_longer(-Sequence, 
               names_to = "Position",
               values_to = "Sequence Quality") %>%
  mutate(Position = as.numeric(substring(Position, first = 3)))

# Plot the development of the sequence quality index
fig.data %>%
  ggplot(aes(x = Position, 
             y = `Sequence Quality`, 
             color = Sequence)) +
  geom_line(size=1) +
  theme_minimal() +
  theme(legend.position="bottom") +
  guides(col=guide_legend(nrow=2,byrow=TRUE))

Manzoni, A., & Mooi-Reci, I. (2018). Measuring Sequence Quality. In G. Ritschard & M. Studer (Eds.), Sequence Analysis and Related Approaches (pp. 261–278). doi: 10.1007/978-3-319-95420-2_15

Ritschard, G., Bussi, M., & O’Reilly, J. (2018). An Index of Precarity for Measuring Early Employment Insecurity. In G. Ritschard & M. Studer (Eds.), Sequence Analysis and Related Approaches (pp. 279–295). doi: 10.1007/978-3-319-95420-2_16

maraab23/seqquality documentation built on Feb. 8, 2022, 12:54 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com