The function `seqCompare` computes the likelihood ratio test (LRT) and Bayesian Information Criterion (BIC) for comparing two groups within each of a series of sets. The functions `seqBIC` and `seqLRT` are aliases that return only the BIC and only the LRT, respectively.

```
seqCompare(seqdata, seqdata2=NULL, group=NULL, set=NULL,
s=100, seed=36963, stat="all", squared="LRTonly",
weighted=TRUE, opt=NULL, BFopt=NULL, method, ...)
seqLRT(seqdata, seqdata2=NULL, group=NULL, set=NULL, s=100,
seed=36963, squared="LRTonly", weighted=TRUE, opt=NULL,
BFopt=NULL, method, ...)
seqBIC(seqdata, seqdata2=NULL, group=NULL, set=NULL, s=100,
seed=36963, squared="LRTonly", weighted=TRUE, opt=NULL,
BFopt=NULL, method, ...)
```

`seqdata` |
Either a state sequence object (`stslist` object) or a list of such objects. See Details. |

`seqdata2` |
Either a state sequence object (`stslist` object) or a list of such objects. Default `NULL`. See Details. |

`group` |
Vector of length equal to the number of sequences in `seqdata`, defining the two groups to compare. |

`set` |
Vector of length equal to the number of sequences in `seqdata`, defining the sets within each of which the two groups are compared. |

`s` |
Integer. Default 100. The size of random samples of sequences. When 0, no sampling is done. |

`seed` |
Integer. Default 36963. Using the same seed number guarantees the same results
each time. Set |

`stat` |
String. The requested statistics. One of `"LRT"`, `"BIC"`, or `"all"`. |

`squared` |
Logical. Should squared distances be used? Can also be `"LRTonly"` (the default). See Details. |

`weighted` |
Logical or String. Should weights be taken into account when available? Can also be |

`opt` |
Integer or `NULL`. The computation strategy, `1` or `2`. See Details. |

`BFopt` |
Integer or |

`method` |
String. Method for computing sequence distances. See the documentation for `seqdist`. |

`...` |
Additional arguments passed to `seqdist`. |

The `group` and `set` arguments can only be used when `seqdata` is an `stslist` object (a state sequence object).

When `seqdata` and `seqdata2` are both provided, the LRT and BIC statistics are computed for comparing these two sets of sequences. In that case, both `group` and `set` should be left at their default `NULL` value.

When `seqdata` is a list of `stslist` objects, `seqdata2` must be a list of the same number of `stslist` objects.

The default option `squared="LRTonly"` corresponds to the initial proposal of Liao and Fasang (2021). With that option, the distances to the virtual center are obtained from the pairwise non-squared dissimilarities, and the resulting distances to the virtual center are squared when computing the LRT (which is in turn used to compute the BIC). With `squared=FALSE`, non-squared distances are used in both steps, and with `squared=TRUE`, squared distances are used in both steps.
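To make the `squared="LRTonly"` logic concrete, here is a minimal base-R sketch on a plain dissimilarity matrix. The helpers `center_dist()` and `lrt_bic_sketch()` are hypothetical and not part of TraMineR. Distances to the virtual center use the classical identity that is exact for squared Euclidean distances, applied here to non-squared dissimilarities as the `"LRTonly"` option describes. The statistics assume a Gaussian likelihood, with LRT = n log(SS0/SS1), where SS0 and SS1 are the pooled and within-group sums of squared center distances, and one extra parameter in the BIC difference. These are assumptions, not necessarily the exact formulas used by `seqCompare`.

```r
# Hypothetical helper: distance of each object to the virtual center of its
# group, computed only from pairwise dissimilarities d (an n x n matrix):
#   d_ic = mean_j d_ij - sum_jk d_jk / (2 n^2)
center_dist <- function(d) {
  n <- nrow(d)
  rowMeans(d) - sum(d) / (2 * n^2)
}

# Hypothetical helper: LRT and BIC difference for a two-group comparison,
# following the squared = "LRTonly" idea: non-squared dissimilarities go in,
# and the resulting center distances are squared only in the sums of squares.
# Assumes a Gaussian likelihood and one extra parameter for the group split.
lrt_bic_sketch <- function(d, group) {
  n <- nrow(d)
  ss0 <- sum(center_dist(d)^2)  # pooled: a single common center
  ss1 <- sum(sapply(split(seq_len(n), group), function(idx) {
    sum(center_dist(d[idx, idx, drop = FALSE])^2)  # one center per group
  }))
  lrt <- n * log(ss0 / ss1)
  c(LRT = lrt, BIC = lrt - log(n))
}

# Toy data: two well-separated groups on a line, Euclidean dissimilarities
x <- c(0, 1, 2, 10, 11, 12)
d <- as.matrix(dist(x))
lrt_bic_sketch(d, group = rep(c("a", "b"), each = 3))
```

When `center_dist()` is given squared Euclidean distances, it returns exactly the squared distances to the arithmetic mean, which is why this identity is a natural way to place a virtual center when only dissimilarities are available.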

The computation is based on the pairwise distances between the sequences. The `opt` argument selects between two strategies. With `opt=1`, the distance matrix is computed anew for each pair of samples of size `s`. With `opt=2`, the distance matrix is computed once for all observed sequences, and the distances for the samples are extracted from that matrix. Option 2 is often more efficient, especially for spell-based distances, but it may be slower for methods such as OM or LCS when the number of observed sequences becomes large.
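The difference between the two strategies can be sketched in base R, with numeric rows and `dist()` standing in for sequences and `seqdist()`. The two groups of 100 objects, the sample size `s`, and drawing one sample from each group are illustrative assumptions, not a description of how `seqCompare` draws its samples internally. Both routes yield the same distances for a given pair of samples; they differ only in how much is computed, and when.

```r
set.seed(36963)
# Stand-in data: 200 objects in two groups of 100 (rows 1-100 and 101-200)
x <- matrix(rnorm(200 * 3), ncol = 3)
s <- 50

# One pair of samples of size s, one sample drawn from each group
idx <- c(sample(1:100, s), sample(101:200, s))

# Strategy 1 (analogous to opt = 1): compute distances for this pair only
d_opt1 <- as.matrix(dist(x[idx, ]))

# Strategy 2 (analogous to opt = 2): compute the full matrix once...
d_full <- as.matrix(dist(x))
# ...then extract the rows and columns of the sampled objects
d_opt2 <- d_full[idx, idx]

# Same distances either way (dimnames differ, so compare unnamed matrices)
all.equal(unname(d_opt1), unname(d_opt2))
```

Repeated over many sample pairs, strategy 2 pays for one n x n matrix up front, whereas strategy 1 pays for a (2s) x (2s) matrix at every iteration.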

The function `seqLRT` (and `seqCompare` with `stat="LRT"`) outputs two variables, `LRT` and `p.LRT`.

`LRT` |
This is the likelihood ratio test statistic for comparing the two groups. |

`p.LRT` |
This is the upper tail probability associated with the LRT. |
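If the LRT is referred to a chi-squared distribution, its upper tail probability can be computed with `pchisq()`. This page does not state the reference distribution, so the chi-squared distribution and one degree of freedom for a two-group comparison are assumptions here, and `p_lrt()` is a hypothetical helper.

```r
# Hypothetical helper: upper-tail probability of an LRT statistic,
# assuming a chi-squared reference distribution (df = 1 by default)
p_lrt <- function(lrt, df = 1) {
  pchisq(lrt, df = df, lower.tail = FALSE)
}

# The 5% critical value for df = 1 is qchisq(0.95, 1), about 3.84
p_lrt(c(0.5, qchisq(0.95, df = 1), 10))
```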

The function `seqBIC` (and `seqCompare` with `stat="BIC"`) outputs two variables, `BIC` and `BF`.

`BIC` |
This is the difference between two BICs for comparing the two groups. |

`BF` |
This is the Bayes factor associated with the BIC difference. |
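Under the standard BIC approximation to the Bayes factor, a BIC difference ΔBIC corresponds to BF ≈ exp(ΔBIC / 2). Whether `seqCompare` computes `BF` exactly this way is not stated here. `bf_from_bic()` is a hypothetical helper illustrating the approximation, assuming the BIC difference is oriented so that positive values favor a difference between the groups.

```r
# Hypothetical helper: approximate Bayes factor from a BIC difference,
# using BF ~ exp(delta_BIC / 2)
bf_from_bic <- function(bic_diff) exp(bic_diff / 2)

bf_from_bic(c(0, 2, 6, 10))
```

A BIC difference of 0 gives BF = 1 (no preference), and each additional 2 units of BIC difference multiplies the Bayes factor by e.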

`seqCompare` with `stat="all"` (the default) outputs all four indicators.

Tim Liao and Gilbert Ritschard

Tim F. Liao & Anette E. Fasang (2021). "Comparing Groups of Life Course Sequences Using the Bayesian Information Criterion and the Likelihood Ratio Test." *Sociological Methodology*, 51 (1), 44-85. doi:10.1177/0081175020959401.

```
## Load the package and the biofam data set
library(TraMineR)
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
  "Child", "Left+Child", "Left+Marr+Child", "Divorced")
## Alphabet taken from the full data so that all states are kept
alph <- seqstatl(biofam[10:25])
## To illustrate, we use only a sample of 150 cases
set.seed(10)
biofam <- biofam[sample(nrow(biofam), 150), ]
biofam.seq <- seqdef(biofam, 10:25, alphabet = alph, labels = biofam.lab)
## Defining the grouping variable (language); missing values become "unknown"
lang <- as.vector(biofam[["plingu02"]])
lang[is.na(lang)] <- "unknown"
lang <- factor(lang)
## Chronogram by language group
seqdplot(biofam.seq, group = lang)
## Extracting the sequence subsets by language
lev <- levels(lang)
seq.list <- list()
for (i in seq_along(lev)) {
  seq.list[[i]] <- biofam.seq[lang == lev[i], ]
}
## Comparing the first two language groups (each passed as a one-element list)
seqCompare(list(seq.list[[1]]), list(seq.list[[2]]), stat = "all",
  method = "OM", sm = "CONSTANT")
## BIC for sex differences, then LRT for sex differences within language sets
seqBIC(biofam.seq, group = biofam$sex, method = "HAM")
seqLRT(biofam.seq, group = biofam$sex, set = lang, s = 80, method = "HAM")
```

