
Provides the generic function `similarity` and the S4 method to compute similarities among a collection of sequences.

`is.subset` and `is.superset` find subsequence and supersequence relationships, respectively, among a collection of sequences.
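To make the subsequence relationship concrete, here is a minimal base-R sketch. It is not the package's implementation. It assumes each sequence is a list of character vectors (one vector per itemset) and uses the usual sequential-pattern definition: `x` is a subsequence of `y` if every itemset of `x` is contained in a distinct itemset of `y`, in the same order. The helper `is_subseq()` is hypothetical; a greedy left-to-right scan is enough for a yes/no test.

```r
## Hypothetical helper, not part of arulesSequences: a sketch of the
## sequential-pattern subsequence test under the assumptions stated above.
## x, y: lists of character vectors, one vector per itemset (element).
is_subseq <- function(x, y, proper = FALSE) {
  j <- 1L
  for (itemset in x) {
    ## skip itemsets of y that do not contain the current itemset of x
    while (j <= length(y) && !all(itemset %in% y[[j]])) j <- j + 1L
    if (j > length(y)) return(FALSE)  # ran out of y: no match
    j <- j + 1L                       # each itemset of y is used at most once
  }
  if (proper) {
    ## proper = TRUE omits equality (same length, pairwise equal itemsets)
    equal <- length(x) == length(y) &&
      all(vapply(seq_along(x), function(i) setequal(x[[i]], y[[i]]), logical(1)))
    if (equal) return(FALSE)
  }
  TRUE
}

s1 <- list(c("C", "D"), c("A", "B", "C"), c("A", "B", "F"), c("A", "C", "D", "F"))
s3 <- list(c("A", "B", "F"))
is_subseq(s3, s1)                 # TRUE
is_subseq(s3, s3, proper = TRUE)  # FALSE
```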

```
similarity(x, y = NULL, ...)
## S4 method for signature 'sequences'
similarity(x, y = NULL,
           method = c("jaccard", "dice", "cosine", "subset"),
           strict = FALSE)
## S4 method for signature 'sequences'
is.subset(x, y = NULL, proper = FALSE)
## S4 method for signature 'sequences'
is.superset(x, y = NULL, proper = FALSE)
```

| Argument | Description |
|---|---|
| `x, y` | an object. |
| `...` | further (unused) arguments. |
| `method` | a string specifying the similarity measure to use (see details). |
| `strict` | a logical value specifying if strict itemset matching should be used. |
| `proper` | a logical value specifying if only strict relationships (omitting equality) should be indicated. |

Let the number of *common* elements of two sequences refer to
those that occur in a longest common subsequence. The following
similarity measures are implemented:

`jaccard`

: The number of common elements divided by the total number of elements (the sum of the lengths of the sequences minus the length of the longest common subsequence).

`dice`

: Twice the number of common elements divided by the sum of the lengths of the sequences.

`cosine`

: The number of common elements divided by the square root of the product of the sequence lengths.

`subset`

: Zero if the first sequence is not a subsequence of the second; otherwise the number of common elements divided by the number of elements in the first sequence.
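Writing $c$ for the number of common elements and $n$, $m$ for the lengths of the sequences $x$ and $y$, the measures can be summarized as follows. The `dice` denominator $n + m$ is inferred here rather than stated explicitly; it agrees with the `dice` values in the example output.

$$
\begin{aligned}
\text{jaccard}(x, y) &= \frac{c}{n + m - c} \\
\text{dice}(x, y)    &= \frac{2c}{n + m} \\
\text{cosine}(x, y)  &= \frac{c}{\sqrt{n\,m}} \\
\text{subset}(x, y)  &= \begin{cases} c / n & \text{if } x \text{ is a subsequence of } y \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
$$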

If `strict = TRUE`, the elements (itemsets) of the sequences must be equal to be matched. Otherwise, matches are quantified by the similarity of the itemsets (as specified by `method`), thresholded at 0.5, and the size of the common subsequence is the sum of these similarities.
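The matching rule can be sketched as a dynamic program over the two sequences. This is a minimal base-R illustration under stated assumptions, not the package's code: itemsets are compared with the Jaccard index (as with `method = "jaccard"`), similarities below 0.5 are treated as non-matches (whether exactly 0.5 counts is itself an assumption), and the common size is the largest sum of matched similarities over order-preserving matchings. The names `itemset_jaccard()`, `common_size()` and `seq_jaccard()` are hypothetical.

```r
## Hypothetical helpers, not part of arulesSequences.
## Sequences are lists of character vectors, one vector per itemset.

## Jaccard index of two itemsets (assumed element-level measure).
itemset_jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

## Size of the common subsequence: the largest sum of matched itemset
## similarities over order-preserving one-to-one matchings.
common_size <- function(x, y, strict = FALSE, threshold = 0.5) {
  n <- length(x)
  m <- length(y)
  D <- matrix(0, n + 1, m + 1)  # D[i + 1, j + 1]: best score for x[1..i], y[1..j]
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (strict) {
        as.numeric(setequal(x[[i]], y[[j]]))  # strict: equal itemsets only
      } else {
        itemset_jaccard(x[[i]], y[[j]])
      }
      if (s < threshold) s <- 0  # assumption: values below 0.5 do not match
      D[i + 1, j + 1] <- max(D[i, j + 1], D[i + 1, j], D[i, j] + s)
    }
  }
  D[n + 1, m + 1]
}

## Sequence-level Jaccard similarity built on the common size.
seq_jaccard <- function(x, y, strict = FALSE) {
  k <- common_size(x, y, strict = strict)
  k / (length(x) + length(y) - k)
}

s1 <- list(c("C", "D"), c("A", "B", "C"), c("A", "B", "F"), c("A", "C", "D", "F"))
s2 <- list(c("A", "B", "F"), "E")
seq_jaccard(s1, s2)                 # 0.2
seq_jaccard(s1, s2, strict = TRUE)  # 0.2
```

The table is the ordinary longest-common-subsequence recurrence with the unit match score replaced by the thresholded itemset similarity, so strict matching is the special case of 0/1 scores.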

For `similarity`, returns an object of class `dsCMatrix` if the result is symmetric (or `method = "subset"`) and an object of class `dgCMatrix` otherwise.

For `is.subset` and `is.superset`, returns an object of class `lgCMatrix`.

Computation of the longest common subsequence of two sequences of lengths `n` and `m` takes `O(n*m)` time.
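For reference, the textbook dynamic-programming recurrence behind this bound is shown below for strict matching; whether the package uses exactly this formulation is an assumption. Let $L_{i,j}$ be the length of a longest common subsequence of the first $i$ elements of $x$ and the first $j$ elements of $y$:

$$
L_{i,j} =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
L_{i-1,j-1} + 1 & \text{if } x_i = y_j \\
\max\left(L_{i-1,j},\, L_{i,j-1}\right) & \text{otherwise}
\end{cases}
$$

The table has $(n+1)(m+1)$ entries, each filled in constant time given a constant-time element comparison, which gives $O(n m)$ overall with $L_{n,m}$ as the result.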

The supported set of operations for the above matrix classes depends on package Matrix. In case of problems, expand to full storage representation using `as(x, "matrix")` or `as.matrix(x)`.

For efficiency, use `as(x, "dist")` to convert a symmetric result matrix for clustering.
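As an illustration of the clustering step, the base-R sketch below starts from a symmetric similarity matrix that has already been expanded to full storage; the values are the Jaccard similarities printed in the example output. It assumes dissimilarities are taken as `1 - s`. That conversion is an assumption of this sketch, not a documented behavior of `as(x, "dist")`.

```r
## Jaccard similarities of four sequences, as a dense symmetric matrix
S <- matrix(c(1.00, 0.20, 0.25, 0,
              0.20, 1.00, 0.50, 0,
              0.25, 0.50, 1.00, 0,
              0,    0,    0,    1),
            nrow = 4, dimnames = list(1:4, 1:4))

## assumption: dissimilarity = 1 - similarity
d <- as.dist(1 - S)

## complete-linkage hierarchical clustering (stats::hclust default)
hc <- hclust(d)
hc$merge   # sequences 2 and 3 are merged first
hc$height  # merge heights 0.5, 0.8, 1
```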

Christian Buchta

Class `sequences`, method `dissimilarity`.

```
## load the package and the example data
library("arulesSequences")
data(zaki)
z <- as(zaki, "timedsequences")
similarity(z)
## require equality of itemsets
similarity(z, strict = TRUE)
## emphasize common elements
similarity(z, method = "dice")
## subsequence relationships
is.subset(z)
is.subset(z, proper = TRUE)
```

```
Loading required package: arules
Loading required package: Matrix
Attaching package: 'arules'
The following objects are masked from 'package:base':
abbreviate, write
4 x 4 sparse Matrix of class "dsCMatrix"
1 2 3 4
1 1.00 0.2 0.25 .
2 0.20 1.0 0.50 .
3 0.25 0.5 1.00 .
4 . . . 1
4 x 4 sparse Matrix of class "dsCMatrix"
1 2 3 4
1 1.00 0.2 0.25 .
2 0.20 1.0 0.50 .
3 0.25 0.5 1.00 .
4 . . . 1
4 x 4 sparse Matrix of class "dsCMatrix"
1 2 3 4
1 1.0000000 0.3333333 0.4000000 .
2 0.3333333 1.0000000 0.6666667 .
3 0.4000000 0.6666667 1.0000000 .
4 . . . 1
4 x 4 sparse Matrix of class "lgCMatrix"
1 2 3 4
1 | . . .
2 . | . .
3 | | | .
4 . . . |
4 x 4 sparse Matrix of class "lgCMatrix"
1 2 3 4
1 . . . .
2 . . . .
3 | | . .
4 . . . .
```
