similarity-methods: Compute Similarities
In arulesSequences: Mining Frequent Sequences

similarity-methods

R Documentation

Compute Similarities

Description

Provides the generic function similarity and the S4 method to compute similarities among a collection of sequences.

is.subset, is.superset find subsequence or supersequence relationships among a collection of sequences.

Usage

similarity(x, y = NULL, ...)

## S4 method for signature 'sequences'
similarity(x, y = NULL,
           method = c("jaccard", "dice", "cosine", "subset"),
           strict = FALSE)

## S4 method for signature 'sequences'
is.subset(x, y = NULL, proper = FALSE)
## S4 method for signature 'sequences'
is.superset(x, y = NULL, proper = FALSE)

Arguments

`x`, `y`	an object of class sequences; if y is NULL, the elements of x are compared with each other.
`...`	further (unused) arguments.
`method`	a string specifying the similarity measure to use (see details).
`strict`	a logical value specifying if strict itemset matching should be used.
`proper`	a logical value specifying if only strict relationships (omitting equality) should be indicated.

Details

The number of common elements of two sequences is the number of elements in a longest common subsequence. The following similarity measures are implemented:

jaccard:: The number of common elements divided by the total number of elements (the sum of the lengths of the sequences minus the length of the longest common subsequence).
dice:: Two times the number of common elements divided by the sum of the lengths of the sequences.
cosine:: The number of common elements divided by the square root of the product of the sequence lengths.
subset:: Zero if the first sequence is not a subsequence of the second. Otherwise the number of common elements divided by the number of elements in the first sequence.
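To make the measures concrete, here is a minimal base R sketch. It is not the package's implementation: the helper names lcs_length and seq_similarity are hypothetical, a sequence is assumed to be a plain list of itemsets (character vectors), and itemsets are assumed to match only when they are equal. The longest common subsequence is computed with the usual dynamic-programming table.

```r
## Hypothetical helpers, not part of arulesSequences.
## A sequence is a list of itemsets; each itemset is a character vector.
lcs_length <- function(x, y) {
  n <- length(x); m <- length(y)
  ## L[i + 1, j + 1] holds the LCS length of x[1..i] and y[1..j]
  L <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      if (setequal(x[[i]], y[[j]])) {
        L[i + 1, j + 1] <- L[i, j] + 1          # equal itemsets extend the LCS
      } else {
        L[i + 1, j + 1] <- max(L[i, j + 1], L[i + 1, j])
      }
    }
  }
  L[n + 1, m + 1]
}

seq_similarity <- function(x, y, method = c("jaccard", "dice", "cosine", "subset")) {
  method <- match.arg(method)
  k <- lcs_length(x, y)                   # number of common elements
  n <- length(x); m <- length(y)
  switch(method,
    jaccard = k / (n + m - k),
    dice    = 2 * k / (n + m),
    cosine  = k / sqrt(n * m),
    ## x is a subsequence of y (under equality matching) iff all of x is matched
    subset  = if (k == n) k / n else 0)
}

x <- list("A", c("B", "C"), "D")
y <- list("A", "D")
seq_similarity(x, y, "jaccard")   # 2 / 3
seq_similarity(x, y, "dice")      # 0.8
seq_similarity(x, y, "cosine")    # 2 / sqrt(6)
seq_similarity(x, y, "subset")    # 0: x is not a subsequence of y
seq_similarity(y, x, "subset")    # 1
```

Under equality matching the subset measure can only be 0 or 1; fractional values arise when itemsets are matched by similarity instead.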

If strict = TRUE, two elements (itemsets) of the sequences are matched only if they are equal. Otherwise, matches are quantified by the similarity of the itemsets (using the measure given by method), thresholded at 0.5, and the size of the common subsequence is the sum of these similarities.
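The non-strict case can be read as a weighted longest common subsequence. The base R sketch below is one plausible interpretation, not the package's code: the names itemset_sim, soft_lcs, soft_similarity and threshold are hypothetical, itemset similarity is assumed to use the same measure as method, and scores below 0.5 are assumed to count as no match (the exact handling of the threshold is an assumption).

```r
## Hypothetical sketch of non-strict matching, not the package's code.
## Assumptions: itemset similarity uses the same measure as `method`,
## scores below the threshold count as no match, and the size of the
## common subsequence is the largest achievable sum of matched scores.
itemset_sim <- function(a, b, method) {
  common <- length(intersect(a, b))
  switch(method,
    jaccard = common / length(union(a, b)),
    dice    = 2 * common / (length(a) + length(b)),
    cosine  = common / sqrt(length(a) * length(b)))
}

soft_lcs <- function(x, y, method = "jaccard", threshold = 0.5) {
  n <- length(x); m <- length(y)
  L <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- itemset_sim(x[[i]], y[[j]], method)
      if (s < threshold) s <- 0        # weak matches are discarded
      ## skip an itemset of x, skip one of y, or match them with weight s
      L[i + 1, j + 1] <- max(L[i, j + 1], L[i + 1, j], L[i, j] + s)
    }
  }
  L[n + 1, m + 1]
}

soft_similarity <- function(x, y, method = c("jaccard", "dice", "cosine")) {
  method <- match.arg(method)
  k <- soft_lcs(x, y, method)          # weighted number of common elements
  n <- length(x); m <- length(y)
  switch(method,
    jaccard = k / (n + m - k),
    dice    = 2 * k / (n + m),
    cosine  = k / sqrt(n * m))
}

x <- list("A", c("B", "C"), "D")
y <- list("A", c("B", "C", "E"), "F")
soft_lcs(x, y)                     # 1 + 2/3: {B, C} vs {B, C, E} scores 2/3
soft_similarity(x, y, "jaccard")   # 5 / 13
soft_similarity(x, y, "dice")      # 0.6
```

With equality matching the same pair of sequences would share only the itemset A, so the non-strict score is higher.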

Value

similarity returns an object of class dsCMatrix if the result is symmetric (or method = "subset") and an object of class dgCMatrix otherwise.

is.subset and is.superset return an object of class lgCMatrix.
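The subsequence relation behind is.subset can be illustrated with a small base R sketch that returns a dense logical matrix instead of an lgCMatrix. It assumes the usual definition for itemset sequences (x is a subsequence of y if the itemsets of x can be matched, in order, to itemsets of y that contain them) and that entry [i, j] refers to element i versus element j; the helper names is_subseq and subset_matrix are hypothetical.

```r
## Hypothetical sketch, not the package's code. Assumption: x is a
## subsequence of y if each itemset of x is contained in a distinct,
## later itemset of y, preserving order.
is_subseq <- function(x, y) {
  j <- 1
  for (a in x) {
    ## greedily advance through y until an itemset containing `a` is found
    while (j <= length(y) && !all(a %in% y[[j]])) j <- j + 1
    if (j > length(y)) return(FALSE)
    j <- j + 1
  }
  TRUE
}

## Dense logical analogue of the result: entry [i, j] is TRUE when s[[i]]
## is a subsequence of s[[j]]; proper = TRUE drops pairs of equal sequences
## (mutual subsequences are equal).
subset_matrix <- function(s, proper = FALSE) {
  k <- length(s)
  m <- matrix(FALSE, k, k)
  for (i in seq_len(k)) for (j in seq_len(k)) {
    m[i, j] <- is_subseq(s[[i]], s[[j]])
    if (proper && m[i, j] && is_subseq(s[[j]], s[[i]])) m[i, j] <- FALSE
  }
  m
}

s <- list(list("A"), list("A", c("B", "C")), list(c("A", "B"), "C"))
subset_matrix(s)                  # <A> is contained in both other sequences
subset_matrix(s, proper = TRUE)   # same, with the diagonal cleared
```

Greedy leftmost matching is sufficient here: taking the earliest containing itemset never rules out a match for the remaining itemsets.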

Note

Computing the longest common subsequence of two sequences of lengths n and m takes O(n*m) time.

The supported set of operations for the above matrix classes depends on package Matrix. In case of problems, expand to full storage representation using as(x, "matrix") or as.matrix(x).

For efficiency, use as(x, "dist") to convert a symmetric result matrix for clustering.
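As an illustration of the clustering step in plain base R, independent of arulesSequences, the sketch below starts from a small dense stand-in for a symmetric similarity result (the matrix S is made up) and converts it to dissimilarities with 1 - S, which is one common choice. This page does not state whether as(x, "dist") performs such a conversion, so check its result before clustering.

```r
## Hypothetical dense stand-in for a symmetric similarity result,
## e.g. what as.matrix() might give for three sequences.
S <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.2,
              0.1, 0.2, 1.0), nrow = 3,
            dimnames = list(c("s1", "s2", "s3"), c("s1", "s2", "s3")))

## Assumed choice: dissimilarity = 1 - similarity; as.dist keeps the
## lower triangle as a "dist" object.
D <- as.dist(1 - S)

## Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(D, method = "average")
groups <- cutree(hc, k = 2)
groups   # s1 and s2 share a group, s3 is on its own
```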

Author(s)

Christian Buchta

Examples

## use example data
data(zaki)
z <- as(zaki, "timedsequences")
similarity(z)

## require equal itemsets (strict matching)
similarity(z, strict = TRUE)

## emphasize common elements
similarity(z, method = "dice")

## subsequence relationships
is.subset(z)
is.subset(z, proper = TRUE)

arulesSequences documentation built on Sept. 11, 2024, 9:35 p.m.
