seq_dist | R Documentation |
seq_dist computes pairwise string distances between elements of a and b, where the argument with fewer elements is recycled.
seq_distmatrix computes the distance matrix with rows according to a and columns according to b.
seq_dist(
  a,
  b,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw"),
  weight = c(d = 1, i = 1, s = 1, t = 1),
  q = 1,
  p = 0,
  bt = 0,
  nthread = getOption("sd_num_thread")
)

seq_distmatrix(
  a,
  b,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw"),
  weight = c(d = 1, i = 1, s = 1, t = 1),
  q = 1,
  p = 0,
  bt = 0,
  useNames = c("names", "none"),
  nthread = getOption("sd_num_thread")
)
a |
( |
b |
( |
method |
Distance metric. See |
weight |
For |
q |
Size of the q-gram; must be nonnegative. Only applies to
|
p |
Prefix factor for Jaro-Winkler distance. The valid range for
|
bt |
Winkler's boost threshold. Winkler's prefix factor is
only applied when the Jaro distance is larger than |
nthread |
Maximum number of threads to use. By default, a sensible
number of threads is chosen, see |
useNames |
label the output matrix with |
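As a rough illustration of how the method argument changes the result, the sketch below compares two short integer sequences that differ only by a swap of adjacent elements. It assumes the stringdist package is installed; the variable names are made up for this example.

```r
library(stringdist)

# Two sequences that differ by one transposition of adjacent elements.
x <- list(c(1L, 2L))
y <- list(c(2L, 1L))

# Optimal string alignment counts the swap as a single edit.
d_osa <- seq_dist(x, y, method = "osa")

# Levenshtein has no transposition operation, so two substitutions are needed.
d_lv <- seq_dist(x, y, method = "lv")

# Hamming counts the positions at which the sequences differ.
d_hamming <- seq_dist(x, y, method = "hamming")

print(c(osa = d_osa, lv = d_lv, hamming = d_hamming))
```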
seq_dist returns a numeric vector with pairwise distances between a and b of length max(length(a), length(b)).
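A minimal sketch of the recycling behaviour, assuming the stringdist package is installed and the default method ("osa") is used. The single sequence in a is compared against each of the three sequences in b, so the result has length 3.

```r
library(stringdist)

# One sequence in 'a', three in 'b': 'a' is recycled.
a <- list(c(1L, 2L, 3L))
b <- list(
  c(1L, 2L, 3L),  # identical to a[[1]]
  c(1L, 2L, 4L),  # one substitution away
  c(1L, 2L)       # one deletion away
)

d <- seq_dist(a, b)

# The output length equals max(length(a), length(b)).
print(length(d))
print(d)
```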
For seq_distmatrix there are two options. If b is missing, the dist object corresponding to the length(a) X length(a) distance matrix is returned. If b is specified, the length(a) X length(b) distance matrix is returned.
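The two return types can be checked with a short sketch like the following. It assumes the stringdist package is installed; the input strings are arbitrary and are converted to integer sequences with base R's utf8ToInt.

```r
library(stringdist)

a <- lapply(c("foo", "bar", "baz"), utf8ToInt)
b <- lapply(c("foo", "boo"), utf8ToInt)

# Only 'a' given: a 'dist' object for the 3 x 3 distance matrix.
d <- seq_distmatrix(a)
print(class(d))

# Both 'a' and 'b' given: a 3 x 2 matrix, rows for 'a', columns for 'b'.
m <- seq_distmatrix(a, b)
print(dim(m))
```

A dist object can be expanded to a full square matrix with as.matrix(d) if needed.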
If any element of a or b is NA_integer_, the distance to any matched integer vector is NA. Missing values inside a sequence are treated as an ordinary number and get no special handling (see also the examples).
Input vectors are converted with as.integer. This causes truncation for numeric vectors (e.g. pi will be treated as 3L).
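The effect of the conversion can be sketched as follows. The first part uses only base R; the second part assumes the stringdist package is installed and that numeric input is converted as described above, so two sequences that differ only in their fractional parts compare as equal.

```r
library(stringdist)

# as.integer() drops the fractional part (truncation toward zero).
truncated <- as.integer(c(pi, 2.9, -1.7))
print(truncated)

# c(1.9, 2.2) is converted to c(1L, 2L) before the distance is computed.
d <- seq_dist(list(c(1.9, 2.2)), list(c(1L, 2L)))
print(d)
```

If rounding is what is wanted, applying round() before passing the data avoids silent truncation.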
seq_sim, seq_amatch, seq_qgrams
# Distances between lists of integer vectors. Note the postfix 'L' to force
# integer storage. The shorter argument (a) is recycled.
a <- list(c(102L, 107L))                             # fk
b <- list(c(102L, 111L, 111L), c(102L, 111L, 111L))  # foo, foo
seq_dist(a, b)

# Translate strings to a list of integer sequences.
a <- lapply(c("foo", "bar", "baz"), utf8ToInt)
seq_distmatrix(a)

# Note how missing values are treated. NA's as part of the sequence are treated
# as an integer (the representation of NA_integer_).
a <- list(NA_integer_, c(102L, 107L))
b <- list(c(102L, 111L, 111L), c(102L, 111L, NA_integer_))
seq_dist(a, b)

## Not run:
# Distance between sentences based on word order. Note: words must match exactly or they
# are treated as completely different.
#
# For this example you need to have the 'hashr' package installed.
a <- "Mary had a little lamb"
a.words <- strsplit(a, "[[:blank:]]+")
a.int <- hashr::hash(a.words)
b <- c("a little lamb had Mary", "had Mary a little lamb")
b.int <- hashr::hash(strsplit(b, "[[:blank:]]+"))
seq_dist(a.int, b.int)
## End(Not run)