# seq_amatch: Approximate matching for integer sequences. In stringdist: Approximate String Matching, Fuzzy Text Search, and String Distance Functions

## Description

For a `list` of integer vectors `x`, find the closest matches in a `list` of integer or numeric vectors in `table`.

## Usage

```
seq_amatch(
  x,
  table,
  nomatch = NA_integer_,
  matchNA = TRUE,
  method = c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw"),
  weight = c(d = 1, i = 1, s = 1, t = 1),
  maxDist = 0.1,
  q = 1,
  p = 0,
  bt = 0,
  nthread = getOption("sd_num_thread")
)

seq_ain(x, table, ...)
```

## Arguments

- `x`: (`list` of) `integer` or `numeric` vector(s) to be approximately matched. Will be converted with `as.integer`.
- `table`: (`list` of) `integer` or `numeric` vector(s) serving as lookup table for matching. Will be converted with `as.integer`.
- `nomatch`: The value to be returned when no match is found. This is coerced to integer.
- `matchNA`: Should `NA`s be matched? The default mimics base `match`, meaning that `NA` matches `NA`. Here `NA` means a missing entry in the `list`, represented as `NA_integer_`. If one of the integer sequences stored in the list contains an `NA` entry, that entry is treated as just another integer (the representation of `NA_integer_`).
- `method`: Matching algorithm to use. See `stringdist-metrics`.
- `weight`: For `method='osa'` or `'dl'`, the penalty for deletion, insertion, substitution and transposition, in that order. When `method='lv'`, the penalty for transposition is ignored. When `method='jw'`, the weights associated with integers in elements of `a`, integers in elements of `b` and the transposition weight, in that order. Weights must be positive and not exceed 1. `weight` is ignored completely when `method='hamming'`, `'qgram'`, `'cosine'`, `'jaccard'`, or `'lcs'`.
- `maxDist`: Elements in `x` will not be matched with elements of `table` if their distance is larger than `maxDist`. Note that the maximum distance between strings depends on the method: it should always be specified.
- `q`: q-gram size, only when method is `'qgram'`, `'jaccard'`, or `'cosine'`.
- `p`: Winkler's prefix parameter for Jaro-Winkler distance, with 0 ≤ p ≤ 0.25. Only when method is `'jw'`.
- `bt`: Winkler's boost threshold. Winkler's prefix factor is only applied when the Jaro distance is larger than `bt`. Applies only to `method='jw'` and `p>0`.
- `nthread`: Number of threads used by the underlying C-code. A sensible default is chosen, see `stringdist-parallelization`.
- `...`: parameters to pass to `seq_amatch` (except `nomatch`).
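The interaction between `maxDist` and `nomatch` can be sketched with a small case. This assumes the `stringdist` package is installed and relies on the default `method = "osa"`, under which the two sequences below differ by one substitution; the variable names are illustrative only.

```r
library(stringdist)

x     <- list(c(1L, 2L, 3L))
table <- list(c(1L, 2L, 4L))   # one substitution away from x[[1]]

# Default maxDist = 0.1: a distance of 1 is too large, so there is no match
strict <- seq_amatch(x, table)

# Allowing a distance of up to 1 makes the first table entry a match
loose <- seq_amatch(x, table, maxDist = 1)

# nomatch replaces the NA that is returned when nothing is close enough
flagged <- seq_amatch(x, table, nomatch = 0L)
```

Under these assumptions `strict` is `NA`, `loose` is `1` and `flagged` is `0`.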

## Value

`seq_amatch` returns the position of the closest match of `x` in `table`. When multiple matches with the same minimal distance metric exist, the first one is returned. `seq_ain` returns a `logical` vector of length `length(x)` indicating whether an element of `x` approximately matches an element in `table`.
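A minimal sketch of the two return types, assuming the `stringdist` package is installed and the default `method = "osa"`. The data are made up for illustration: both table entries are one substitution away from the query, so the tie is resolved in favour of the first one.

```r
library(stringdist)

# Two table entries at the same distance (1) from the query sequence
table <- list(c(1L, 2L, 4L), c(1L, 2L, 5L))

# Ties are resolved by returning the first position with minimal distance
pos <- seq_amatch(list(c(1L, 2L, 3L)), table, maxDist = 1)

# seq_ain reports, per element of x, whether any table entry is close enough
hits <- seq_ain(list(c(1L, 2L, 3L), c(7L, 8L, 9L)), table, maxDist = 1)
```

Here `pos` is `1`, and `hits` is `TRUE FALSE` because `c(7L, 8L, 9L)` is three substitutions away from either table entry.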

## Notes

`seq_ain` is currently defined as

`seq_ain <- function(x, table, ...) seq_amatch(x, table, nomatch = 0, ...) > 0`

All input vectors are converted with `as.integer`. This causes truncation for numeric vectors (e.g. `pi` will be treated as `3L`).
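The effect of this conversion can be checked in base R alone. The sketch below uses made-up values and does not call the package; the final rounding step is one possible workaround and an assumption about the use case, not something the package applies.

```r
# as.integer() drops the fractional part (truncation toward zero, not rounding)
vals      <- c(pi, 2.9, -2.9)
truncated <- as.integer(vals)

# Two numeric sequences that differ only in their fractional parts
# collapse to the same integer sequence after conversion
a <- as.integer(c(1.2, 2.7, 3.0))
b <- as.integer(c(1.9, 2.1, 3.4))
same <- identical(a, b)

# Rounding explicitly before matching, if rounding is what is intended
rounded <- as.integer(round(c(1.2, 2.7, 3.0)))
```

`truncated` is `3 2 -2`, `same` is `TRUE`, and `rounded` is `1 3 3`, so numeric input that should be rounded needs to be rounded before it is passed in.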

## See Also

`seq_dist`, `seq_sim`, `seq_qgrams`

## Examples

```
x <- list(1:3,c(3:1),c(1L,3L,4L))
table <- list(
  c(5L,3L,1L,2L)
  ,1:4
)
seq_amatch(x,table,maxDist=2)

# behaviour with missings
seq_amatch(list(c(1L,NA_integer_,3L),NA_integer_), list(1:3),maxDist=1)


## Not run: 
# Match sentences based on word order. Note: words must match exactly or they
# are treated as completely different.
#
# For this example you need to have the 'hashr' package installed.
x <- "Mary had a little lamb"
x.words <- strsplit(x,"[[:blank:]]+")
x.int <- hashr::hash(x.words)
table <- c("a little lamb had Mary",
           "had Mary a little lamb")
table.int <- hashr::hash(strsplit(table,"[[:blank:]]+"))
seq_amatch(x.int,table.int,maxDist=3)

## End(Not run)
```