Dissimilarity/distance indices for sequence data


Description

This function calculates different dissimilarity/distance indices between sequences.

Usage

sequences.distance(sequences = NULL, groups = NULL, 
     method = c("levenshtein", "cosine", "q-gram", "jaccard", "ja-wi", 
                "dam-le", "hamming", "osa", "lcs"), divLength = FALSE)

Arguments

sequences

Vector containing sequences

groups

Vector containing names of different samples (if present)

method

Dissimilarity method (see Details)

divLength

Divide sequences into subsets of the same sequence length? (default: FALSE)
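
The help text does not say how divLength groups the sequences internally. As a rough illustration only, the sketch below shows one plausible reading in base R: sequences are split by their number of characters and a separate Levenshtein matrix is computed per length group. The names seqs, by_length and dist_by_length are hypothetical, and utils::adist() stands in for whatever routine the function actually uses.

```r
# Hypothetical example sequences (not taken from the package data)
seqs <- c("CASSL", "CASSF", "CASSLGQ", "CASSLGE", "CAS")

# Group sequences by their length (number of characters)
by_length <- split(seqs, nchar(seqs))

# One Levenshtein distance matrix per length group, using base R's adist().
# Assumption: this mimics divLength = TRUE; the real implementation may differ.
dist_by_length <- lapply(by_length, function(s) {
  d <- utils::adist(s)
  dimnames(d) <- list(s, s)  # label rows and columns with the sequences
  d
})

# Inspect the matrix for all sequences of length 7
dist_by_length[["7"]]
```

Splitting by length first means that every matrix compares only sequences of equal length, which is also what a position-wise metric such as Hamming requires.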

Details

This function calculates dissimilarity/distance indices based on sequences. Levenshtein, cosine, q-gram, Jaccard, Jaro-Winkler (ja-wi), Damerau-Levenshtein (dam-le), Hamming, optimal string alignment (osa), and longest common substring (lcs) distances can be chosen. For details see stringdist-metrics in the stringdist package.
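
To give a feel for how some of these metrics differ, the sketch below calls the stringdist package directly on three made-up sequences. It assumes stringdist is installed. The mapping from the method names above to stringdist's own codes ("lv", "hamming", "cosine") is an assumption, as is the choice of q = 2, and the sketch is not necessarily what sequences.distance does internally.

```r
# Assumes the stringdist package is installed
library(stringdist)

# Hypothetical example sequences (not taken from the package data)
seqs <- c("CASSLGQ", "CASSLGE", "CASSFGQ")

# Levenshtein: minimum number of insertions, deletions and substitutions
lev <- stringdistmatrix(seqs, seqs, method = "lv", useNames = "strings")

# Hamming: number of mismatching positions (defined for equal lengths only)
ham <- stringdistmatrix(seqs, seqs, method = "hamming", useNames = "strings")

# Cosine distance between q-gram profiles (here q = 2, i.e. residue pairs)
cosd <- stringdistmatrix(seqs, seqs, method = "cosine", q = 2,
                         useNames = "strings")

lev
```

Edit-based metrics (Levenshtein, Hamming) return integer counts, whereas cosine distance on q-gram profiles is bounded between 0 and 1, so values from different methods are not directly comparable.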

Value

Output is a distance matrix containing dissimilarity indices/distances between sequences.

Author(s)

Julia Bischof

References

van der Loo M (2014). The stringdist package for approximate string matching. The R Journal, 6, pp. 111-122. http://CRAN.R-project.org/package=stringdist

See Also

dist.PCoA, plotDistPCoA, geneUsage.distance

Examples

## Not run: 
data(clones.ind)
data(clones.allind)
dist1 <- sequences.distance(sequences = clones.ind$unique_CDR3_sequences_AA,
     method = "levenshtein", divLength = TRUE)
dist2 <- sequences.distance(sequences = clones.allind$unique_CDR3_sequences_AA,
     groups = clones.allind$individuals, method = "cosine", divLength = FALSE)

## End(Not run)
