StrDif tests whether the difference between two groups of strings is statistically significant or not. The difference is based on normalized Levenshtein distances (LDs) between strings. A permutation test is used as the statistical method.
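As a rough illustration of the distance measure named above, the sketch below computes a normalized Levenshtein distance with base R's adist. The helper name norm_ld is hypothetical, and dividing by the length of the longer string is an assumption about the normalization; StrDif itself may normalize differently.

```r
# Normalized Levenshtein distance between two strings (illustrative helper,
# not part of the package).
# Assumption: the raw edit distance is divided by the length of the longer
# string, so the result lies between 0 and 1.
norm_ld <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(0)  # two empty strings are identical
  as.numeric(utils::adist(a, b)) / longest
}

norm_ld("ABCD", "ABCE")  # one substitution over four characters
norm_ld("ABCD", "ABCD")  # identical strings
```

Under this normalization, 0 means the strings are identical and 1 means every position of the longer string has to be edited.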
StrDif(grp1_string, grp2_string, num_perm = 1000,
       o.x = 0.01, o.y = 0, p.x = 0.015, p.y = 0)
grp1_string: String group (vector) 1.
grp2_string: String group (vector) 2.
num_perm: Number of permutations. The default is 1000.
o.x: x coordinate of the legend in the histogram. The default is 0.01.
o.y: y coordinate of the legend in the histogram. The default is 0.
p.x: x coordinate of the p value in the histogram. The default is 0.015.
p.y: y coordinate of the p value in the histogram. The default is 0.
The default value of both o.y and p.y is 0. With the defaults, the actual y positions are derived from num_perm: the legend (o.y) is placed above 0.2 * num_perm, and the p value (p.y) is placed below 0.2 * num_perm. If non-default values are supplied, they are used as absolute y coordinates.
The function generates a histogram that demonstrates the distribution of the differences of LDs, the original difference, and the p value.
The function also returns a vector containing differences of normalized LDs. The total number of differences is num_perm (number of permutations).
Differences are calculated by subtracting within-group LD from between-group LD. They range from -1 to 1. The "observed" difference is the difference from the original data set.
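The statistic and the permutation step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names (norm_ld_matrix, ld_difference, perm_test_sketch) are hypothetical, distances are normalized by the length of the longer string, within-group distances are pooled over both groups, and the p value is taken as the one-sided share of permuted differences at least as large as the observed one.

```r
# Matrix of normalized Levenshtein distances between all pairs of strings.
# Assumption: each raw distance is divided by the length of the longer string.
norm_ld_matrix <- function(strs) {
  raw <- utils::adist(strs)
  len <- nchar(strs)
  raw / pmax(outer(len, len, pmax), 1)  # guard against division by zero
}

# Between-group mean distance minus within-group mean distance.
# Assumption: within-group distances are pooled over both groups.
ld_difference <- function(d, in_grp1) {
  pair <- upper.tri(d)                   # count each unordered pair once
  same <- outer(in_grp1, in_grp1, "==")  # TRUE when both strings share a group
  mean(d[pair & !same]) - mean(d[pair & same])
}

# Permutation test: shuffle the group labels and recompute the difference.
perm_test_sketch <- function(grp1, grp2, num_perm = 1000) {
  d <- norm_ld_matrix(c(grp1, grp2))
  in_grp1 <- rep(c(TRUE, FALSE), c(length(grp1), length(grp2)))
  observed <- ld_difference(d, in_grp1)
  permuted <- replicate(num_perm, ld_difference(d, sample(in_grp1)))
  # Assumption: one-sided p value.
  list(observed = observed, permuted = permuted,
       p_value = mean(permuted >= observed))
}

set.seed(1)
res <- perm_test_sketch(c("ABCD", "ABCE", "ABDE"),
                        c("WXYZ", "WXYY", "WXZZ"), num_perm = 200)
res$observed
res$p_value
```

Because the distance matrix does not change when labels are shuffled, the sketch computes it once and only permutes the group membership, which keeps each permutation cheap.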
1. Because the returned vector has num_perm elements (1000 by default), it is better to assign the result to a variable than to print it directly. See the examples.
2. The default positions of the legend and the p value in the histogram generated by StrDif may not suit every data set, because the distribution of the permuted differences of normalized Levenshtein distances varies from case to case. For this reason, the package includes another function, HistDif, to customize the positions of the legend and the p value in the histogram.
3. The time to run this function can be relatively long (from seconds to minutes depending on the number and lengths of strings as well as the computer performance).
4. Acknowledgement: The first version of this function was developed with significant help from Dr. Rhonda DeCook in the Department of Statistics and Actuarial Science at the University of Iowa.
1. H. Tang; J. J. Topczewski; A. M. Topczewski; N. J. Pienta. Permutation Test for Groups of Scanpaths Using Normalized Levenshtein Distances and Application in NMR Questions. In Proceedings of the Symposium on Eye Tracking Research and Applications, Santa Barbara, CA, March 28-30, 2012; ACM Press: New York; pp 169-172.
2. M. Feusner; B. Lukoff. Testing for Statistically Significant Differences between Groups of Scan Patterns. In Proceedings of the Symposium on Eye Tracking Research and Applications, 2008; ACM Press: New York; pp 43-46.
# simple strings, non-default permutation number and p-value position
strs1.vec <- c("ABCDdefABCDa", "def123DC", "123aABCD", "ACD13", "AC1ABC", "3123fe")
strs2.vec <- c("xYZdkfAxDa", "ef1563xy", "BC9Dzy35X", "AkeC1fxz", "65CyAdC", "Dfy3f69k")
ld.dif.vec <- StrDif(strs1.vec, strs2.vec, num_perm = 500, p.x = 0.025)
# longer strings
data(str1)
data(str2)
s1 <- str1[1:6]
s2 <- str2[1:6]
ld.dif12.vec <- StrDif(s1, s2, num_perm = 500)
For the initial two groups of strings,
the average normalized between-group levenshtein Distance is: 0.85056
the average normalized within-group levenshtein Distance is: 0.84306
the difference in the average normalized levenshtein Distance between between-group and within-group is: 0.00751.
The p value of the permutation test is: 0.36400
For the initial two groups of strings,
the average normalized between-group levenshtein Distance is: 0.76347
the average normalized within-group levenshtein Distance is: 0.72911
the difference in the average normalized levenshtein Distance between between-group and within-group is: 0.03435.
The p value of the permutation test is: 0.02600