```r
library(pander)
library(Foodmapping)
```
The v_fz_tk_sort_r function computes the similarity between two strings (e.g. two food item names) using the Partial_token_sort_ratio method. This method first sorts the tokens (words) of each string and then returns the similarity of the best-matching substring as a number between 0 and 100. Applied to food names, a ratio close to 100 indicates that one food name is fully contained in the other.
```r
v_fz_tk_sort_r("Tomatoes", "raw. tomatoes")
```
High ratios reflect high similarity between food names, even when the words appear in a different order:
```r
v_fz_tk_sort_r("Tomato and basil soup", "Tomato soup with basil")
```
Conversely, low ratios indicate little or no similarity between food names:
```r
v_fz_tk_sort_r("Tomatoes", "Carrot soup")
```
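To make the method more concrete, here is a simplified base-R sketch of a partial token sort ratio. It is not the Foodmapping implementation: the helper names (sort_tokens, lev_ratio, partial_ratio, partial_token_sort_ratio) are hypothetical, the lowercasing and punctuation stripping are assumptions, and the substring score is a Levenshtein-based ratio computed with adist() over every window of the longer string. This simplifies the matching-block approach used by libraries such as Python's fuzzywuzzy, so the scores may differ from those of v_fz_tk_sort_r.

```r
# Simplified sketch of a partial token sort ratio.
# Hypothetical helpers, NOT the Foodmapping implementation.

# Normalise a string and sort its tokens alphabetically.
# Assumption: lowercase, and replace runs of non-alphanumeric characters by one space.
sort_tokens <- function(x) {
  x <- gsub("[^a-z0-9]+", " ", tolower(x))
  tokens <- strsplit(trimws(x), " ", fixed = TRUE)[[1]]
  paste(sort(tokens), collapse = " ")
}

# Levenshtein-based similarity on a 0-100 scale (assumption: this differs
# from the SequenceMatcher ratio used by fuzzywuzzy-style libraries).
lev_ratio <- function(a, b) {
  len <- max(nchar(a), nchar(b))
  if (len == 0) return(100)
  d <- adist(a, b)[1, 1]
  round(100 * (1 - d / len))
}

# Partial ratio: best score of the shorter string against every
# equally long window (substring) of the longer string.
partial_ratio <- function(a, b) {
  if (nchar(a) > nchar(b)) { tmp <- a; a <- b; b <- tmp }
  n <- nchar(a)
  if (n == 0) return(0)
  starts <- seq_len(nchar(b) - n + 1)
  window_scores <- vapply(starts,
                          function(i) lev_ratio(a, substr(b, i, i + n - 1)),
                          numeric(1))
  max(window_scores)
}

# Sort tokens first, then compute the partial ratio.
partial_token_sort_ratio <- function(a, b) {
  partial_ratio(sort_tokens(a), sort_tokens(b))
}

partial_token_sort_ratio("Tomatoes", "raw. tomatoes")   # 100
partial_token_sort_ratio("Tomatoes", "Carrot soup")
```

Because tokens are sorted before the comparison, word order does not matter in this sketch: "tomato soup" and "soup tomato" both normalise to "soup tomato" and score 100.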
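The v_fz_tk_sort_r function also works on two vectors of food names, comparing them element-wise (the first name of one vector with the first name of the other, and so on) and returning one score per pair.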
```r
data(food_sample)
food_names1 <- food_sample$FOODNAME_ENG
food_names2 <- food_sample$FOODNAME_ENG_COMP

# compare the two columns element-wise: one score per pair of names
scores <- v_fz_tk_sort_r(food_names1, food_names2)

results <- data.frame(name1 = food_names1, name2 = food_names2, score = scores)
pander(head(results), split.table = Inf)
```
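Element-wise comparison means that the i-th name of the first vector is only compared with the i-th name of the second vector, so n pairs give n scores rather than an n × n matrix. The sketch below illustrates this pairing with base R's mapply(). The scorer pair_score() is a hypothetical Levenshtein-based stand-in, not v_fz_tk_sort_r, and the three name pairs are made-up illustrative inputs.

```r
# Hypothetical stand-in scorer: case-insensitive Levenshtein similarity (0-100).
# It only illustrates the element-wise pairing, not Foodmapping's metric.
pair_score <- function(a, b) {
  len <- max(nchar(a), nchar(b))
  if (len == 0) return(100)
  round(100 * (1 - adist(tolower(a), tolower(b))[1, 1] / len))
}

names1 <- c("Tomatoes", "Carrot soup", "Basil")
names2 <- c("tomatoes", "Carrot cake", "Oregano")

# mapply() walks both vectors in parallel: names1[i] is paired with names2[i]
scores <- mapply(pair_score, names1, names2, USE.NAMES = FALSE)

data.frame(name1 = names1, name2 = names2, score = scores)
```

To compare every name with every other name instead, the pairs have to be built explicitly, for example with expand.grid(), as in the large example below.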
The v_fz_tk_sort_r function is optimised for large numbers of comparisons and computes the similarity scores in parallel.
```r
# make a large example of 10,000 comparisons (takes < 5 seconds)
# expand.grid() creates all pairwise combinations of the first 100 names
large_example <- expand.grid(A = food_names1[1:100], B = food_names2[1:100])
dim(large_example)

# expand.grid() converts strings to factors by default, hence as.character()
system.time(
  scores <- v_fz_tk_sort_r(as.character(large_example$A),
                           as.character(large_example$B))
)
```
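How v_fz_tk_sort_r parallelises its work is not described here. As an illustration only, the sketch below shows one common base-R pattern: split the pairs into contiguous chunks and score each chunk on a separate core with parallel::mclapply(). The scorer pair_scores() and the wrapper parallel_scores() are hypothetical Levenshtein-based stand-ins, not Foodmapping functions. mclapply() relies on forking, so the sketch falls back to a single core on Windows.

```r
library(parallel)

# Hypothetical stand-in scorer (not Foodmapping's metric): element-wise
# Levenshtein similarity on a 0-100 scale.
pair_scores <- function(a, b) {
  mapply(function(x, y) {
    len <- max(nchar(x), nchar(y))
    if (len == 0) return(100)
    round(100 * (1 - adist(x, y)[1, 1] / len))
  }, a, b, USE.NAMES = FALSE)
}

# Score pairs in contiguous chunks, one chunk per core.
parallel_scores <- function(a, b, cores = 2L) {
  stopifnot(length(a) == length(b))
  if (.Platform$OS.type == "windows") cores <- 1L  # mclapply() cannot fork on Windows
  chunk_size <- ceiling(length(a) / cores)
  chunks <- split(seq_along(a), ceiling(seq_along(a) / chunk_size))
  res <- mclapply(chunks, function(idx) pair_scores(a[idx], b[idx]),
                  mc.cores = cores)
  unlist(res, use.names = FALSE)  # chunks come back in their original order
}

# all pairwise combinations of a few illustrative names
pair_grid <- expand.grid(A = c("Tomatoes", "Carrot soup"),
                         B = c("raw tomatoes", "Basil soup"),
                         stringsAsFactors = FALSE)
scores <- parallel_scores(pair_grid$A, pair_grid$B)
```

Keeping the chunks contiguous means the combined result lines up with the input pairs, so the parallel scores are identical to those of a sequential pair_scores() call.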