
Multiple document comparison for textual overlap

```
multi_doc_compare(texts, n_grams, sd_criterion)
```

`texts`
character vector of texts; each element of the vector is one text

`n_grams`
integer specifying the n-gram size

`sd_criterion`
numeric; standard deviation criterion for returning documents that are unusually similar. Values of 2 to 3 generally work well.

A list with the following elements:

`dtm` (matrix): document-term matrix for all texts

`histogram`: a histogram of the cosine similarity values between every pair of texts

`similarities` (matrix): cosine similarities between every pair of texts

`mean_similarity` (numeric): the mean similarity across all texts

`sd_similarity` (numeric): the standard deviation of the similarities

`check_these` (data frame): document pairs whose similarity was above the criterion; these are the pairs worth checking by hand
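
The returned elements can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `word_ngrams` is hypothetical, and both the word-level n-gram tokenization and the cutoff rule (mean plus `sd_criterion` standard deviations) are assumptions made for illustration.

```r
# Hypothetical helper: split a text into word n-grams. This is an assumption;
# the real function may tokenize differently (for example, by characters).
word_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

texts <- c(
  "the quick brown fox jumps over the lazy dog",
  "the quick brown fox jumps over the sleepy cat",
  "an entirely different sentence about statistics",
  "yet another unrelated line concerning the weather today"
)
n_grams <- 2
sd_criterion <- 2

# Document-term matrix: one row per text, one column per distinct n-gram.
grams <- lapply(texts, word_ngrams, n = n_grams)
vocab <- unique(unlist(grams))
dtm <- t(vapply(grams,
                function(g) as.numeric(table(factor(g, levels = vocab))),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Cosine similarity between every pair of rows of the matrix.
norms <- sqrt(rowSums(dtm^2))
similarities <- (dtm %*% t(dtm)) / (norms %o% norms)

# Summary statistics over the unique document pairs (upper triangle only).
pair_values <- similarities[upper.tri(similarities)]
mean_similarity <- mean(pair_values)
sd_similarity <- sd(pair_values)

# Assumed rule: flag pairs more than sd_criterion SDs above the mean.
cutoff <- mean_similarity + sd_criterion * sd_similarity
idx <- which(upper.tri(similarities) & similarities > cutoff, arr.ind = TRUE)
check_these <- data.frame(doc1 = idx[, "row"], doc2 = idx[, "col"],
                          similarity = similarities[idx])
print(check_these)
```

In this toy input only the first two texts share bigrams, so they are the only pair that should be flagged. Calling `hist(pair_values)` would draw a histogram comparable to the `histogram` element.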

