catch_em: Match cheaters


View source: R/catch_em.R

Description

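Match cheaters: compare a set of documents pair-wise using n-gram matching and report how much text each pair shares.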

Usage

catch_em(flist, n_grams = 10, time_lim = 1L)

Arguments

flist

A list of paths to documents (.doc, .docx, or .pdf). Each document must be given as a full or relative path.

n_grams

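The n-gram size, i.e. the number of consecutive words in each n-gram used to compare documents. See the ngram package.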

time_lim

Maximum time, in seconds, allowed for each pair-wise comparison. The default is 1 second, which was sufficient for comparing documents of about 50K words.

diag

The value the diagonal (the score between a document and itself) should take.
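The documentation does not describe how the match score is computed beyond pointing to the ngram package. As a rough, hypothetical illustration of what `n_grams` controls, the base-R sketch below splits text into word n-grams and scores two texts by the Jaccard similarity of their n-gram sets. The helper names `word_ngrams()` and `ngram_jaccard()`, and the Jaccard measure itself, are assumptions for illustration, not cheatR's implementation.

```r
# Hypothetical helpers for illustration only; not cheatR's internal code.

# Split text into lower-case words and return every run of n consecutive words.
word_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  words <- words[nzchar(words)]  # drop empty tokens left by leading punctuation
  if (length(words) < n) return(character(0))
  vapply(seq_len(length(words) - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Jaccard similarity (0-1) of the two texts' n-gram sets; an assumed score.
ngram_jaccard <- function(a, b, n) {
  ga <- unique(word_ngrams(a, n))
  gb <- unique(word_ngrams(b, n))
  all_grams <- union(ga, gb)
  if (length(all_grams) == 0) return(NA_real_)
  length(intersect(ga, gb)) / length(all_grams)
}

a <- "The cat sat on the mat."
b <- "Yesterday the cat sat on a mat."
word_ngrams(a, 3)       # "the cat sat" "cat sat on" "sat on the" "on the mat"
ngram_jaccard(a, b, 3)  # 2 shared 3-grams out of 7 distinct ones
ngram_jaccard(a, b, 1)  # 5 shared words out of 7 distinct ones
```

With `n = 1` any shared word counts, so unrelated texts can still score well; a larger n only rewards longer verbatim runs, which is closer to what copied passages look like.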

Value

results

A correlation-like matrix in which each cell gives the match score (from 0 to 1) between two of the documents.

bad_files
  • bad_read: a vector of the documents that could not be read.

  • bad_ngrams: a matrix of the pair-wise comparisons that could not be completed.
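A minimal base-R sketch, assuming pair-wise scores in the 0-1 range, of how a symmetric results matrix like the one described above could be assembled, with a user-chosen diagonal value (compare the `diag` argument) and a logical matrix flagging pairs whose comparison failed (loosely analogous to `bad_ngrams`). `build_match_matrix()`, `word_jaccard()`, and their arguments are hypothetical names, not part of cheatR.

```r
# Hypothetical sketch; build_match_matrix() is not part of cheatR.
# Fill a symmetric matrix with pair-wise scores, set the diagonal to
# `diag_value`, and flag pairs whose scoring failed (error or NA).
build_match_matrix <- function(docs, score_fun, diag_value = NA_real_) {
  k <- length(docs)
  m <- matrix(NA_real_, k, k, dimnames = list(names(docs), names(docs)))
  failed <- matrix(FALSE, k, k, dimnames = dimnames(m))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # Score each unordered pair once; treat errors as failed comparisons.
      s <- tryCatch(score_fun(docs[[i]], docs[[j]]),
                    error = function(e) NA_real_)
      if (is.na(s)) failed[i, j] <- failed[j, i] <- TRUE
      m[i, j] <- m[j, i] <- s  # mirror to keep the matrix symmetric
    }
  }
  diag(m) <- diag_value
  list(results = m, failed = failed)
}

# Toy scoring function: Jaccard similarity of the two texts' word sets.
word_jaccard <- function(x, y) {
  wx <- unique(strsplit(x, " ")[[1]])
  wy <- unique(strsplit(y, " ")[[1]])
  length(intersect(wx, wy)) / length(union(wx, wy))
}

docs <- list(a = "the cat sat", b = "the cat ran", c = "dogs bark loudly")
res <- build_match_matrix(docs, word_jaccard, diag_value = 1)
res$results  # a-b: 0.5, a-c and b-c: 0, diagonal: 1
```

Scoring each unordered pair once and mirroring it halves the work compared with filling every cell, which matters when each comparison is allowed up to `time_lim` seconds.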

Author(s)

Mattan S. Ben-Shachar


mattansb/cheatR documentation built on Dec. 24, 2019, 10:07 p.m.