View source: R/imposters.optimize.R
imposters.optimize | R Documentation |
A function to optimize hyperparameters used in the General Imposters method
(see imposters for further details). Using a grid search approach,
it tries to define a grey area where the attribution scores are not reliable.
imposters.optimize(reference.set,
classes.reference.set = NULL,
parameter.incr = 0.01,
...)
reference.set |
a table containing frequencies/counts for several
variables – e.g. most frequent words – across a number of texts
written by different authors. Usually, it is a corpus of known authors
(at least two texts per author) that is used to tune the optimal
hyperparameters for the imposters method. Such a tuning involves
a leave-one-out procedure of identifying a grey area where the
results returned by the classifier are not particularly reliable.
E.g., if one gets 0.39 and 0.55 as the parameters, one would assume
that any results of the imposters() function that fall within this range are unreliable. |
classes.reference.set |
a vector containing class identifiers for the reference set. When missing, the row names of the set table will be used; the assumed classes are the strings of characters that precede the first underscore. Consider the following example: c("Sterne_Tristram", "Sterne_Sentimental", "Fielding_Tom", ...), where the classes are the authors' names. Note that only the part up to the first underscore in the sample's name will be included in the class label. |
parameter.incr |
the procedure tries to optimize the hyperparameters via a grid search – this means that it tests the range of values between 0 and 1 incremented by a certain fraction. If this is set to 0.01 (default), it tests 0, 0.01, 0.02, 0.03, ..., up to 1. |
... |
any other argument that can be passed to the classifier; see
|
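The two conventions described in the arguments above can be illustrated in base R. This is only a sketch of the described behaviour, not the package's internal code; the variable names (sample_names, classes, parameter_incr, grid) are hypothetical and chosen for illustration.

```r
# Hypothetical sample names following the "Author_Title" convention
sample_names <- c("Sterne_Tristram", "Sterne_Sentimental", "Fielding_Tom")

# Keep only the part before the first underscore as the class label
classes <- sub("_.*", "", sample_names)
print(classes)

# Candidate values visited by a grid search with the default increment
parameter_incr <- 0.01
grid <- seq(0, 1, by = parameter_incr)
print(length(grid))
print(head(grid, 4))
```

With the default increment the grid contains 101 candidate values; a larger increment shortens the grid, presumably at the cost of a coarser estimate.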
The function returns two scores: the P1 and P2 values.
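As a rough illustration of how the two values might be used, the sketch below flags scores that fall inside the grey area. It assumes that P1 is the lower bound and P2 the upper bound, and that the bounds themselves count as part of the grey area; both are assumptions made for this example. The scores are made up, and p1, p2, scores and in_grey_area are hypothetical names.

```r
# P1 and P2 as in the example quoted in the reference.set description
p1 <- 0.39
p2 <- 0.55

# Made-up attribution scores, standing in for results of imposters()
scores <- c(0.12, 0.47, 0.83)

# TRUE where a score lies inside the grey area (bounds included by assumption)
in_grey_area <- scores >= p1 & scores <= p2
print(in_grey_area)
```

Scores flagged TRUE would be treated as unreliable; scores outside the range can be interpreted as usual for the imposters method.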
Maciej Eder
Koppel, M. and Winter, Y. (2014). Determining if two documents are written by the same author. "Journal of the Association for Information Science and Technology", 65(1): 178-187.
Kestemont, M., Stover, J., Koppel, M., Karsdorp, F. and Daelemans, W. (2016). Authenticating the writings of Julius Caesar. "Expert Systems With Applications", 63: 86-96.
imposters
## Not run:
# activating a dummy dataset, in our case: Harper Lee and her Southern colleagues
data(lee)
# optimizing the imposters hyperparameters on the reference set (leave-one-out grid search)
imposters.optimize(lee)
## End(Not run)