predict_probabilities | R Documentation |
Predict probabilities of candidates based on their violation profiles and constraint weights.
predict_probabilities(
test_input,
constraint_weights,
output_path = NA,
out_sep = ",",
encoding = "unknown",
temperature = DEFAULT_TEMPERATURE
)
test_input |
The input data frame/data table/tibble. This should contain one or more OT tableaux consisting of mappings between underlying and surface forms with observed frequency and violation profiles. Constraint violations must be numeric. For an example of the data frame format, see inst/extdata/sample_data_frame.csv. You can read this file into a data frame using read.csv or into a tibble using readr::read_csv. This function also supports the legacy OTSoft file format. To use it, pass a file path string pointing to the OTSoft file instead of a data frame. For an example of the OTSoft format, see inst/extdata/sample_data_file.txt. |
constraint_weights |
A vector of constraint weights to use. These are typically
generated by the |
output_path |
(optional) A string specifying the path to a file to which the predictions will be saved. If the file exists it will be overwritten. If this argument isn't provided, the output will not be written to a file. |
out_sep |
(optional) The delimiter used in the output files. Defaults to commas. |
encoding |
(optional) The character encoding of the input file. Defaults to "unknown". |
temperature |
(optional) The temperature parameter, which should be a
real number |
For each input/output pair in the provided file, this function will calculate the probability of that output given the input form and the provided weights. This probability is defined as
P(y|x; w) = \frac{1}{Z_w(x)}\exp(-\sum_{k=1}^{m}{w_k f_k(y, x)})
where f_k(y, x) is the number of violations of constraint k incurred by mapping underlying x to surface y, w_k is the weight associated with constraint k, and Z_w(x) is a normalization term defined as
Z_w(x) = \sum_{y\in\mathcal{Y}(x)}{\exp(-\sum_{k=1}^{m}{w_k f_k(y, x)})}
where \mathcal{Y}(x) is the set of all output candidates for input x.
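The definition above can be illustrated for a single tableau in base R. This is a minimal sketch rather than the package's implementation: the violation matrix, the weights, and the helper name maxent_probs are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of maxent.ot): MaxEnt probabilities for the
# candidates of ONE input, following
#   P(y|x; w) = exp(-sum_k w_k f_k(y, x)) / Z_w(x)
maxent_probs <- function(violations, weights) {
  # Weighted sum of constraint violations for each candidate (row)
  harmony <- as.vector(violations %*% weights)
  unnormalized <- exp(-harmony)
  # Divide by Z_w(x): the sum over all candidates of this input
  unnormalized / sum(unnormalized)
}

# Made-up tableau: 2 candidates (rows) x 2 constraints (columns)
violations <- matrix(c(1, 0,
                       0, 2),
                     nrow = 2, byrow = TRUE)
weights <- c(2, 0.5)

probs <- maxent_probs(violations, weights)
print(probs)
```

The first candidate has a weighted violation sum of 2 and the second a sum of 1, so the second candidate receives the higher probability, and the two probabilities sum to 1.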
The resulting probabilities will be appended to a data frame object representing the input tableaux. This data frame can also be saved to a file if the output_path argument is provided.
An object with the following named attributes:
log_lik: the log likelihood of the data under the provided weights
predictions: a data table containing all the tableaux, with probabilities assigned to each candidate and errors.
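As a rough illustration of what a log likelihood of this kind measures, the base R sketch below computes the frequency-weighted sum of log predicted probabilities for one tableau. This is the standard definition for MaxEnt models and is assumed here, not taken from the package source; the frequencies and probabilities are made-up values.

```r
# Made-up values for one tableau with two candidates
frequency <- c(3, 7)        # observed counts of each candidate
predicted <- c(0.25, 0.75)  # predicted probabilities P(y|x; w)

# Assumed definition (standard for MaxEnt models, not confirmed against the
# package source): sum over candidates of frequency * log(probability)
log_lik <- sum(frequency * log(predicted))
print(log_lik)
```

Under this definition the log likelihood is never positive, and values closer to zero indicate predictions that fit the observed frequencies better.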
If the temperature parameter T is specified, P(y|x; w) is calculated as
P(y|x; w) = \frac{1}{Z_w(x)}\exp(-\sum_{k=1}^{m}{w_k f_k(y, x)}/T)
and Z_w(x) is similarly calculated as
Z_w(x) = \sum_{y\in\mathcal{Y}(x)}{\exp(-\sum_{k=1}^{m}{w_k f_k(y, x)}/T)}
Larger values of T move the predicted probabilities of output candidates for a particular input towards equality with one another. For example, if a particular input has two candidate outputs, higher values of T will move the probability of each towards 0.5.
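The flattening effect of temperature can be seen in a small base R sketch. It is an illustration under stated assumptions, not the package's implementation: the helper name maxent_probs_temp, the violation matrix, and the weights are hypothetical.

```r
# Hypothetical helper (not part of maxent.ot): MaxEnt probabilities for the
# candidates of one input, with the weighted violation sum divided by the
# temperature before exponentiating.
maxent_probs_temp <- function(violations, weights, temperature = 1) {
  harmony <- as.vector(violations %*% weights)
  unnormalized <- exp(-harmony / temperature)
  unnormalized / sum(unnormalized)
}

# Made-up tableau: 2 candidates (rows) x 2 constraints (columns)
violations <- matrix(c(1, 0,
                       0, 2),
                     nrow = 2, byrow = TRUE)
weights <- c(2, 0.5)

# Print the two candidate probabilities at increasing temperatures
for (temp in c(1, 2, 10)) {
  p <- maxent_probs_temp(violations, weights, temperature = temp)
  cat("T =", temp, ":", round(p, 3), "\n")
}
```

With the weights held fixed, each increase in the temperature moves both probabilities closer to 0.5.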
The temperature parameter can be used to generate less categorical predictions in a way that is independent of the constraint weights. See Ackley, Hinton, and Sejnowski (1985, pp. 150-152) for more detail, and Hayes et al. (2009) and Mayer (2021, Ch. 4) for examples of temperature used in practice. By default this parameter is set to 1, which renders the equations in this section equivalent to the standard calculations of probability.
# Get paths to toy data file
df_file <- system.file(
"extdata", "sample_data_frame.csv", package = "maxent.ot"
)
# Fit weights to dataframe with no biases
tableaux_df <- read.csv(df_file)
fit_model <- optimize_weights(tableaux_df)
predict_probabilities(tableaux_df, fit_model$weights)
# Do so with a temperature parameter
predict_probabilities(tableaux_df, fit_model$weights, temperature = 2)
# Save predictions to a file
tmp_output <- tempfile()
predict_probabilities(tableaux_df, fit_model$weights, output_path = tmp_output)