predict_probabilities: Predict probabilities of OT candidates

Description

Predict probabilities of candidates based on their violation profiles and constraint weights.

Usage

predict_probabilities(
  test_input,
  constraint_weights,
  output_path = NA,
  out_sep = ",",
  encoding = "unknown",
  temperature = DEFAULT_TEMPERATURE
)

Arguments

test_input

The input data frame/data table/tibble. This should contain one or more OT tableaux consisting of mappings between underlying and surface forms with observed frequency and violation profiles. Constraint violations must be numeric.

For an example of the data frame format, see inst/extdata/sample_data_frame.csv. You can read this file into a data frame using read.csv or into a tibble using readr::read_csv.

This function also supports the legacy OTSoft file format. You can use this format by passing in a file path string to the OTSoft file rather than a data frame.

For examples of OTSoft format, see inst/extdata/sample_data_file.txt.

constraint_weights

A vector of constraint weights to use. These are typically generated by the optimize_weights function.

output_path

(optional) A string specifying the path to a file to which the predictions will be saved. If the file exists it will be overwritten. If this argument isn't provided, the output will not be written to a file.

out_sep

(optional) The delimiter used in the output files. Defaults to commas.

encoding

(optional) The character encoding of the input file. Defaults to "unknown".

temperature

(optional) The temperature parameter, which should be a real number >= 1. Defaults to 1.

Details

For each input/output pair in the provided data, this function calculates the probability of that output given the input form and the provided weights. This probability is defined as

P(y|x; w) = \frac{1}{Z_w(x)}\exp(-\sum_{k=1}^{m}{w_k f_k(y, x)})

where f_k(y, x) is the number of violations of constraint k incurred by mapping underlying x to surface y, w_k is the weight associated with constraint k, and Z_w(x) is a normalization term defined as

Z_w(x) = \sum_{y\in\mathcal{Y}(x)}{\exp(-\sum_{k=1}^{m}{w_k f_k(y, x)})}

where \mathcal{Y}(x) is the set of all output candidates for input x.
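
The calculation above can be illustrated with a minimal base R sketch for a single input. The violation matrix, candidate names, and weights below are invented for illustration and are not taken from the package's sample data; the code does not call maxent.ot and is not claimed to match its internal implementation.

```r
# Hypothetical tableau for one input x:
# rows are output candidates, columns are constraints.
violations <- matrix(
  c(1, 0,
    0, 1,
    1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cand_a", "cand_b", "cand_c"), c("C1", "C2"))
)

# Hypothetical constraint weights w_k.
weights <- c(C1 = 2, C2 = 1)

# Weighted violation sum for each candidate: sum_k w_k * f_k(y, x).
harmony <- as.vector(violations %*% weights)
names(harmony) <- rownames(violations)

# Unnormalized scores exp(-harmony), then divide by Z_w(x).
scores <- exp(-harmony)
z <- sum(scores)
probs <- scores / z

probs
```

The probabilities sum to 1 within the tableau, and the candidate with the smallest weighted violation sum (here cand_b) receives the highest probability.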

The resulting probabilities will be appended to a data frame object representing the input tableaux. This data frame can also be saved to a file if the output_path argument is provided.

Value

An object with the following named attributes:

  • log_lik: the log likelihood of the data under the provided weights

  • predictions: a data table containing all the tableaux, with a predicted probability and an error value for each candidate.
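
As a rough illustration of what log_lik summarizes, the sketch below computes a log likelihood in base R under the assumption that it is the frequency-weighted sum of the log predicted probabilities, which is the usual definition for MaxEnt models. The frequencies and predicted probabilities are invented, and the package's exact computation may differ in detail.

```r
# Hypothetical observed frequencies for the candidates of one input.
freq <- c(cand_a = 7, cand_b = 3)

# Hypothetical predicted probabilities for the same candidates.
pred <- c(cand_a = 0.6, cand_b = 0.4)

# Assumed definition: each observation contributes the log of the
# probability predicted for the candidate that was observed.
log_lik <- sum(freq * log(pred))

log_lik
```

Because every predicted probability is at most 1, the log likelihood is never positive; values closer to 0 indicate a better fit to the observed frequencies.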

Using temperature

If the temperature parameter T is specified, P(y|x; w) is calculated as

\frac{1}{Z_w(x)}\exp(-\sum_{k=1}^{m}{(w_k f_k(y, x))/T})

and Z_w(x) is similarly calculated as

\sum_{y\in \mathcal{Y}(x)}{\exp(-\sum_{k=1}^{m}{(w_k f_k(y, x))/T})}

Larger values of T move the predicted probabilities of output candidates for a particular input towards equality with one another. For example, if a particular input has two candidate outputs, higher values of T will move the probability of each towards 0.5.

The temperature parameter can be used to generate less categorical predictions in a way that is independent of the constraint weights. See Ackley, Hinton, and Sejnowski (1985, pp. 150-152) for more detail, and Hayes et al. (2009) and Mayer (2021, Ch. 4) for examples of temperature used in practice. By default this parameter is set to 1, which renders the equations in this section equivalent to the standard calculations of probability.
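
The flattening effect of T can be checked with a small base R sketch. The helper maxent_probs and the harmony values are hypothetical and exist only for this illustration; they are not part of maxent.ot.

```r
# Hypothetical helper (not part of maxent.ot): probabilities for the
# candidates of one input, given their weighted violation sums.
maxent_probs <- function(harmony, temperature = 1) {
  scores <- exp(-harmony / temperature)
  scores / sum(scores)
}

# Assumed weighted violation sums for two candidates of one input.
harmony <- c(cand_a = 1, cand_b = 3)

p_t1  <- maxent_probs(harmony, temperature = 1)
p_t2  <- maxent_probs(harmony, temperature = 2)
p_t10 <- maxent_probs(harmony, temperature = 10)

# One row per temperature, one column per candidate.
round(rbind(T1 = p_t1, T2 = p_t2, T10 = p_t10), 3)
```

With these values the probability of cand_a falls from about 0.88 at T = 1 to about 0.55 at T = 10, moving both candidates towards 0.5 while preserving their ranking.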

Examples

  # Load the package (assumed to be installed)
  library(maxent.ot)

  # Get the path to the toy data file
  df_file <- system.file(
      "extdata", "sample_data_frame.csv", package = "maxent.ot"
  )
  # Read the tableaux into a data frame and fit weights with no biases
  tableaux_df <- read.csv(df_file)
  fit_model <- optimize_weights(tableaux_df)
  # Predict candidate probabilities using the fitted weights
  predict_probabilities(tableaux_df, fit_model$weights)

  # Do so with a temperature parameter
  predict_probabilities(tableaux_df, fit_model$weights, temperature = 2)

  # Save predictions to a file
  tmp_output <- tempfile()
  predict_probabilities(tableaux_df, fit_model$weights, output_path = tmp_output)

connormayer/maxent.ot documentation built on Nov. 24, 2024, 1:21 p.m.