monte_carlo_weights: Create simulated data and learn weights for these data

View source: R/monte_carlo.R

monte_carlo_weights    R Documentation

Description

Creates simulated data sets by picking one output for each instance of an input. The probability of picking a particular output is determined by its conditional probability given the input. Constraint weights are then learned for each simulated data set.

Usage

monte_carlo_weights(
  pred_prob,
  num_simul,
  bias_file = NA,
  mu = NA,
  sigma = NA,
  output_path = NA,
  out_sep = ",",
  control_params = NA,
  upper_bound = DEFAULT_UPPER_BOUND,
  allow_negative_weights = FALSE
)

Arguments

pred_prob

A data frame with a column for predicted probabilities. This object should be in the same format as the predictions attribute of the object returned by the predict_probabilities function.

num_simul

The number of simulations to run.

bias_file

(optional) The path to a file containing the mu and sigma values for the constraint biases. If this argument is provided, the mu and sigma arguments will be ignored. Each row in this file should contain the name of a constraint, followed by its mu, followed by its sigma, separated by the relevant separator (commas by default).

mu

(optional) A scalar or vector that will serve as the mu for each constraint in the bias term. Constraint weights will also be initialized to this value. If a vector, its length must equal the number of constraints in the input file. This value will not be used if bias_file is provided.

sigma

(optional) A scalar or vector that will serve as the sigma for each constraint in the bias term. If a vector, its length must equal the number of constraints in the input file. This value will not be used if bias_file is provided.

output_path

(optional) A string specifying the path to a file to which the output will be saved. If the file exists it will be overwritten. If this argument isn't provided, the output will not be written to a file.

out_sep

(optional) The delimiter used in the output files. Defaults to commas.

control_params

(optional) A named list of control parameters that will be passed to the optim function. See the documentation of that function for details. Note that some parameter settings may interfere with optimization. The parameter fnscale will be overwritten with -1 if specified, since this must be treated as a maximization problem.

upper_bound

(optional) The maximum value for constraint weights. Defaults to DEFAULT_UPPER_BOUND (100).

allow_negative_weights

(optional) Whether the optimizer should allow negative weights. Defaults to FALSE.
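
As an illustration of the bias_file format described above, the following base R sketch writes a small comma-separated file with one row per constraint (name, mu, sigma). The constraint names and values are hypothetical, and the absence of a header row is an assumption based on the row description above, not something this page confirms.

```r
# Hypothetical constraint names and bias parameters, for illustration only
bias_df <- data.frame(
  constraint = c("Constraint1", "Constraint2", "Constraint3"),
  mu = c(0, 0, 0),
  sigma = c(10, 10, 10),
  stringsAsFactors = FALSE
)

# Write one row per constraint: name, mu, sigma, separated by commas.
# Assumption: no header row and no quoting of the constraint names.
bias_path <- tempfile(fileext = ".csv")
write.table(
  bias_df, bias_path,
  sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE
)

# Inspect the resulting file contents
readLines(bias_path)
```

The resulting path could then be supplied as the bias_file argument, in which case any mu and sigma arguments would be ignored.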

Details

This function creates multiple simulated data sets and, for each one, learns a set of weights that maximizes the likelihood of that simulated data set.

To create a simulated data set, one output is randomly chosen for each instance of an input. The probability of picking a particular output O_i, which arises from input I_j, depends on Pr(O_i|I_j).
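
The sampling step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the data frame layout, the column names (input, output, predicted), the instance counts, and the choice to sample with probability exactly equal to Pr(O_i|I_j) are all assumptions made for the example.

```r
# Hypothetical predicted probabilities for two inputs; the column names
# and values are illustrative assumptions, not the package's format
pred <- data.frame(
  input = c("in1", "in1", "in2", "in2", "in2"),
  output = c("a", "b", "c", "d", "e"),
  predicted = c(0.8, 0.2, 0.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical number of instances observed for each input
n_instances <- c(in1 = 10, in2 = 20)

set.seed(1)
simulated <- do.call(rbind, lapply(names(n_instances), function(inp) {
  cands <- pred[pred$input == inp, ]
  # Draw one output per instance of this input, with probability Pr(O_i | I_j)
  draws <- sample(
    cands$output,
    size = n_instances[[inp]], replace = TRUE, prob = cands$predicted
  )
  # Tally how often each candidate output was drawn (zeros included)
  counts <- table(factor(draws, levels = cands$output))
  data.frame(
    input = inp, output = cands$output, sim_count = as.vector(counts),
    stringsAsFactors = FALSE
  )
}))

simulated
```

Each run with a different seed yields a different simulated data set whose per-input counts sum to the number of instances of that input.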

The function optimize_weights() is called to find a set of weights that maximizes the likelihood of the simulated data. All optional arguments that optimize_weights() provides for specifying biases and bounds are likewise available in monte_carlo_weights().

The process of simulating a data set and learning weights that optimize its likelihood is repeated num_simul times.

Value

A data frame with the following structure:

  • rows: As many rows as the number of simulations

  • columns: As many columns as the number of constraints
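
A data frame of this shape can be summarized column-wise to describe the distribution of each constraint's weight across simulations. The sketch below uses a placeholder data frame filled with random numbers in place of real output; the constraint names, the values, and the assumption that columns are named after constraints are hypothetical.

```r
# Placeholder stand-in for the returned data frame: 5 simulations (rows)
# by 3 constraints (columns). Names and values are hypothetical.
set.seed(42)
sim_weights <- data.frame(
  Constraint1 = runif(5, 0, 5),
  Constraint2 = runif(5, 0, 5),
  Constraint3 = runif(5, 0, 5)
)

# Mean and standard deviation of each constraint's weight across simulations
weight_summary <- data.frame(
  mean = colMeans(sim_weights),
  sd = sapply(sim_weights, sd)
)

weight_summary
```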

Why use this function?

This function gives us a way to estimate the distribution of constraint weights via a Monte Carlo process. For example, we might be interested in how temperature polarizes predicted probabilities, and in how this affects the resulting constraint weights. The function can produce one distribution of constraint weights for simulated polarized data and another for simulated non-polarized data, allowing the two to be compared.

Examples

  # Get the path to the toy data file
  data_file <- system.file(
      "extdata", "sample_data_frame.csv", package = "maxent.ot"
  )

  tableaux_df <- read.csv(data_file)

  # Fit weights to data with no biases
  fit_model <- optimize_weights(tableaux_df)

  # Predict probabilities for the same input with temperature = 2
  pred_obj <- predict_probabilities(
      tableaux_df, fit_model$weights, temperature = 2
  )

  # Run 5 Monte Carlo simulations
  # based on predicted probabilities when temperature = 2,
  # and learn weights for these 5 simulated data sets
  monte_carlo_weights(pred_obj$predictions, 5)

  # Save learned weights to a file
  tmp_output <- tempfile()
  monte_carlo_weights(pred_obj$predictions, 5, output_path = tmp_output)

connormayer/maxent.ot documentation built on Nov. 24, 2024, 1:21 p.m.