optimize_weights | R Documentation |
Optimizes constraint weights given a data set and optional biases. If no bias arguments are provided, the bias term(s) will not be included in the optimization.
optimize_weights(
  input,
  bias_input = NA,
  mu = NA,
  sigma = NA,
  control_params = NA,
  upper_bound = DEFAULT_UPPER_BOUND,
  encoding = "unknown",
  model_name = NA,
  allow_negative_weights = FALSE
)
input |
The input data frame/data table/tibble. This should contain one or more OT tableaux consisting of mappings between underlying and surface forms with observed frequency and violation profiles. Constraint violations must be numeric. For an example of the data frame format, see inst/extdata/sample_data_frame.csv. You can read this file into a data frame using read.csv or into a tibble using dplyr::read_csv. This function also supports the legacy OTSoft file format. You can use this format by passing in a file path string to the OTSoft file rather than a data frame. For examples of OTSoft format, see inst/extdata/sample_data_file.txt. |
bias_input |
(optional) A data frame/data table/tibble containing the bias mus and sigmas. Each row corresponds to an individual constraint and consists of three columns: the name of the constraint, its mu, and its sigma. For examples of OTSoft bias format, see inst/extdata/sample_bias_file_otsoft.txt. Each row in this file should be the name of the constraint, followed by the mu, followed by the sigma (separated by tabs). |
mu |
(optional) A scalar or vector that will serve as the mu for each
constraint in the bias term. Constraint weights will also be initialized to
this value. If a vector, its length must equal the number of constraints in
the input file. This value will not be used if |
sigma |
(optional) A scalar or vector that will serve as the sigma for
each constraint in the bias term. If a vector, its length must equal the
number of constraints in the input file. This value will not be used if
|
control_params |
(optional) A named list of control parameters that
will be passed to the optim function. See the documentation
of that function for details. Note that some parameter settings may
interfere with optimization. The parameter |
upper_bound |
(optional) The maximum value for constraint weights. Defaults to 100. |
encoding |
(optional) The character encoding of the input file. Defaults to "unknown". |
model_name |
(optional) A name for the model. If not provided, the name of the variable will be used if the input is a data frame. If the input is a path to an OTSoft file, the filename will be used. |
allow_negative_weights |
(optional) Whether the optimizer should allow negative weights. Defaults to FALSE. |
The objective function J(w) that is optimized is defined as
J(w) = \sum_{i=1}^{n}{\ln P(y_i|x_i; w)} - \sum_{k=1}^{m}{\frac{(w_k - \mu_k)^2}{2\sigma_k^2}}
The first term in this equation calculates the natural logarithm of the conditional likelihood of the training data under the weights w. n is the number of data points (i.e., the sample size or the sum of the frequency column in the input), x_i is the input form of the i-th data point, and y_i is the observed surface form corresponding to x_i. P(y_i|x_i; w) represents the probability of realizing underlying x_i as surface y_i given weights w. This probability is defined as
P(y_i|x_i; w) = \frac{1}{Z_w(x_i)}\exp(-\sum_{k=1}^{m}{w_k f_k(y_i, x_i)})
where f_k(y_i, x_i) is the number of violations of constraint k incurred by mapping underlying x_i to surface y_i. Z_w(x_i) is a normalization term defined as
Z_w(x_i) = \sum_{y\in\mathcal{Y}(x_i)}{\exp(-\sum_{k=1}^{m}{w_k f_k(y, x_i)})}
where \mathcal{Y}(x_i) is the set of observed surface realizations of input x_i.
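The probability and log-likelihood definitions above can be sketched in base R. This is only an illustration on a made-up tableau, not the package's internal implementation; the column names (input, output, freq, C1, C2) and the weights are assumptions chosen for the example.

```r
# Toy tableau (made-up data): one input, two candidates, two constraints.
tableau <- data.frame(
  input  = c("x1", "x1"),
  output = c("y1", "y2"),
  freq   = c(3, 1),
  C1     = c(1, 0),
  C2     = c(0, 1)
)
w <- c(C1 = 1, C2 = 2)  # illustrative constraint weights

# Weighted violation sum for each candidate: sum_k w_k * f_k(y, x)
violations <- as.matrix(tableau[, c("C1", "C2")])
harmony <- as.vector(violations %*% w)

# Z_w(x): sum of exp(-harmony) over all candidates sharing an input
z <- ave(exp(-harmony), tableau$input, FUN = sum)

# P(y | x; w) for each candidate
p <- exp(-harmony) / z

# First term of J(w): log probabilities weighted by observed frequency
log_lik <- sum(tableau$freq * log(p))
```

Within each input the probabilities in p sum to 1, and candidates with a larger weighted violation sum receive lower probability.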
The second term of the equation for calculating the objective function is the optional bias term, where w_k is the weight of constraint k, and \mu_k and \sigma_k parameterize a normal distribution that serves as a prior for the value of w_k. \mu_k specifies the mean of this distribution (the expected weight of constraint k before seeing any data) and \sigma_k reflects certainty in this value: lower values of \sigma_k penalize deviations from \mu_k more severely, and thus require greater amounts of data to move w_k away from \mu_k. While increasing \sigma_k will improve the fit to the training data, it may result in overfitting, particularly for small data sets.
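To make the effect of \sigma_k concrete, the bias term can be computed directly. The helper name bias_penalty and the numbers below are hypothetical and serve only as an illustration; they are not part of the package.

```r
# Gaussian prior penalty: sum_k (w_k - mu_k)^2 / (2 * sigma_k^2)
# (hypothetical helper for illustration; scalar sigma is recycled over constraints)
bias_penalty <- function(w, mu, sigma) {
  sum((w - mu)^2 / (2 * sigma^2))
}

w  <- c(1.5, 3)  # illustrative weights
mu <- c(0, 0)    # prior means

tight <- bias_penalty(w, mu, sigma = 1)   # small sigma: strong prior
loose <- bias_penalty(w, mu, sigma = 10)  # large sigma: weak prior
```

With the same weights, the penalty under sigma = 1 is 100 times larger than under sigma = 10, since the penalty scales with 1 / sigma^2.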
A general bias with \mu_k = 0 for all k is commonly used as a form of simple regularization to prevent overfitting (see, e.g., Goldwater and Johnson 2003). Bias terms have also been used to model proposed phonological learning biases; see for example Wilson (2006), White (2013), and Mayer (2021, Ch. 4). The choice of \sigma depends on the sample size. As the number of data points increases, \sigma must decrease in order for the effect of the bias to remain constant: specifically, n\sigma^2 must be held constant, where n is the number of tokens.
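As a small worked example of holding n\sigma^2 constant, the following sketch rescales a reference sigma for a larger data set. The reference values are arbitrary and chosen only to keep the arithmetic simple.

```r
# Reference setting (arbitrary illustrative values)
n_ref     <- 100
sigma_ref <- 10

# New, larger data set
n_new <- 400

# Keep n * sigma^2 constant: sigma_new = sigma_ref * sqrt(n_ref / n_new)
sigma_new <- sigma_ref * sqrt(n_ref / n_new)
```

Here quadrupling the number of tokens halves sigma, from 10 to 5.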
Optimization is done using the optim function from R's stats package. By default it uses L-BFGS-B optimization, a quasi-Newton method that allows upper and lower bounds on variables. Constraint weights are restricted to finite, non-negative values unless allow_negative_weights is TRUE.
If no bias parameters are specified (either the bias_input argument or the mu and sigma parameters), optimization will be done without the bias term.
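The following is a minimal sketch of bounded maximization with optim and L-BFGS-B on a made-up single-input tableau. It is not the package's own code: the objective function, starting values, and data are assumptions for illustration, and the upper bound of 100 simply mirrors the documented default. Because optim minimizes by default, fnscale = -1 is used here to turn the problem into a maximization.

```r
# Made-up data: one input with two candidates and two constraints
violations <- matrix(c(1, 0,
                       0, 1), nrow = 2, byrow = TRUE)
freq <- c(3, 1)

# Log likelihood without a bias term (single input, so one normalizer)
objective <- function(w) {
  harmony <- as.vector(violations %*% w)
  log_p <- -harmony - log(sum(exp(-harmony)))
  sum(freq * log_p)
}

fit <- optim(
  par     = c(0, 0),            # starting weights
  fn      = objective,
  method  = "L-BFGS-B",
  lower   = 0,                  # non-negative weights
  upper   = 100,                # mirrors the documented default upper bound
  control = list(fnscale = -1)  # maximize instead of minimize
)
```

In this toy problem the likelihood depends only on the difference between the two weights, which should approach log(3) so that the first candidate receives probability 0.75.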
An object with the following named attributes:
weights: A named list of the optimal constraint weights
log_lik: The log likelihood of the data under the discovered weights
k: The number of constraints
n: The number of data points in the training set
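One possible use of log_lik, k, and n is computing standard information criteria. The sketch below uses a hypothetical stand-in list with placeholder values rather than real output from optimize_weights, and it assumes the returned object's attributes can be accessed with `$`.

```r
# Hypothetical stand-in for a fitted model (placeholder values, not real output)
fit <- list(
  weights = c(C1 = 0, C2 = 1.1),
  log_lik = -2.25,
  k = 2,
  n = 4
)

# Akaike information criterion: 2k - 2 * log likelihood
aic <- 2 * fit$k - 2 * fit$log_lik

# Bayesian information criterion: k * ln(n) - 2 * log likelihood
bic <- fit$k * log(fit$n) - 2 * fit$log_lik
```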
# Get paths to toy data and bias files.
df_file <- system.file(
  "extdata", "sample_data_frame.csv", package = "maxent.ot"
)
bias_file <- system.file(
  "extdata", "sample_bias_data_frame.csv", package = "maxent.ot"
)
# Fit weights to data with no biases
tableaux_df <- read.csv(df_file)
optimize_weights(tableaux_df)
# Fit weights with biases specified in file
bias_df <- read.csv(bias_file)
optimize_weights(tableaux_df, bias_df)
# Fit weights with biases specified in vector form
optimize_weights(
  tableaux_df, mu = c(1, 2), sigma = c(100, 200)
)
# Fit weights with biases specified as scalars
optimize_weights(tableaux_df, mu = 0, sigma = 1000)
# Fit weights with mix of scalar and vector biases
optimize_weights(tableaux_df, mu = c(1, 2), sigma = 1000)
# Pass additional arguments to optim function
optimize_weights(tableaux_df, control_params = list(maxit = 500))