In multid: Multivariate Difference Between Two Groups

D_regularized

R Documentation

Multivariate group difference estimation with regularized binomial regression

Description

Multivariate group difference estimation with regularized binomial regression

Usage

D_regularized(
  data,
  mv.vars,
  group.var,
  group.values,
  alpha = 0.5,
  nfolds = 10,
  s = "lambda.min",
  type.measure = "deviance",
  rename.output = TRUE,
  out = FALSE,
  size = NULL,
  fold = FALSE,
  fold.var = NULL,
  pcc = FALSE,
  auc = FALSE,
  pred.prob = FALSE,
  prob.cutoffs = seq(0, 1, 0.2),
  append.data = FALSE
)

Arguments

`data`	A data frame or list containing two data frames (regularization and estimation data, in that order).
`mv.vars`	Character vector. Variable names in the multivariate variable set.
`group.var`	The name of the group variable.
`group.values`	Vector of length 2, group values (e.g. c("male", "female") or c(0,1)).
`alpha`	Alpha value of the penalty function, ranging from 0 to 1: 0 = ridge regression, 1 = lasso, values in between = elastic net (default 0.5).
`nfolds`	Number of folds used for obtaining lambda (range from 3 to n-1, default 10).
`s`	Which lambda value is used for predicted values? Either "lambda.min" (default) or "lambda.1se".
`type.measure`	Which measure is used during cross-validation. Default "deviance".
`rename.output`	Logical. Should the output values be renamed according to the group.values? Default TRUE.
`out`	Logical. Should results and predictions be calculated on an out-of-bag data set? (Default FALSE)
`size`	Integer. Number of cases per group in the regularization data. Default 1/4 of cases.
`fold`	Logical. Is regularization applied across sample folds with separate predictions for each fold? (Default FALSE, see details)
`fold.var`	Character string. Name of the fold variable. (default NULL)
`pcc`	Logical. Include probabilities of correct classification? Default FALSE.
`auc`	Logical. Include area under the receiver operating characteristic curve (AUC)? Default FALSE.
`pred.prob`	Logical. Include table of predicted probabilities? Default FALSE.
`prob.cutoffs`	Vector. Cutoffs for table of predicted probabilities. Default seq(0, 1, 0.2).
`append.data`	Logical. If TRUE, the data is appended to the predicted variables.
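
The role of alpha can be made concrete with the elastic-net objective. The following is a sketch that assumes the model is fitted through glmnet (the Value section names cv.glmnet) and uses the objective as stated in the glmnet documentation. The symbols N, x_i, y_i, beta_0, beta and l (the negative binomial log-likelihood contribution of case i) are notation introduced here for illustration, not taken from this help page.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{N}\sum_{i=1}^{N} l\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
\;+\;
\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \;+\; \alpha\,\lVert\beta\rVert_1\right]
```

With alpha = 0 only the squared (ridge) term remains, with alpha = 1 only the absolute-value (lasso) term remains, and the default alpha = 0.5 weights both. The penalty strength lambda is the quantity chosen by cross-validation (see nfolds and s).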

Details

fold = TRUE will apply manually defined data folds (supplied with fold.var) for regularization and obtain estimates for each fold separately. This can be a good solution, for example, when the data are clustered within countries. In such a case, the cross-validation procedure is applied across countries.
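
Before using fold = TRUE, it may help to check how many cases each group has within each fold. The base R sketch below is only an illustration: the data frame d and its columns sex and fold are simulated stand-ins, and the threshold of 17 is a hypothetical size value, not a requirement stated by the package.

```r
# Simulated stand-in data: a group variable and a fold variable
set.seed(34246)
n <- 1000
d <- data.frame(
  sex = sample(c("male", "female"), n, replace = TRUE),
  fold = sample(LETTERS[1:10], n, replace = TRUE)
)

# Cross-tabulate folds by group to see the cell sizes
counts <- table(d$fold, d$sex)
counts

# Hypothetical check: does every fold have more than 17 cases per group?
size_ok <- all(counts > 17)
size_ok
```

Small cells in this table suggest that fold-wise estimates, and any within-fold regularization partition, would rest on few observations.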

out = TRUE will use separate data partitions for regularization and estimation. That is, the cross-validation procedure is first applied within the regularization set, and the weights obtained are then used in the estimation data partition. The size of the regularization set is defined with size. When used with fold = TRUE, size means size within a fold.
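
The partition described above can be sketched in base R. This is an illustration of the idea under stated assumptions, not the package's internal code: the names reg_dat and est_dat are hypothetical, and the sampling scheme (a simple random draw of size cases per group) is an assumption.

```r
# Two-group subset of iris, as in the Examples section
set.seed(1)
dat <- iris[iris$Species %in% c("setosa", "versicolor"), ]
dat$Species <- droplevels(dat$Species)

size <- 15  # cases per group in the regularization set

# Draw `size` row positions separately within each group
reg_idx <- unlist(lapply(
  split(seq_len(nrow(dat)), dat$Species),
  function(i) sample(i, size)
))

reg_dat <- dat[reg_idx, ]   # regularization partition
est_dat <- dat[-reg_idx, ]  # estimation (out-of-bag) partition

table(reg_dat$Species)
table(est_dat$Species)
```

According to the description of the data argument, a list of two data frames (regularization data first, estimation data second) can be supplied instead of a single data frame, so a manual partition like this one could be a starting point when the automatic split is not suitable.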

For more details on these options, please refer to the vignette and README of the multid package.

Value

`D`	Multivariate descriptive statistics and differences.
`pred.dat`	A data.frame with predicted values.
`cv.mod`	Regularized regression model from cv.glmnet.
`P.table`	Table of predicted probabilities by cutoffs.
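
To give an idea of what a table of predicted probabilities by cutoffs can look like, the base R sketch below bins simulated probabilities with the default prob.cutoffs and tabulates them by group. The vectors p and group are made-up stand-ins, and the exact layout of P.table returned by the package may differ.

```r
# Simulated stand-ins for predicted probabilities and group membership
set.seed(1)
p <- runif(20)
group <- rep(c("a", "b"), each = 10)

# Bin the probabilities with the default cutoffs
prob.cutoffs <- seq(0, 1, 0.2)
bins <- cut(p, breaks = prob.cutoffs, include.lowest = TRUE)

# Row-wise proportions: share of each group falling into each bin
tab <- prop.table(table(group, bins), margin = 1)
round(tab, 2)
```

Each row sums to 1, so the table shows how each group's predicted probabilities are distributed across the cutoff intervals.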

References

Lönnqvist, J. E., & Ilmarinen, V. J. (2021). Using a continuous measure of genderedness to assess sex differences in the attitudes of the political elite. Political Behavior, 43, 1779–1800. https://doi.org/10.1007/s11109-021-09681-2

Ilmarinen, V. J., Vainikainen, M. P., & Lönnqvist, J. E. (2023). Is there a g-factor of genderedness? Using a continuous measure of genderedness to assess sex differences in personality, values, cognitive ability, school grades, and educational track. European Journal of Personality, 37, 313–337. https://doi.org/10.1177/08902070221088155

Examples

D_regularized(
  data = iris[iris$Species == "setosa" | iris$Species == "versicolor", ],
  mv.vars = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  group.var = "Species", group.values = c("setosa", "versicolor")
)$D

# out-of-bag predictions
D_regularized(
  data = iris[iris$Species == "setosa" | iris$Species == "versicolor", ],
  mv.vars = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  group.var = "Species", group.values = c("setosa", "versicolor"),
  out = TRUE, size = 15, pcc = TRUE, auc = TRUE
)$D

# separate sample folds
# generate data for 10 groups
set.seed(34246)
n1 <- 100
n2 <- 10
d <-
  data.frame(
    sex = sample(c("male", "female"), n1 * n2, replace = TRUE),
    fold = sample(x = LETTERS[1:n2], size = n1 * n2, replace = TRUE),
    x1 = rnorm(n1 * n2),
    x2 = rnorm(n1 * n2),
    x3 = rnorm(n1 * n2)
  )

# Fit and predict with same data
D_regularized(
  data = d,
  mv.vars = c("x1", "x2", "x3"),
  group.var = "sex",
  group.values = c("female", "male"),
  fold.var = "fold",
  fold = TRUE,
  rename.output = TRUE
)$D

# Out-of-bag data for each fold
D_regularized(
  data = d,
  mv.vars = c("x1", "x2", "x3"),
  group.var = "sex",
  group.values = c("female", "male"),
  fold.var = "fold",
  size = 17,
  out = TRUE,
  fold = TRUE,
  rename.output = TRUE
)$D

multid documentation built on Sept. 11, 2025, 1:08 a.m.