Nothing
#' function for model estimation for EQ-5D valuesets
#'
#' @description Estimation of hybrid model for EQ-5D data
#'
#' @param formula linear model `formula`. Using `|x` will include a grouping variable `x`. see Details.
#' @param data a `data.frame` containing the data. see Details.
#' @param type either the name of the column in `data` containing an indicator of whether an
#' observation is continuous or dichotomous (as `character`), or a `vector` containing the indicator.
#' see Details.
#' @param type_cont value of `type` referring to continuous data. see Details.
#' @param type_dich Value of `type` referring to dichotomous data. see Details.
#' @param k `numeric`. Number of latent classes to be estimated via [flexmix::flexmix()].
#' @param control control list for [flexmix::flexmix()].
#' @param stv `named vector` or `list` of named vectors containing start values for all coefficients
#' formula, including sigma and theta, see Details
#' @param offset offset as in [flexmix::flexmix()]
#' @param optimizer `charachter`,optimizer to be used in [bbmle::mle2()], default `"optim"`
#' @param opt_method `charachter`, optimization method to be used in optimizer, default `"BFGS"`
#' @param lower `numeric`, lower bound for censored data, default `-INF`. If this is used, `opt_method` must be set to `"L-BFGS-B"`,
#' @param upper `numeric`, upper bound for censored data, default `INF`. If this is used, `opt_method` must be set to `"L-BFGS-B"`,
#' @param latent `charachter`,data type to use in component identification, must be one of `"both"`, `"cont"` or `"dich"`,
#' default `“both”`, see Details
#' @param id_col `character`, name of the grouping variable, only needed if `latent != "both”`, see Details
#' @param classes_only `logical`, default `FALSE`, indicates whether the function should perform only
#' classification, rather than both classification and model estimation, only possible for `latent != "both"`,
#' see Datails
#' @param variables_both `character vector`; variables to be fitted on both continuous and dichotomous data.
#' If not specified, all variables from `formula` are used. If provided and not all variables from `formula`
#' are included, `variables_cont` and `variables_dich` must be provided as well, while one of them can be `NULL`,
#' see Details.
#' @param variables_cont `character` vector; variables to be fitted only on continous data. If provided,
#' `variables_both` and `variables_dich` must be provided as well.
#' @param variables_dich `character` vector; variables to be fitted only on dichotomous data, if provided,
#' `variables_both` and `variables_cont` must be provided as well.
#' @param ... additional arguments for [flexmix::flexmix()] or [bbmle::mle2()]
#'
#' @return model object of type `flexmix` or `list` of model objects of type `flexmix`
#'
#' @details
#' see details of different inputs listed below
#'
#'@section formula:
#' a typical R formula of the form `y ~ x1 + x2 + …` should be provided.
#' Additionally, it is possible to include a grouping variable for repeated measures by using
#' `“| xg”` where `xg` is the column containing the group-memberships. The resulting formula will look
#' like this: `y ~ x1 + x2 +… | xg`. In `flexmix`, this is called the concomitant variable specification:
#' the model is fit conditional on grouping, so that all observations with the same group are treated
#' as belonging together when computing likelihood contributions. One possible grouping variable can be
#' an id number to identify answers by the same participants. We highly recommend using a grouping variable,
#' since otherwise the algorithm for k = 2 tends to classify all continuous data into one estimated class
#' and all dichotomous data into the other.
#'
#'
#' @section data:
#' a dataframe having the following columns: all independent variables (x)
#' and the dependent variable y used in `formula`, one column for the grouping variable xg if grouping
#' should be used, e.g. id numbers of participants with repeated measurements, one column indicating
#' if the observations belongs to continuous or dichotomous data with the entries `type_cont`
#' and `type_dich` (e.g., for a column called `"type"` with the entries "TTO" for continuous datapoints
#' and "DCE" for dichotomous datapoints, `type_cont` will be "TTO" and `type_dich` will be "DCE").
#' One row should match one observation (one datapoint).
#'
#'
#' @section start values (stv):
#' if the same start values `stv` are to be used for all latent classes,
#' the given start values must be a `named vector`. Otherwise (if different start values are assumed for
#' each latent class), a `list` of named vectors should be used . In this case, there must be one entry
#' in the list for each latent class. Each start value vector must include start values for sigma and
#' theta. Currently, it is necessary to use the names `"sigma"` and `"theta"` for these values.
#' If users are unsure for which variables start values must be provided, this can be checked by
#' calling `colnames(model.matrix(formula,data))`. In this call, the `formula` should not include the
#' grouping variable.
#'
#'
#' @section latent, id_col, classes_only:
#' in some situations, it can be useful to identify the latent classes on
#' only one `type` of data while estimating the model parameters on both `types` of data. In such cases,
#' the input variable `latent` can be used to specify on which type of data the classification should be done.
#' If `“cont”` or `“dich”` is used, the input parameter `id_col` must be specified and gives the name,
#' i.e. a `character string`, of the grouping variable for classification. Some groups may be removed from
#' the data, since they have only continuous or only dichotomous observations. Then in a first step,
#' a model is estimated only on the continuous/dichotomous data and the achieved classification is stored.
#' In a next step, model parameters are estimated separately for each identified class on both `types` of data
#' using this classification. The output object of `hyreg2` in this case is a `list` of k models.
#' Additionally, at position k+1 of the list, a data frame containing the corresponding classifications
#' from the first step is returned. Each element k in the list contains the estimated parameters for one
#' of the latent classes. When setting the input variable `classes_only` to `TRUE`, the second step is left
#' out and the estimated classes from step one are given as output.
#'
#'
#' @section variables_both, variables_cont, variables_dich:
#' It is possible to specify partial coefficients,
#' which are used only on continuous or dichotomous data.
#'* Example: Suppose different models should be specified for continuous and dichotomous data:
#'* Model continuous data: `y ~ x1 + x3`
#'* Model dichotomous data: `y ~ x1 + x2`
#'* The `formula` input to `hyreg2` must then include all parameters that occur in either model:
#' `y ~ x1 + x2 + x3`
#'* The assignment of parameters to data types is then achieved via the input arguments `variables_both`,
#' `variables_cont`, and `variables_dich`:
#'* `variables_both` = `“x1”`,
#'* `variables_cont` = `“x3”` and
#'* `variables_dich` = `“x2”`.
#'* Every variable included in the provided `formula` (except the grouping variable ) must appear in exactly
#' one of these vectors. One of the `variables_` vectors can also be `NULL`, if no variables should be used only on this type of the data.
#' @author Svenja Elkenkamp, Kim Rand and John Grosser
#' @examples
#'
#'formula <- y ~ -1 + x1 + x2 + x3 | id
#'
#'k <- 2
#'stv <- setNames(c(0.2,0,1,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
#'control = list(iter.max = 1000, verbose = 4)
#'rm(counter)
#'mod <- hyreg2(formula = formula,
#' data = simulated_data_norm,
#' type = simulated_data_norm$type, # also "type" would work
#' stv = stv,
#' k = k,
#' type_cont = "TTO",
#' type_dich = "DCE_A",
#' opt_method = "L-BFGS-B",
#' control = control,
#' latent = "cont",
#' id_col = "id"
#')
#'summary_hyreg2(mod)
#' @importFrom flexmix flexmix
#' @importFrom bbmle mle2
#' @export
hyreg2 <-function(formula,
data,
type,
type_cont,
type_dich,
k = 1,
control = NULL,
stv = NULL, # has to include starting values for sigma and theta as well
offset = NULL,
opt_method = "BFGS",
optimizer = "optim",
lower = -Inf,
upper = Inf,
latent = "both", # one of "both", "cont", "dich"
id_col = NULL, # as character
classes_only = FALSE,
variables_both = NULL,
variables_dich = NULL,
variables_cont = NULL,
# non_linear = FALSE,
# additional arguments for flexmix or optimizer ?
# MISSING:
# non linear regression not implemented yet
...){
dotarg <- list(...)
if(is.character(type) & length(type) == 1 ){
type <- data[,type]
}
# assign k to k in package environment to be used during M-step driver
assign("k", k, envir=the)
# prepare formula handling
formula_string <- paste(deparse(formula), collapse = "")
formula_parts <- strsplit(formula_string, "\\|")[[1]]
formula_short <- as.formula(formula_parts[1])
### STV Check ###
# check stv for names in x (model.matrix(formula,data))
# no stv
if(!is.list(stv)){
if(is.null(stv)){
warning(paste0("Argument stv not provided. Setting all start values from formula to 0.1 and start values for sigma and theta to 1."))
stv <- setNames(c(rep(0.1,dim(model.matrix(formula_short,data))[2]),1,1), c(colnames(model.matrix(formula_short,data)),c("sigma","theta")))
}else{
# one or more stv missing
if(any(!is.element(colnames(model.matrix(formula_short,data)),names(stv)))){
miss <- colnames(model.matrix(formula_short,data))[!is.element(colnames(model.matrix(formula_short,data)),names(stv))]
stop(paste0("Start value(s) missing for ", paste(miss, collapse = ", ") ,". Please provide start values for all relevant formula variables."
))
}
# theta missing
if(!is.element("theta",names(stv))){
stv <- c(stv,setNames(1,"theta"))
warning(paste0("Start value missing for theta, setting to 1."))
}
# sigma missing
if(!is.element("sigma",names(stv))){
stv <- c(stv,setNames(1,"sigma"))
warning(paste0("Start value missing for sigma, setting to 1."))
}
# stv for variables not in formula given, d.h. zu viele angegeben
if(any(!is.element(names(stv), c(colnames(model.matrix(formula_short,data)),"theta","sigma")))){
much <- names(stv)[!is.element(names(stv), c(colnames(model.matrix(formula_short,data)),"theta","sigma"))]
stop(paste0("Start values provided for variables not in formula: ",paste(much, collapse = ", ")))
}
# check order in FLXMRhyreg
}
}else{ # stv is a list
# warning(paste0("stv is a list. Please ensure that variables_both, variables_dich and variables_cont are provided"))
for(i in 1:length(stv)){
# one or more stv missing
if(any(!is.element(colnames(model.matrix(formula_short,data)),names(stv[[i]])))){
miss <- colnames(model.matrix(formula_short,data))[!is.element(colnames(model.matrix(formula_short,data)),names(stv[[i]]))]
stop(paste0("Start value(s) missing for ", paste(miss, collapse = ", ") ,". Please provide start values for all relevant formula variables."
))
}
# theta missing
if(!is.element("theta",names(stv[[i]]))){
stv[[i]] <- c(stv[[i]],setNames(1,"theta"))
warning(paste0("Start value missing for theta, setting to 1."))
}
# sigma missing
if(!is.element("sigma",names(stv[[i]]))){
stv[[i]] <- c(stv[[i]],setNames(1,"sigma"))
warning(paste0("Start value missing for sigma, setting to 1."))
}
# stv for variables not in formula given, d.h. zu viele angegeben
if(any(!is.element(names(stv[[i]]), c(colnames(model.matrix(formula_short,data)),"theta","sigma")))){
much <- names(stv[[i]])[!is.element(names(stv[[i]]), c(colnames(model.matrix(formula_short,data)),"theta","sigma"))]
stop(paste0("Start values provided for variables not in formula: ",paste(much, collapse = ", ")))
}
}
}
### TYPE Check ###
if(is.null(type) | is.null(type_dich) | is.null(type_cont)){
stop(paste0("Argument(s) type, type_dich and/or typ_cont not provided."))
}else{
if(!is.element(type_dich,unique(type))){
warning(paste0("Invalid type_dich: provided name not present in values of type"))
}
if(!is.element(type_cont,unique(type))){
warning(paste0("Invalid type_cont: provided name not present in values of type"))
}
}
### VARIABALES Check ###
if(!is.list(stv)){
if(is.null(variables_both) & is.null(variables_dich) & is.null(variables_cont)){
variables_both <- names(stv)[!is.element(names(stv),c("sigma","theta"))]
}else{
if(any(!is.element(names(stv)[!is.element(names(stv),c("sigma","theta"))],
c(variables_both,variables_dich,variables_cont)))){
# check if all variables are included
stop(paste0("All variables named in stv must be contained in exactly one of: (1) variables_both, (2) variables_dich or (3) variables_cont."))
# alternative: do not provide any of the vectors
# than all relevant variables are set to variables_both automatically
}
if(any(table(c(variables_both,variables_cont,variables_dich))>1)){
stop(paste0("All variables named in stv must be contained in exactly one of: (1) variables_both, (2) variables_dich or (3) variables_cont."))
}
}
}else{
# if stv is a list, both classes have to depend on the same variable set,
# stv of different classes have same names (hence using stv[[1]] is okay here)
# different set of variables for each class not supported yet!
if(is.null(variables_both) & is.null(variables_dich) & is.null(variables_cont)){
variables_both <- names(stv[[1]])[!is.element(names(stv[[1]]),c("sigma","theta"))]
}else{
if(any(!is.element(names(stv[[1]])[!is.element(names(stv[[1]]),c("sigma","theta"))],
c(variables_both,variables_dich,variables_cont)))){
# check if all variables are included
stop(paste0("All variables named in stv must be contained in exactly one of: (1) variables_both, (2) variables_dich or (3) variables_cont."))
# alternative: do not provide any of the vectors
# than all relevant variables are set to variables_both automatically
}
if(any(table(c(variables_both,variables_cont,variables_dich))>1)){
stop(paste0("All variables named in stv must be contained in exactly one of: (1) variables_both, (2) variables_dich or (3) variables_cont."))
}
}
}
### NON LINEAR FUNCTIONS ###
formula_orig <- formula
# if(non_linear == TRUE){
# formula <- function to keep only names of data columns
# }
# for linear functoins formula and formula_orig are the same
### ESTIMATION ###
if(latent == "both"){
model <- list(FLXMRhyreg(type= type,
stv = stv,
type_cont = type_cont,
type_dich = type_dich,
variables_both = variables_both,
variables_cont = variables_cont,
variables_dich = variables_dich,
opt_method = opt_method,
optimizer = optimizer,
lower = lower,
upper = upper))
#non_linear = non_linear))
#formula_orig = formula
fit <- flexmix::flexmix(formula = formula, data = data, k = k, model = model, control = control)
rm(counter, envir = the) # counter will be created during the M-step driver
return(fit)
}else{
# latent = "cont" or "dich"
# Prepare first step
if(is.null(id_col)){
stop("id_col needed")
}
idframe <- data.frame(id = data[,id_col],type)
idcount <- as.data.frame(table(unique(idframe)))
#as.character(idcount[idcount$Freq == 0,"id"])
data <- data[!is.element(as.character(data[,id_col]), as.character(idcount[idcount$Freq == 0,"id"])),]
type <- idframe[!is.element(as.character(idframe[,"id"]), as.character(idcount[idcount$Freq == 0,"id"])),"type"]
if(any(idcount$Freq == 0)){
miss <- idcount[idcount$Freq == 0,"id"]
warning(paste0(id_col ,paste(miss, collapse = ", "), " were removed, since they had only continuous or only dichotomous observations."))
}
# FIRST STEP: GET LATENT CLASSES
if(latent == "cont"){
data_cont <- data[type == type_cont,]
model <- list(FLXMRhyreg(type= type[type == type_cont],
stv = stv,
type_cont = type_cont,
type_dich = type_dich,
variables_both = variables_both,
variables_cont = variables_cont,
variables_dich = variables_dich,
opt_method = opt_method,
optimizer = optimizer,
lower = lower,
upper = upper,
# non_linear = non_linear,
formula_orig = formula_orig))
mod <- flexmix::flexmix(formula = formula, data = data_cont, k = k, model = model, control = control)
rm(counter, envir = the)
# id_col must be something different than id if each observation should be classified
data_cont$mod_comp <- mod@cluster
data$roworder <- 1:nrow(data)
data <- merge(data, unique(data_cont[,c(id_col,"mod_comp")]), by = id_col)
data <- data[order(data$roworder), ]
id_classes <- data_cont[,c(id_col,"mod_comp")]
}
if(latent == "dich"){
data_dich <- data[type == type_dich,]
model <- list(FLXMRhyreg(type= type[type == type_dich],
stv = stv,
type_cont = type_cont,
type_dich = type_dich,
variables_both = variables_both,
variables_cont = variables_cont,
variables_dich = variables_dich,
opt_method = opt_method,
optimizer = optimizer,
lower = lower,
upper = upper,
# non_linear = non_linear,
formula_orig = formula_orig))
mod <- flexmix::flexmix(formula = formula, data = data_dich, k = k, model = model, control = control)
rm(counter, envir = the) # counter will be created during the M-step driver
data_dich$mod_comp <- mod@cluster
data$roworder <- 1:nrow(data)
data <- merge(data, unique(data_dich[,c(id_col,"mod_comp")]), by = id_col)
data <- data[order(data$roworder), ]
# später auch ausgeben können, welche ID zu welcher Klasse zugeordnet wurde
id_classes <- data_dich[,c(id_col,"mod_comp")]
}
# return only estimated classes without model new model coefficients
if(classes_only == TRUE){
return(id_classes)
}
# SECOND STEP: GET MODEL ESTIMATES
# assign 1 to k in package environment to be used during M-step driver
assign("k", 1, envir=the)
data_list <- list()
for(i in unique(mod@cluster)){
data_list[[i]] <- data[data$mod_comp == i,]
}
mod_list <- lapply(data_list, function(xy){
if(is.null(xy)){
mod <- NULL
warning( paste("One or more components are empty. Setting mod to NULL."))
}else{
model <- list(FLXMRhyreg(type= type[data$mod_comp == unique(xy$mod_comp)],
#type = type,
stv = stv, # stv can be a list
type_cont = type_cont,
type_dich = type_dich,
variables_both = variables_both,
variables_cont = variables_cont,
variables_dich = variables_dich,
opt_method = opt_method,
optimizer = optimizer,
lower = lower,
upper = upper,
# non_linear = non_linear,
formula_orig = formula_orig))
mod <- flexmix::flexmix(formula = formula, data = xy, k = 1, model = model, control = control)
rm(counter, envir = the) # counter will be created during the M-step driver
}
return(mod)
})
mod_list$id_classes <- unique(id_classes)
return(mod_list)
}
rm(k, envir = the)
}
### new summary function ###
#' construct model summary as in base R
#'
#' @description get model parameters of model generated by `hyreg2` oder `hyreg2_het`
#'
#' @param object `modelobject` generated with [hyreg2()] or [hyreg2_het()]
#' @return `summary` object of [bbmle::mle2()] model
#'
#' @author Svenja Elkenkamp
#' @examples
#'
#'formula <- y ~ -1 + x1 + x2 + x3 | id
#'k <- 1
#'stv <- setNames(c(0.2,0,1,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
#'control = list(iter.max = 1000, verbose = 4)
#'rm(counter)
#'mod <- hyreg2(formula = formula,
#' data = simulated_data_norm,
#' type = simulated_data_norm$type,
#' stv = stv,
#' k = k,
#' type_cont = "TTO",
#' type_dich = "DCE_A",
#' opt_method = "L-BFGS-B",
#' control = control,
#' latent = "both",
#' id_col = "id"
#')
#'summary_hyreg2(mod)
#'
#' @importFrom bbmle mle2
#' @export
summary_hyreg2 <- function(object){
if(is.list(object)){
object <- object[1:(length(object) - 1)]
out <- lapply(object, function(k){
comp <- k@components
out_list <- lapply(comp, function(j){bbmle::summary(j[[1]]@parameters[["fit_mle"]])})
})
}else{
comp <- object@components
out <- lapply(comp, function(j){bbmle::summary(j[[1]]@parameters[["fit_mle"]])})
}
return(out)
}
##############################
#' extract parameter estimates as named vector
#'
#' @description function to export coefficient values and names from a `model` fitted with `hyreg2` or `hyreg2_het`
#' These values can be used as `stv` for a new model with `k > 1`
#' @param mod `model`output from [hyreg2] oder [hyreg2_het]. If `latent` was `"cont"` or `"dich"` only one element of the output list can be used.
#' @param comp `charachter`, default `"Comp.1"`. `"Comp.x"` indicating values from which model `component` (`x`) should be exported
#' @return `named vector` of parameter estimates from `mod`. Can be used as `stv` for additional model estimations using
#' `hyreg2` or `hyreg2_het`
#'
#' @author Svenja Elkenkamp
#' @examples
#'formula <- y ~ -1 + x1 + x2 + x3 | id
#'
#'k <- 1
#'stv <- setNames(c(0.2,0,1,1,1),c(colnames(simulated_data_norm)[3:5],c("sigma","theta")))
#'control = list(iter.max = 1000, verbose = 4)
#'rm(counter)
#'mod <- hyreg2(formula = formula,
#' data = simulated_data_norm,
#' type = simulated_data_norm$type,
#' stv = stv,
#' k = k,
#' type_cont = "TTO",
#' type_dich = "DCE_A",
#' opt_method = "L-BFGS-B",
#' control = control,
#' latent = "both",
#' id_col = "id"
#')
#'new_stv <- get_stv(mod)
#'
#' # these new_stv can be used in an other estimation using hyreg2 as stv
#' @export
get_stv <- function(mod, comp = "Comp.1"){
stv <- c(mod@components[[comp]][[1]]@parameters$coef,
mod@components[[comp]][[1]]@parameters$sigma,
mod@components[[comp]][[1]]@parameters$theta)
return(stv)
}
##################################
# for use of fit_mle
# internal function, not for direct user use
gendf <- function(list){
df <- list()
for(j in 1:length(list)){
df[[j]] <- data.frame("Estimates" = c(list[[j]]@parameters[["coef"]], # maybe include an other apply function here?
list[[j]]@parameters[["sigma"]],
list[[j]]@parameters[["theta"]]),
"Std Error" = bbmle::summary(list[[j]]@parameters[["fit_mle"]])@coef[,2],
"pvalue" =bbmle::summary(list[[j]]@parameters[["fit_mle"]])@coef[,4])
# give AIC, loglik here as additional arguments?
}
return(df)
}
### refit function ###
# get parametervalues of models
# object is outcome of hyreg2
refit <- function(object){
if(is.list(object)){
out <- lapply(object, function(k){
comp <- k@components
out_list <- lapply(comp, function(j){gendf(j)})
})
}else{
comp <- object@components
out <- lapply(comp, function(j){gendf(j)})
}
return(out)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.