CompoundAuxFit                R Documentation
Calculates the log-likelihood and the score functions of the log-likelihood, and fits the compound distribution based on the GB2 using auxiliary information.
pkl.cavgb2(z, lambda)
lambda0.cavgb2(pl0, z, w=rep(1, dim(z)[1]))
logl.cavgb2(fac, z, lambda, w=rep(1, dim(fac)[1]))
scores.cavgb2(fac, z, lambda, w=rep(1, dim(fac)[1]))
ml.cavgb2(fac, z, lambda0, w = rep(1, dim(fac)[1]), maxiter = 100, fnscale=length(w))
z: numeric; a matrix of auxiliary variables.
lambda: numeric; a matrix of parameters.
pl0: numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.
w: numeric; a vector of weights of length equal to the number of rows of the matrix z (respectively fac). By default w = rep(1, dim(z)[1]).
fac: numeric; a matrix of Gamma factors.
lambda0: numeric; a matrix of initial parameters.
maxiter: numeric; maximum number of iterations to perform. By default maxiter = 100.
fnscale: numeric; scaling parameter passed to the optim function. By default fnscale = length(w).
We model the probabilities p_\ell with auxiliary variables. Let z_k denote the vector of auxiliary information for unit k. This auxiliary information modifies the probabilities p_\ell at the unit level. Denote by p_{k,\ell} the weight of the density f_\ell for unit k. For \ell=1,...,L-1, we pose a linear model for the log-ratio v_{k,\ell}:
\log(p_{k,\ell}/p_{k,L}) = v_{k,\ell} = \sum_{i=1}^I \lambda_{\ell i} z_{ki} = \mathbf{z}_k^{T} \boldsymbol{\lambda}_{\ell}.
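This log-ratio model is the standard multinomial-logit mapping from auxiliary variables to unit-level component weights. A minimal sketch of the computation, assuming z is an n x I matrix and lambda an I x (L-1) matrix (pkl_sketch is a hypothetical name, not the package function):

```r
# Sketch: multinomial-logit probabilities p_{k,l} from auxiliary variables.
# Category L is the reference, so v_{k,L} = 0 for every unit k.
pkl_sketch <- function(z, lambda) {
  v <- z %*% lambda          # n x (L-1) matrix of log-ratios v_{k,l}
  ev <- cbind(exp(v), 1)     # exp(v_{k,l}); reference column is exp(0) = 1
  ev / rowSums(ev)           # normalise each row: p_{k,l} sums to 1 over l
}
```

Each row of the result is a probability vector over the L component densities.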
Function pkl.cavgb2 calculates the p_{k,\ell}. Function lambda0.cavgb2 calculates the initial values \lambda^{(0)}_{\ell i}, i = 1, ..., I, \ell = 1, ..., L-1. Let
\bar{z}_{i} = \sum_k w_k z_{ki} / \sum_k w_k
be the weighted mean of the i-th explanatory variable. Writing
\log(\hat{p}^{(0)}_\ell / \hat{p}^{(0)}_L) = v^{(0)}_\ell = \sum_{i=1}^I \lambda^{(0)}_{\ell i} \bar{z}_{i},
we can choose \lambda^{(0)}_{\ell i} = v^{(0)}_\ell / (I \bar{z}_{i}). Analogously to the ordinary fit of the compound distribution based on the GB2 (CompoundFit), we express the log-likelihood as a weighted mean of \log f(x_k) = \log(\sum_{\ell} p_{k,\ell} f_\ell(x_k)), evaluated at the data points, where f is the compound GB2 density.
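The initialisation \lambda^{(0)}_{\ell i} = v^{(0)}_\ell / (I \bar{z}_i) described above can be sketched as follows (lambda0_sketch is a hypothetical name; its signature mirrors lambda0.cavgb2):

```r
# Sketch: initial lambda from initial proportions pl0 (length L), the
# auxiliary matrix z (n x I) and weights w, following
# lambda0_{l,i} = v0_l / (I * zbar_i).
lambda0_sketch <- function(pl0, z, w = rep(1, nrow(z))) {
  L <- length(pl0)
  I <- ncol(z)
  zbar <- colSums(w * z) / sum(w)   # weighted means of the explanatory variables
  v0 <- log(pl0[-L] / pl0[L])       # initial log-ratios v0_l, l = 1, ..., L-1
  outer(1 / (I * zbar), v0)         # I x (L-1) matrix of initial parameters
}
```

By construction, \bar{z}^T \lambda^{(0)} recovers the initial log-ratios v^{(0)}.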
The scores are obtained as the weighted sums of the first derivatives of the log-likelihood with respect to the parameters \lambda_\ell, \ell = 1, ..., L-1, evaluated at the data points.
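A simplified sketch of this pseudo log-likelihood as a weighted mean (logl_sketch is a hypothetical name; the argument layout follows logl.cavgb2):

```r
# Sketch: weighted pseudo log-likelihood. fac is an n x L matrix of
# component densities f_l(x_k) evaluated at the data; z and lambda give
# the unit-level mixture weights p_{k,l} via the multinomial logit.
logl_sketch <- function(fac, z, lambda, w = rep(1, nrow(fac))) {
  ev <- cbind(exp(z %*% lambda), 1)
  p <- ev / rowSums(ev)                       # p_{k,l}
  sum(w * log(rowSums(p * fac))) / sum(w)     # weighted mean of log f(x_k)
}
```

If all component densities coincide at every data point, the value reduces to the weighted mean of their common log-density, whatever lambda is.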
Function ml.cavgb2 performs maximum likelihood estimation through the general-purpose optimization function optim from package stats. The optimization method used is "BFGS" (see optim). Once we have the fitted parameters \hat{\lambda}, we can deduce the fitted log-ratios \hat{v}_{k\ell} and the fitted probabilities \hat{p}_{k\ell} as functions of \mathbf{z}_k and \hat{\lambda}_{\ell}.
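Under the assumptions above, the maximization step can be sketched directly with optim (ml_sketch and its internal negll are hypothetical names; the real ml.cavgb2 additionally exposes maxiter and fnscale):

```r
# Sketch: fit lambda by minimising the negative pseudo log-likelihood with
# optim's "BFGS" method. fac: n x L component densities; z: n x I matrix.
ml_sketch <- function(fac, z, lambda0, w = rep(1, nrow(fac))) {
  negll <- function(lam) {
    lambda <- matrix(lam, nrow = ncol(z))       # vector back to I x (L-1)
    ev <- cbind(exp(z %*% lambda), 1)
    p <- ev / rowSums(ev)                       # p_{k,l}
    -sum(w * log(rowSums(p * fac))) / sum(w)    # negative weighted mean log f
  }
  fit <- optim(as.vector(lambda0), negll, method = "BFGS",
               control = list(fnscale = length(w)))
  list(matrix(fit$par, nrow = ncol(z)), fit)    # fitted lambda and optim output
}
```

The returned list mirrors the two objects described in the Value section below: the fitted coefficients and the raw "BFGS" fit.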
pkl.cavgb2 returns a matrix of probabilities.
lambda0.cavgb2 returns a matrix of size I \times (L-1).
logl.cavgb2 returns the value of the pseudo log-likelihood.
scores.cavgb2 returns the weighted sum of the scores of the log-likelihood.
ml.cavgb2 returns a list containing two objects: the vector of fitted coefficients \hat{\lambda}_{\ell} and the output of the "BFGS" fit.
Monique Graf and Desislava Nedyalkova
optim
## Not run:
library(simFrame)
data(eusilcP)

# Stratified cluster sampling
set.seed(12345)
srss <- SampleControl(design = "region", grouping = "hid",
                      size = c(200*3, 1095*3, 1390*3, 425*3, 820*3, 990*3,
                               400*3, 450*3, 230*3), k = 1)

# Draw a sample
s1 <- draw(eusilcP, srss)
#names(s1)

# Creation of auxiliary variables
ind <- order(s1[["hid"]])
ss1 <- data.frame(hid=s1[["hid"]], region=s1[["region"]], hsize=s1[["hsize"]],
                  peqInc=s1[["eqIncome"]], age=s1[["age"]],
                  pw=s1[[".weight"]])[ind,]
ss1[["child"]] <- as.numeric((ss1[["age"]]<=14))
ss1[["adult"]] <- as.numeric((ss1[["age"]]>=20))
sa <- aggregate(ss1[,c("child","adult")], list(ss1[["hid"]]), sum)
names(sa)[1] <- "hid"
sa[["children"]] <- as.numeric((sa[["child"]]>0))
sa[["single_a"]] <- as.numeric((sa[["adult"]]==1))
sa[["sa.ch"]] <- sa[["single_a"]]*sa[["children"]]
sa[["ma.ch"]] <- (1-sa[["single_a"]])*sa[["children"]]
sa[["nochild"]] <- 1-sa[["children"]]

# New data set
ns <- merge(ss1[,c("hid","region","hsize","peqInc","pw")],
            sa[,c("hid","nochild","sa.ch","ma.ch")], by="hid")

# Ordering the data set
ns <- ns[!is.na(ns$peqInc),]
index <- order(ns$peqInc)
ns <- ns[index,]

# Truncate at 0
ns <- ns[ns$peqInc>0,]

# Income
peqInc <- ns$peqInc

# Weights
pw <- ns$pw

# Adding the weight adjustment
c1 <- 0.1
pwa <- robwts(peqInc, pw, c1, 0.001)[[1]]
corr <- mean(pw)/mean(pwa)
pwa <- pwa*corr
ns <- data.frame(ns, aw=pwa)

# Empirical indicators with original weights
emp.ind <- c(main.emp(peqInc, pw),
             main.emp(peqInc[ns[["nochild"]]==1], pw[ns[["nochild"]]==1]),
             main.emp(peqInc[ns[["sa.ch"]]==1], pw[ns[["sa.ch"]]==1]),
             main.emp(peqInc[ns[["ma.ch"]]==1], pw[ns[["ma.ch"]]==1]))

# Matrix of auxiliary variables
z <- ns[,c("nochild","sa.ch","ma.ch")]
#unique(z)
z <- as.matrix(z)

# Global GB2 fit, ML profile log-likelihood
gl.fit <- profml.gb2(peqInc, pwa)$opt1
agl.fit <- gl.fit$par[1]
bgl.fit <- gl.fit$par[2]
pgl.fit <- prof.gb2(peqInc, agl.fit, bgl.fit, pwa)[3]
qgl.fit <- prof.gb2(peqInc, agl.fit, bgl.fit, pwa)[4]

# Likelihood and convergence
proflikgl <- -gl.fit$value
convgl <- gl.fit$convergence

# Fitted GB2 parameters and indicators
profgb2.par <- c(agl.fit, bgl.fit, pgl.fit, qgl.fit)
profgb2.ind <- main.gb2(0.6, agl.fit, bgl.fit, pgl.fit, qgl.fit)

# Initial lambda and pl
pl0 <- c(0.2, 0.6, 0.2)
lambda0 <- lambda0.cavgb2(pl0, z, pwa)

# Left decomposition
decomp <- "l"
facgl <- fg.cgb2(peqInc, agl.fit, bgl.fit, pgl.fit, qgl.fit, pl0, decomp)
fitcml <- ml.cavgb2(facgl, z, lambda0, pwa, maxiter=500)
fitcml
convcl <- fitcml[[2]]$convergence
convcl
lambdafitl <- fitcml[[1]]
pglfitl <- pkl.cavgb2(diag(rep(1,3)), lambdafitl)
row.names(pglfitl) <- colnames(z)

## End(Not run)