Description Usage Arguments Format Value References See Also Examples
The function performs iterations between an EM algorithm for a mixture of two-stage logistic regressions with fixed candidate knots and knot movements. The function generates candidate knots for each splined variable. Three sub-functions (gen.init
,twostglogitregmixEM
,move.knot
) are used within the function.
1 2 3 4 |
data |
Raw data for StagedChoiceSplineMix
See Format(below) for details |
M |
number of iterations (def: 100) |
sp.cols |
vector of column numbers of splined variables in a data set (if sp.col is 0, |
num.knots |
vector of numbers of knot candidates for splined variables. (def: a vector all of whose entries are "19") |
sp.knots |
list of knot configurations. For each splined variable, a knot configuration is a k by 4 matrix whose rows represent latent classes and columns represent knots [browsing knot 1, browsing knot 2, writing knot 1, writing knot 2]. (def: approximately 1/3 and 2/3 of knot candidates for knot 1 and knot 2 respectively) |
betab |
matrix of starting points for betab (browsing parameters). If not given, |
betaw |
matrix of starting points for betaw (writing parameters). If not given, |
lambda |
vector of starting points for lambda (membership proportion). If not given, |
k |
number of latent classes (def: 2) |
nst |
number of random multiple starting points to try given a knot configuration. For each knot configuration, the output with the largest log-likelihood is stored among nst trials. (def: 20) |
epsilon |
stopping tolerance for the EM algorithm. (def:1e-06) |
maxit |
maximum number of the EM iterations allowed. If convergence is not declared before maxit, the EM algorithm stops with an error message and generates new starting points. (def: 500) |
maxrestarts |
maximum number of restarts (due to a singularity problem) allowed in the EM iterations. If convergence is not declared before maxrestarts, the algorithm stops with an error message and generates new starting points. (def: 100) |
maxer |
maximum number of errors allowed within a given knot configuration. If convergence is not declared before maxer, it tries a new knot configuration. (def: 20) |
The simulated data(simdata) in the examples is generated using information below:
1. User identifier: userid
Number of users: 700
Number of observations per user: 200
2. Dependent variables:
browsed: 1st stage binary dependent variable
wrote: 2nd stage binary dependent variable
3. Covariates:
x1 (discrete variable): random draws from a binomial distribution with 0.7 success probability
sp1 (splined variable): random draws from a uniform distribution between -2 and 2
4. Number of latent classes: 3
5. True parameters:
i.Betab(browsing)
Class1 | Class2 | Class3 | |
intercept | 1.5 | 2 | 0.3 |
x1 | -0.2 | -0.1 | 0.6 |
sp1 | -2.5 | 0.5 | 1 |
sp1 knot1 | 3 | -1 | 1 |
sp1 knot2 | 2 | -0.5 | -1.5 |
ii.Betaw(writing)
Class1 | Class2 | Class3 | |
intercept | -2 | -1 | 0.3 |
x1 | -0.3 | -0.2 | 0.2 |
sp1 | -3 | -2 | 0 |
sp1 knot1 | 3 | 2 | -1.5 |
sp1 knot2 | 3 | 2 | -1 |
iii.Lambda(membership proportion)
Class1 | Class2 | Class3 |
0.3 | 0.3 | 0.4 |
6. True knot configuration:
19 candidate knots for sp1 (20-iles)
knot1(browsing) | knot2(browsing) | knot1(writing) | knot2(writing) | |
Class1 | 8 | 14 | 4 | 11 |
Class2 | 5 | 15 | 5 | 12 |
Class3 | 5 | 13 | 7 | 14 |
StagedChoiceSplineMix returns a list of the following items:
best: best output, i.e., that giving the largest log-likelihood among M outputs.
best$loglik: log-likelihood
best$betab: parameter estimates of betab
best$betaw: parameter estimates of betaw
best$lambda: parameter estimates of lambda
best$sp.knots: knot configuration
loglike: vector of log-likelihoods for M outputs.
Bruch, E., F. Feinberg, K. Lee (in press), "Detecting Cupid's Vector: Extracting Decision Rules from Online Dating Activity Data," Proceedings of the National Academy of Sciences.
gen.init
move.knot
twostglogitregmixEM
"mixtools" package version 1.0.3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 | #######################################
###### 1. Generate data (simdata) #####
set.seed(77)
k<-3
betab<-matrix(c(1.5,2.0,0.3,-0.2,-0.1,0.6,-2.5,0.5,1,3,-1,1,2,-0.5,-1.5),5,k,byrow=TRUE)
betaw<-matrix(c(-2.0,-1,0.3,-0.3,-0.2,0.2,-3,-2,0,3,2,-1.5,3,2,-1),5,k,byrow=TRUE)
sp1.knots<-matrix(c(8,14,4,11,5,15,5,12,5,13,7,14),3,4,byrow=TRUE)
lambda<-c(0.3,0.3,0.4)
# Large data set
#n_id<-700
#nobsb<-200
# Small data set
n_id<-100
nobsb<-100
nb<-n_id*nobsb
idb <- rep(1:n_id, each=nobsb)
xb.1<-rbinom(nb,size=1,prob=0.7)
xb.com<-cbind(1,xb.1)
xb.sp1<-runif((nb),-2,2)
xb<-cbind(xb.com,xb.sp1)
sp1.mat.b<- matrix(double(20*nb),ncol=20)
sp1.mat.b[,1]<-xb.sp1
sp1.quan<-quantile(xb.sp1,c(0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45,0.5,
0.55,0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95))
for (i in 1:19){
sp1.mat.b[,(i+1)]<-xb.sp1-sp1.quan[i]
}
sp1.mat.b[sp1.mat.b<0]<-0
sp1.mat.b[,1]<-xb.sp1
xb.class<-list()
for (i in 1:k){
xb.class[[i]]<-cbind(xb.com,sp1.mat.b[,1],sp1.mat.b[,(sp1.knots[i,1:2])+1])
}
xbetab<-matrix(double(nb*k),ncol=k)
for (i in 1:k){
xbetab[,i]<-data.matrix(xb.class[[i]])%*%betab[,i]
}
num.idb<-as.vector(ddply(data.frame(idb),"idb",count)[,-1])
w<-sample(c(1:k), n_id, replace = TRUE, prob = lambda)
wb<-rep(w,num.idb)
yb.temp<-matrix(double(nb*k),ncol=k)
for (i in 1:k){
yb.temp[,i]<-rbinom((nb), size = 1, prob = (1/(1+exp(-xbetab[, i]))))
}
yb<-sapply(1:nb, function(i) yb.temp[i,wb[i]])
idw<-idb[yb==1]
nw<-length(idw)
sp1.mat.w<-sp1.mat.b[yb==1,]
xw.com<-xb.com[yb==1,]
xw<-xb[yb==1,]
xw.class<-list()
for (i in 1:k){
xw.class[[i]]<-cbind(xw.com,sp1.mat.w[,1],sp1.mat.w[,(sp1.knots[i,3:4])+1])
}
xbetaw<-matrix(double(nw*k),ncol=k)
for (i in 1:k){
xbetaw[,i]<-data.matrix(xw.class[[i]])%*%betaw[,i]
}
num.idw<-as.vector(ddply(data.frame(idw),"idw",count)[,-1])
ww<-wb[yb==1]
yw.temp<-matrix(double(nw*k),ncol=k)
for (i in 1:k){
yw.temp[,i]<-rbinom((nw), size = 1, prob = (1/(1+exp(-xbetaw[, i]))))
}
yw<-sapply(1:nw, function(i) yw.temp[i,ww[i]])
yb.aug<-cbind(1:length(yb),yb)
yw.aug<-cbind(which(yb==1),yw)
colnames(yb.aug)[1]<-"num"
colnames(yw.aug)[1]<-"num"
ybyw<-merge(yb.aug,yw.aug, all = TRUE)[,-1]
simdata<-cbind(idb,ybyw,xb.1,sp1.mat.b[,1])
simdata[is.na(simdata)]<-""
simdata[,3]<-as.integer(simdata[,3])
colnames(simdata)[1]<-"userid"
colnames(simdata)[2]<-"browsed"
colnames(simdata)[3]<-"wrote"
colnames(simdata)[4]<-"x1"
colnames(simdata)[5]<-"sp1"
########################################
##### 2. Run StagedChoiceSplineMix #####
## number of latent classes
set.seed(66)
k<-3
## starting points: true parameters used in the data generation (optional)
betab<-matrix(c(1.5,2.0,0.3,-0.2,-0.1,0.6,-2.5,0.5,1,3,-1,1,2,-0.5,-1.5),5,k,byrow=TRUE)
betaw<-matrix(c(-2.0,-1,0.3,-0.3,-0.2,0.2,-3,-2,0,3,2,-1.5,3,2,-1),5,k,byrow=TRUE)
lambda<-c(0.3,0.3,0.4)
## number of random multiple starting points to try given a knot configuration
nst<-1
## vector of the columns of spline variables in the data set (required)
sp.cols<-5
# vector of the numbers of candidate knots for splined variables (optional)
num.knots<-19
## true knot configuration used in the data generation
sp1.knots<-matrix(c(8,14,4,11,5,15,5,12,5,13,7,14),3,4,byrow=TRUE)
## list of knot configuration of spline variables (optional)
sp.knots<-list(sp1.knots)
## Run "StagedChoiceSplineMix"
out<-StagedChoiceSplineMix(data=simdata, M=1, sp.cols=sp.cols, num.knots, sp.knots,
betab, betaw, lambda, k=k, nst=nst, epsilon=1e-06, maxit=500, maxrestarts=100, maxer=20)
## output
out$loglike # vector of M log-likelihoods
out$best$loglik # log-likelihood of the best output
out$best$betab # betab estimates of the best output
out$best$betaw # betaw estimates of the best output
out$best$lambda # lambda estimates of the best output
out$best$sp.knots # knot configuration of the best output
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.