R/covariate.predictions.r

#' Compute estimates of real parameters for multiple covariate values
#' 
#' Computes real estimates for a dataframe of covariate values and the var-cov
#' matrix of the real estimates.
#' 
#' This function is similar in use to \code{\link{compute.real}} except that it
#' is specifically designed to compute real parameter estimates for multiple
#' covariate values, either for a single model or as model averaged
#' estimates across the models in a marklist. This is particularly
#' useful for computing and plotting the real parameter as a function of the
#' covariate with pointwise confidence bands (see example below). The function
#' also computes a variance-covariance matrix for the real parameters.  For
#' example, assume you had a model with two age classes of young and adult and
#' survival for young was a function of weight and you wanted to estimate
#' survivorship to some adult age as a function of weight.  To do that you need
#' the survival for young as a function of weight, the adult survival, the
#' variance of each and their covariance.  This function will allow you to
#' accomplish tasks like these and many others.
#' 
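#' As a rough sketch of that kind of calculation (the model object
#' \code{young.adult.model}, the covariate \code{weight} and the parameter
#' indices used here are hypothetical), the estimates and var-cov matrix
#' returned by this function can be combined with the delta method:
#' \preformatted{
#' # real estimates for young survival (index 1) and adult survival (index 2)
#' # at a weight of 3
#' pred = covariate.predictions(young.adult.model,
#'           data = data.frame(weight = 3), indices = c(1, 2))
#' s = pred$estimates$estimate
#' # survivorship through one year as young and two years as adult
#' surv = s[1] * s[2]^2
#' # delta-method standard error using the real parameter v-c matrix
#' grad = c(s[2]^2, 2 * s[1] * s[2])
#' se.surv = sqrt(sum(outer(grad, grad) * pred$vcv))
#' }
#' 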
#' When a variance-covariance matrix is computed for the real parameters, it
#' can get too large for available memory for a large set of real parameters.
#' Most models contain many possible real parameters to accommodate the general
#' structure even if there are very few unique ones. It is necessary to use the
#' most general structure to accommodate model averaging.  Most of the time you
#' will only want to compute the values of a limited set of real parameters but
#' possibly for a range of covariate values.  Use the argument \code{indices}
#' to select the real parameters to be computed.  The index is the value that
#' the real parameter has been assigned with the all-different PIM structure.
#' If you looked at the row numbers in the design data for the
#' \code{\link{dipper}} example, you would see that the parameters for p and Phi
#' are each numbered 1 to 21.  But to deal with multiple parameter types effectively
#' they are given a unique number in a specific order.  For the CJS model, p
#' follows Phi, so for the dipper example, Phi are numbered 1 to 21 and then p
#' are numbered 22 to 42. You can use the function \code{\link{PIMS}} to lookup
#' the parameter numbers for a parameter type if you use
#' \code{simplified=FALSE}. For example, with the \code{\link{dipper}} data,
#' you could enter \code{PIMS(dipper.model,"p",simplified=FALSE)} and you would
#' see that they are numbered 22 to 42. Alternatively, you can use
#' \code{\link{summary.mark}} with the argument \code{se=TRUE} to see the list
#' of indices named \code{all.diff.index}.  They are included in a dataframe
#' for each model parameter which enables them to be selected based on the
#' attached data values (e.g., time, group etc).  For example, if you fitted a
#' model called \code{dipper.model} then you could use
#' \code{summary(dipper.model,se=TRUE)$real} to list the indices for each
#' parameter.
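#' 
#' For example, with a fitted dipper model called \code{dipper.model}
#' (hypothetical here), either approach can be used to find the indices for p:
#' \preformatted{
#' # all-different parameter numbers for p (22 to 42 for the dipper data)
#' PIMS(dipper.model, "p", simplified = FALSE)
#' # dataframes of indices (all.diff.index) with attached design data values
#' summary(dipper.model, se = TRUE)$real
#' }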
#' 
#' The argument \code{data} is a dataframe containing values for the covariates
#' used in the models.  The names for the fields should match the names of the
#' covariates used in the model.  If a time-varying covariate is used then you
#' need to specify the covariate name with the time index included as it is
#' specified in the data. You do not need to specify all covariates that were
#' used.  If a covariate in one or more models is not included in \code{data}
#' then the mean value will be used for each missing covariate.  That can be
#' very useful when you are only interested in predictions for one type of
#' parameter (e.g., Phi) and the covariates used only in another parameter
#' (e.g., p) are not of interest.  For each row in \code{data}, each parameter
#' specified in \code{indices} is computed with those covariate values.  So if
#' there were 5 rows in data and 10 parameters were specified there would be 10
#' sets of 5 (50) estimates produced.  If you do not want the full pairing of
#' data and estimates, create a field called \code{index} in \code{data} and
#' the estimate for that parameter will be computed with those specific values.
#' For example, if you wanted parameter 1 to be computed with 5 different
#' values of a covariate and then parameter 7 with 2 different covariate
#' values, you could create a dataframe with 5 rows each having an \code{index}
#' with the value 1 along with the relevant covariate data and an additional 2
#' rows with the \code{index} with the value 7.  If you include the field
#' \code{index} in \code{data} then you do not need to give a value for the
#' argument \code{indices}. However, if you are making the computations for parameters 
#' that use an mlogit link you must use the separate indices argument. If you try to
#' use the data.frame(index=...,cov) approach with mlogit parameters and you have
#' covariate values, the function will stop with an error.  Likewise, if you include only
#' a portion of the indices in an mlogit set, it will stop with an error listing
#' the full set of indices that should be included for that mlogit set.  If you
#' were allowed to exclude some indices the result would be incorrect.
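#' 
#' A sketch of the two ways of supplying covariate values (the marklist
#' \code{example.results} and the covariate \code{weight} are hypothetical):
#' \preformatted{
#' # full pairing: parameters 1 and 7 are each computed for all 5 weights
#' covariate.predictions(example.results,
#'     data = data.frame(weight = 1:5), indices = c(1, 7))
#' # specific pairing: parameter 1 at 5 weights and parameter 7 at 2 weights
#' covariate.predictions(example.results,
#'     data = data.frame(index = c(rep(1, 5), rep(7, 2)),
#'                       weight = c(1:5, 2, 4)))
#' }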
#' 
#' @param model MARK model object or marklist
#' @param data dataframe with covariate values used for estimates; if it
#' contains a field called index the covariates in each row are only applied to
#' the parameter with that index and the argument indices is not needed; if data is not specified, or
#' some individual covariates are not included, the mean value of each missing individual covariate is
#' used for prediction.
#' @param indices a vector of indices from the all-different PIM structure for
#' parameters to be computed (model.index value in the design data)
#' @param drop if TRUE, models with any non-positive variance for betas are
#' dropped
#' @param revised if TRUE it uses eq 6.12 from Burnham and Anderson (2002) for
#' model averaged se; otherwise it uses eq 4.9
#' @param mata if TRUE, create model averaged tail area confidence intervals as described by Turek and Fletcher
#' @param alpha The desired lower and upper error rate.  Specifying alpha=0.025
#' corresponds to a 95% MATA-Wald confidence interval, and
#' alpha=0.05 to a 90% interval.  'alpha' must be between 0 and 0.5.
#' Default value is alpha=0.025. This argument also sets the standard confidence interval, using
#' qnorm(1-alpha) for the critical value.
#' @param normal.lm Specify normal.lm=TRUE for the normal linear model case, and 
#' normal.lm=FALSE otherwise.  When normal.lm=TRUE, the argument 
#' 'residual.dfs' must also be supplied.  See USAGE section, 
#' and Turek and Fletcher (2012) for additional details.
#' @param residual.dfs A vector containing the residual (error) degrees of freedom 
#' under each candidate model.  This argument must be provided 
#' when the argument normal.lm=TRUE.
#' @param ... additional arguments passed to specific functions
#' @return A list is returned containing a dataframe of estimates, a
#' var-cov matrix, and a reals list: 
#' \item{estimates}{ data frame containing estimates, se,
#' confidence interval and the data values used to compute the estimates}
#' \item{vcv}{variance-covariance matrix of real estimates}
#' \item{reals}{list of dataframes with the estimate and se used for each model}
#' @author Jeff Laake
#' @export
#' @seealso \code{\link{compute.real}},\code{\link{model.average}}
#' @keywords utility
#' @examples
#' pathtodata=paste(path.package("RMark"),"extdata",sep="/")
#' \donttest{
#' # This example is excluded from testing to reduce package check time
#' #
#' # indcov1.R
#' #
#' # CJS analysis of the individual covariate data from 12.2 of
#' # Cooch and White
#' #
#' # Import data (indcov1.inp) and convert it from the MARK inp file 
#' # format to the RMark format using the function convert.inp  
#' # It is defined with 1 group but 2 individual covariates of mass and 
#' # sqmass
#' #
#' indcov1=convert.inp(paste(pathtodata,"indcov1",sep="/"),
#'                covariates=c("mass","sqmass"))
#' #
#' # Next create the processed dataframe and the design data.
#' #
#'   ind1.process=process.data(indcov1,model="CJS")
#'   ind1.ddl=make.design.data(ind1.process)
#' #
#' # Next create the function that defines and runs the set of models
#' # and returns a marklist with the results and a model.table.  
#' # It does not have any arguments but does use the ind1.process 
#' # and ind1.ddl objects created above in the workspace. The function 
#' # create.model.list is the function that creates a dataframe of the
#' # names of the parameter specifications for each parameter in that 
#' # type of model. If none are given for any parameter, the default
#' # specification will be used for that parameter in mark.  The 
#' # first argument of mark.wrapper is the model list of parameter 
#' # specifications.  Remaining arguments that are passed to
#' # mark must be specified with the argument=value syntax
#' # because the arguments of mark were not repeated in mark.wrapper.
#' #
#' ind1.models=function()
#' {
#'   Phi.dot=list(formula=~1)
#'   Phi.mass=list(formula=~mass)
#'   Phi.mass.plus.mass.squared=list(formula=~mass + sqmass)
#'   p.dot=list(formula=~1)
#'   cml=create.model.list("CJS")
#'   results=mark.wrapper(cml,data=ind1.process,ddl=ind1.ddl,adjust=FALSE,delete=TRUE)
#'   return(results)
#' }
#' #
#' # Next run the function to create the models and store the results in
#' # ind1.results which is a marklist. Note that beta estimates will differ
#' # from Cooch and White results because we are using covariate values 
#' # directly rather than standardized values.
#' #
#' ind1.results=ind1.models()
#' #
#' # Next compute real parameter values for survival as a function of 
#' # mass which are model-averaged over the fitted models.
#' #
#' minmass=min(indcov1$mass)
#' maxmass=max(indcov1$mass)
#' mass.values=minmass+(0:30)*(maxmass-minmass)/30
#' Phibymass=covariate.predictions(ind1.results,
#'    data=data.frame(mass=mass.values,sqmass=mass.values^2),
#'     indices=c(1))
#' #
#' # Plot predicted model averaged estimates by weight with pointwise 
#' # confidence intervals
#' #
#' plot(Phibymass$estimates$mass, Phibymass$estimates$estimate,
#'    type="l",lwd=2,xlab="Mass(kg)",ylab="Survival",ylim=c(0,.65))
#' lines(Phibymass$estimates$mass, Phibymass$estimates$lcl,lty=2)
#' lines(Phibymass$estimates$mass, Phibymass$estimates$ucl,lty=2)
#'
#' # indcov2.R
#' #
#' # CJS analysis of the individual covariate data from 12.3 of 
#' # Cooch and White
#' #
#' # Import data (indcov2.inp) and convert it from the MARK inp file 
#' # format to the RMark format using the function convert.inp  It is
#' # defined with 1 group but 2 individual covariates of mass and 
#' # sqmass
#' #
#' indcov2=convert.inp(paste(pathtodata,"indcov2",sep="/"),
#'         covariates=c("mass","sqmass"))
#' #
#' # Standardize covariates
#' #
#' actual.mass=indcov2$mass
#' standardize=function(x,z=NULL)
#' {
#'    if(is.null(z))
#'    {
#'        return((x-mean(x))/sqrt(var(x)))
#'    }else 
#'    {
#'       return((x-mean(z))/sqrt(var(z)))
#'    }
#' }
#' indcov2$mass=standardize(indcov2$mass)
#' indcov2$sqmass=standardize(indcov2$sqmass)
#' #
#' # Next create the processed dataframe and the design data.
#' #
#'   ind2.process=process.data(indcov2,model="CJS")
#'   ind2.ddl=make.design.data(ind2.process)
#' #
#' # Next create the function that defines and runs the set of models and 
#' # returns a marklist with the results and a model.table.  It does not
#' # have any arguments but does use the ind2.process and ind2.ddl 
#' # objects created above in the workspace. The function create.model.list 
#' # is the function that creates a dataframe of the names of the parameter
#' # specifications for each parameter in that type of model. If none are 
#' # given for any parameter, the default specification will be used for
#' # that parameter in mark.  The first argument of mark.wrapper is the
#' # model list of parameter specifications.  Remaining arguments that are
#' # passed to mark must be specified with the argument=value syntax 
#' # because the arguments of mark were not repeated in mark.wrapper.
#' #
#' ind2.models=function()
#' {
#'   Phi.dot=list(formula=~1)
#'   Phi.time=list(formula=~time)
#'   Phi.mass=list(formula=~mass)
#'   Phi.mass.plus.mass.squared=list(formula=~mass + sqmass)
#'   Phi.time.x.mass.plus.mass.squared=
#'         list(formula=~time:mass + time:sqmass)
#'   Phi.time.mass.plus.mass.squared=
#'      list(formula=~time*mass + sqmass+ time:sqmass)
#'   p.dot=list(formula=~1)
#'   cml=create.model.list("CJS")
#'   results=mark.wrapper(cml,data=ind2.process,ddl=ind2.ddl,adjust=FALSE,threads=2,delete=TRUE)
#'   return(results)
#' }
#' #
#' # Next run the function to create the models and store the results in
#' # ind2.results which is a marklist. Note that beta estimates will differ
#' # from the Cooch and White results because mass and sqmass are 
#' # standardized individually here.
#' #
#' ind2.results=ind2.models()
#' #
#' # Next compute real parameter values for survival as a function of 
#' # mass which are model-averaged over the fitted models. The covariates were 
#' # standardized individually, so the prediction values must be standardized the same way.
#' #
#' minmass=min(actual.mass)
#' maxmass=max(actual.mass)
#' mass.values=minmass+(0:30)*(maxmass-minmass)/30
#' sqmass.values=mass.values^2
#' mass.values=standardize(mass.values,actual.mass)
#' sqmass.values=standardize(sqmass.values,actual.mass^2)
#' Phibymass=covariate.predictions(ind2.results,
#'  data=data.frame(mass=mass.values,sqmass=sqmass.values),
#'   indices=c(1:7))
#' #
#' # Plot predicted model averaged estimates by weight with pointwise 
#' # confidence intervals
#' #
#' par(mfrow=c(4,2))
#' for (i in 1:7)
#' {
#'    mass=minmass+(0:30)*(maxmass-minmass)/30
#'    x=Phibymass$estimates
#'    plot(mass,x$estimate[x$par.index==i],type="l",lwd=2,
#'     xlab="Mass(kg)",ylab="Survival",ylim=c(0,1),main=paste("Time",i))
#'    lines(mass, x$lcl[x$par.index==i],lty=2)
#'    lines(mass, x$ucl[x$par.index==i],lty=2)
#' }
#' }
#' 
covariate.predictions <- function(model,data=NULL,indices=NULL,drop=TRUE, revised=TRUE, mata=FALSE, normal.lm=FALSE, residual.dfs=0, alpha=0.025,...)
{
# ------------------------------------------------------------------------------------------------
#
# covariate.predictions
#            -   computes real estimates and var-cov for a set of
#                covariate values
#
# Arguments:
#
#   model          - MARK model object or marklist
#   data           - dataframe with covariate values used for estimates
#                     if it contains a field index the covariates in each row
#                       are only applied to that index and the argument indices is not needed
#   indices        - all-different PIM indices for parameters to be calculated
#   drop           - if TRUE, models with any non-positive variance for betas are dropped
#
#
# Value (list):
#
#   estimates        - data frame containing estimates and se
#   vcv              - variance-covariance matrix of real estimates
#
# ------------------------------------------------------------------------------------------------
   if(!inherits(model,"marklist"))
   {
     number.of.models=1
     model=load.model(model)
     model.list=list(model)
   }
   else
   {
     number.of.models=length(model)-1
     if(is.null(model$model.table))
        stop("\nmarklist created by collect.models must contain a model.table to use model.average\n")
     model.list=model
     model.table=model$model.table
   }
#   if(is.null(data))
#      stop("\n data argument must be specified\n")
   if(!is.null(data)&!is.data.frame(data))
      stop("\n data argument must be a dataframe. Do not use processed data list.\n")
#
#   If there is an index field in data, then only use that row of data for that index
#   That filtering is done below after the complete design matrix is created for each model.
#
   if(!is.null(data$index))
   {
      index=data$index
      indices=index
      replicate.values=FALSE
   }
   else {
	   if(!is.null(data))
		   replicate.values=TRUE
	   else{
		   replicate.values=FALSE
		   if(is.null(indices))
			   stop("\nValue for indices argument must be given or index field must be included in data argument\n")
		   else
		   {
			   data=data.frame(index=indices)
			   index=indices
		   }
	   }
   }
#
# Determine if any of the models should be dropped because beta.var non-positive or failed to run
#
dropped.models=NULL
if(number.of.models>1)
{
# check if any models failed to run
for (i in 1:(length(model.list)-1))
{
	model=load.model(model.list[[i]])
	if(length(coef(model)$se)==0)  dropped.models=c(dropped.models,i)
}
if(drop)
{
   for (i in 1:dim(model.table)[1])
   {
      model=load.model(model.list[[i]])
	 
      model.indices=unique(model$simplify$pim.translation[indices])
      used.beta=which(apply(model$design.matrix[model.indices,,drop=FALSE],2,function(x)!all(x=="0")))
      if(any(is.nan(model$results$beta.vcv[used.beta,used.beta])) || any(is.infinite(model$results$beta.vcv[used.beta,used.beta])) ||
         any(diag(model$results$beta.vcv[used.beta,used.beta,drop=FALSE])<0))
      {
         dropped.models=c(dropped.models,i)
         message("\nModel ",i,"dropped from the model averaging because one or more beta variances are not positive\n")
      }
   }
#
# If any models have been dropped, recompute the weights for the model table
#
   if(!is.null(dropped.models))
   {
      model.table$weight[as.numeric(row.names(model.table))%in%dropped.models]=0
      model.table$weight=model.table$weight/sum(model.table$weight)
   }
}
}
else
{
	model.indices=unique(model$simplify$pim.translation[indices])
	used.beta=which(apply(model$design.matrix[model.indices,,drop=FALSE],2,function(x)!all(x=="0")))
	if(any(is.nan(model$results$beta.vcv[used.beta,used.beta])) || any(is.infinite(model$results$beta.vcv[used.beta,used.beta])) ||
	   any(diag(model$results$beta.vcv[used.beta,used.beta,drop=FALSE])<0))
	  message("\nModel has one or more beta variances that are not positive\n")
}
reals=vector("list",length=number.of.models)
firstmodel=TRUE
for (j in 1:number.of.models)
{
   if(j %in% dropped.models) next
#
#   Assign value of chat - overdispersion constant
#
   model=load.model(model.list[[j]])
   if(is.null(model$chat))
     chat=1
   else
     chat=model$chat
   beta=model$results$beta$estimate
   model.indices=unique(model$simplify$pim.translation[indices])
   links=NULL
   design=NULL
   fixedparms=NULL
   boundaryparms=NULL
#   
#  If there are no individual covariates in the model, the code below will extract only those
#  rows in the DM.  
#
   nmlogit=0
   if(is.null(model$covariates)&ncol(data)==1 && names(data)=="index")
#   if(ncol(data)==1 && names(data)=="index")
   {
     for (i in 1:nrow(data))
     {
        model.index=model$simplify$pim.translation[index[i]]   
        design=rbind(design,model$design.matrix[model.index,,drop=FALSE])
        if(length(model$links)==1)
           links=c(links,model$links)
        else
		   links=c(links,model$links[model.index])	
        if(!is.null(model$fixed))
        {
           if(is.null(model$simplify))
              xfixedparms=(1:dim(model$design.matrix)[1])%in%model$fixed$index
           else                                                           
              xfixedparms=(1:dim(model$design.matrix)[1])%in%model$simplify$pim.translation[model$fixed$index]
        }
        else
           xfixedparms=rep(FALSE,dim(model$design.matrix)[1])
        fixedparms=c(fixedparms,xfixedparms[model.index])
        boundaryparms=c(boundaryparms,(model$results$real$se==0 & !xfixedparms)[model.index])
     }
	 mlinks=substr(links,1,6)=="mlogit"
	 if(any(mlinks))
	 {
		 mlogit.list=split(1:length(links[mlinks]),links[mlinks])
		 for(i in 1:length(mlogit.list))
		 {
			 if(any(!which(model$links==names(mlogit.list)[i])%in%model.indices)) stop(paste("\n some indices are missing for parameters with",
								 names(mlogit.list)[i],"\n should include",paste(which(model$simplify$pim.translation%in%which(model$links==names(mlogit.list)[i])),collapse=",")))
		 }
	 }
   }
   else
# If there are data values for covariates then it fills in a complete DM for each record in the data
# and appends to create a large DM with all individual DMs stacked on each other.  Then in the 
# next loop if an index is specified in data, it selects out only the rows of the large DM for those 
# specific parameter indices.
#
   {
     for (i in 1:nrow(data))
     {
        xdesign=fill.covariates(model,find.covariates(model,data[i,,drop=FALSE]))[model.indices,,drop=FALSE]
        if(length(model$links)==1)
           links=c(links,rep(model$links,dim(xdesign)[1]))
        else
		{
			clinks=model$links[model.indices]
			mlinks=substr(clinks,1,6)=="mlogit"
			if(any(mlinks))
			{
				if(!replicate.values) 
					stop("Computations for mlogit parameters with covariate values cannot be specified with \nindex column in data; use separate indices argument" )
				mlogit.list=split(1:length(clinks[mlinks]),clinks[mlinks])
				for(i in 1:length(mlogit.list))
				{
					if(any(!which(model$links==names(mlogit.list)[i])%in%model.indices)) stop(paste("\n some indices are missing for parameters with",
										names(mlogit.list)[i],"\n should include",paste(which(model$simplify$pim.translation%in%which(model$links==names(mlogit.list)[i])),collapse=",")))
					nmlogit=nmlogit+1
					clinks[mlinks][mlogit.list[[i]]]=paste("mlogit(",nmlogit,")",sep="")
				}
			}	
			links=c(links,clinks)		
		}
		design=rbind(design,xdesign)
        if(!is.null(model$fixed))
        {
           if(is.null(model$simplify))
              xfixedparms=(1:dim(model$design.matrix)[1])%in%model$fixed$index
           else                                                           
              xfixedparms=(1:dim(model$design.matrix)[1])%in%model$simplify$pim.translation[model$fixed$index]
        }
        else
           xfixedparms=rep(FALSE,dim(model$design.matrix)[1])
        fixedparms=c(fixedparms,xfixedparms[model.indices])
        boundaryparms=c(boundaryparms,(model$results$real$se==0 & !xfixedparms)[model.indices])
     }
#
#    Filter rows if each covariate set is not replicated for each parameter
#
     if(!replicate.values)
     {
        row.numbers=NULL
        for(i in 1:length(index))
        {
          model.index=model$simplify$pim.translation[index[i]]
          row.numbers=c(row.numbers,match(model.index,model.indices)+(i-1)*length(model.indices))
        }
        design=design[row.numbers,,drop=FALSE]
        fixedparms=fixedparms[row.numbers]
        boundaryparms=boundaryparms[row.numbers]
        links=links[row.numbers]
     }
   }
#                                                                         
#   The following shouldn't happen unless the model and design matrices are mixed between models
#
   if(dim(design)[2]!=length(beta))
       stop("Mismatch between number of design columns and length of beta")
#
#  Convert to a numeric matrix
#
   design=matrix(as.numeric(design),nrow=dim(design)[1])
# 
#  Create vector of fixed parameters to cope with partial fixing of mlogits
#
   fixedvalues=rep(NA,nrow(design))
   fixedvalues[fixedparms]=model$fixed$value[match(indices[fixedparms],model$fixed$index)]
#
#  Compute real parameter estimates from the link values, applying any fixed values
#
   real=convert.link.to.real(design%*%beta,links=links,fixed=fixedvalues)
#
#  Set fixed real parameters to their fixed values
#
   real[fixedparms]=fixedvalues[fixedparms]
#   Previously read as follows -- fixed in v1.9.5
#   real[fixedparms]=model$fixed$value[model$fixed$index%in%indices[fixedparms]]
#
#  Compute vc matrix for real parameters which needs to be modified for
#  any mlogit parameters
#
   if(length(model$links)==1)
      deriv.real=deriv_inverse.link(real,design,model$links)
   else
      deriv.real=t(apply(data.frame(real=real,x=design,links=links),1,
            function(x){deriv_inverse.link(as.numeric(x[1]),as.numeric(x[2:(length(x)-1)]),x[length(x)])}))
   vcv.real=deriv.real%*%model$results$beta.vcv%*%t(deriv.real)
#
#   Compute v-c matrix and std errors of real estimates.
#   To handle any mlogit parameters compute pseudo-real estimates using log in place of mlogit
#
   ind=grep("mlogit",links,ignore.case=TRUE)
   templinks=links
   if(length(ind)>0)
   {
      templinks[ind]="log"
      pseudo.real=as.vector(convert.link.to.real(design%*%beta,links=templinks))
      pseudo.real[fixedparms]=fixedvalues[fixedparms]
      pseudo.real[ind][fixedparms[ind]]=exp(pseudo.real[ind][fixedparms[ind]])
#
#     Compute first derivatives of pseudo-real (for any mlogit parameter)
#     estimates with respect to beta parameters
#
      if(length(templinks)==1)
        deriv.pseudo=deriv_inverse.link(pseudo.real,design,templinks)
      else
         deriv.pseudo=t(apply(data.frame(real=pseudo.real,x=design,links=templinks),1,
               function(x){deriv_inverse.link(as.numeric(x[1]),as.numeric(x[2:(length(x)-1)]),x[length(x)])}))
      deriv.pseudo[fixedparms,]=0
      vcv.pseudo=chat*deriv.pseudo%*%model$results$beta.vcv%*%t(deriv.pseudo)
#
#     Apply chain rule to get variance of real parameters which has mlogits
#     expressed as zi/(1+z1+...zk) where k is number of mlogit components-1 and
#     non-mlogits are expressed as zi.
#     bottom is either 1 for non-mlogits and the sum for mlogits
#     pbottom is partial with respect to zi
#
      mlogits=outer(links,links,function(x,y)as.numeric(x==y))*as.numeric(substr(links,1,6)=="mlogit" |substr(links,1,6)=="MLogit" )
      pbottom=matrix(0,nrow=dim(vcv.pseudo)[1],ncol=dim(vcv.pseudo)[1]) + mlogits
      bottom=diag(nrow=dim(vcv.pseudo)[1])*(1-as.numeric(substr(links,1,6)=="mlogit" |substr(links,1,6)=="MLogit"))+
          mlogits + pbottom*apply(pbottom*pseudo.real,2,sum)
      deriv.pseudo=(diag(nrow=dim(vcv.pseudo)[1])*bottom-pseudo.real*pbottom)/bottom^2
      deriv.pseudo[is.nan(deriv.pseudo)]=0
      vcv.real=deriv.pseudo%*%vcv.pseudo%*%t(deriv.pseudo)
   }
   else
      vcv.real=chat*vcv.real
#
#  Compute conf interval taking into account use of logit transform for mlogit links
#
   link.se=suppressWarnings(sqrt(chat*diag(design%*%model$results$beta.vcv%*%t(design))))
   link.se[is.na(link.se)]=0
   ind=unique(c(grep("mlogit",links,ignore.case=TRUE),which(links%in%c("sin","Sin","LogLog","loglog","CLogLog","cloglog"))))
   linkse=suppressWarnings(sqrt(diag(vcv.real)[ind])/(real[ind]*(1-real[ind])))
   linkse[is.na(linkse)]=0
   linkse[is.infinite(linkse)]=0
   link.se[ind]=linkse
   link.values=design%*%beta
   link.values[ind]=suppressWarnings(log(real[ind]/(1-real[ind])))
   link.values[ind][abs(real[ind]-1)<1e-7]=100
   link.values[ind][abs(real[ind]-0)<1e-7]=-100
   links[ind]="logit"
   real.lcl=convert.link.to.real(link.values-qnorm(1-alpha)*link.se,links=links)
   real.ucl=convert.link.to.real(link.values+qnorm(1-alpha)*link.se,links=links)
   real.lcl[fixedparms]=real[fixedparms]
   real.ucl[fixedparms]=real[fixedparms]   
#
#  Set v-c values of fixed parameters to 0
#
   vcv.real[fixedparms,]=0
   vcv.real[,fixedparms]=0
   se.real=suppressWarnings(sqrt(diag(vcv.real)))
   se.real[is.na(se.real)]=0
   fixed=rep("",dim(design)[1])
   fixed[fixedparms]="Fixed"
   fixed[boundaryparms]="Boundary"
#
#  Now expand unique parameters to a dataframe with indices
#
   if(replicate.values)
   {
      estimates=NULL
      for (i in 1:dim(data)[1])
      {
         model.indices=model$simplify$pim.translation[indices]
         lookup.indices=match(model.indices,c(rep(0,(i-1)*length(unique(model.indices))),unique(model.indices)))
         covdata=data[rep(i,length(lookup.indices)),]
         xdata=data.frame(vcv.index=lookup.indices,model.index=model.indices,par.index=indices,covdata,estimate=real[lookup.indices],se=se.real[lookup.indices],
               lcl=real.lcl[lookup.indices],ucl=real.ucl[lookup.indices],fixed=fixed[lookup.indices])
         estimates=rbind(estimates,xdata)
      }
      row.names(estimates)=1:dim(estimates)[1]
#
#     Expand v-c matrix
#
      nreals=dim(estimates)[1]
      zz=matrix(1,nrow=nreals,ncol=nreals)*estimates$vcv.index
      vcv.real=matrix(vcv.real[cbind(as.vector(zz),as.vector(t(zz)))],nrow=nreals,ncol=nreals)
   }
   else
   {
      model.indices=model$simplify$pim.translation[indices]
      estimates=data.frame(vcv.index=model.indices,model.index=model.indices,par.index=indices,data,estimate=real,se=se.real,
            lcl=real.lcl,ucl=real.ucl,fixed=fixed)
   }
   reals[[j]]=subset(estimates,select=c("estimate","se"))
#
#  If this is a single model return results
#
   if(number.of.models==1)
   {
      return(list(estimates=estimates,vcv=vcv.real))
   }
   else
#  otherwise construct list for model averaging
   {
	   if(firstmodel)
	   {
		   firstmodel=FALSE
		   nreals=dim(estimates)[1]
		   estimate.mat=matrix(NA,nrow=number.of.models,ncol=nreals)
		   se.mat=matrix(NA,nrow=number.of.models,ncol=nreals)
		   estimates.average=estimates
		   weight=vector("numeric",length=number.of.models)
		   mavg.vcv=vector("list",length=number.of.models)
	   }
	   mavg.vcv[[j]]=vcv.real
	   estimate.mat[j,]=reals[[j]]$estimate
	   if(any(reals[[j]]$se<0)) 
	   {
		   warning("Negative variances for parameters ",paste((1:nreals)[reals[[j]]$se<0],collapse=", ")," for model ",j,". Setting those variances to 0")
		   reals[[j]]$se[reals[[j]]$se<0]=0
	   }
	   se.mat[j,]=reals[[j]]$se
	   weight[j]=model.table$weight[as.numeric(row.names(model.table))==j]	   
   }
#
#  Otherwise if this is the first model in the list setup dataframe for
#  average estimates and average correlation matrix.
#
#   if(firstmodel)
#   {
#      firstmodel=FALSE
#      nreals=dim(estimates)[1]
#      estimates.average=estimates
#      estimates.average$estimate=0
#      cor.average=matrix(0,nrow=nreals,ncol=nreals)
#   }
#
#  For each model the estimates are averaged and so is the correlation matrix
#
#   estimates.average$estimate=estimates.average$estimate+
#         estimates$estimate*model.table$weight[as.numeric(row.names(model.table))==j]
#   cor=vcv.real/outer(estimates$se,estimates$se,"*")
#   if(any(is.infinite(diag(cor)))) 
#	   warning("Infinite correlation (se=0) for model  ",j, " for estimate ",which(is.infinite(diag(cor))),"\n")
#   diag(cor)=1
#   cor.average=cor.average+cor*model.table$weight[as.numeric(row.names(model.table))==j]
}
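#
# Remove any dropped models from the weights, estimate/se matrices and vcv list before model averaging
#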
if(!is.null(dropped.models))
{
	weight=weight[-dropped.models]
	mavg.vcv=mavg.vcv[-dropped.models]
}
if(nreals==1){
	estimate.mat=as.vector(estimate.mat)
	se.mat=as.vector(se.mat)
	if(!is.null(dropped.models))
	{
		estimate.mat=estimate.mat[-dropped.models]
		se.mat=se.mat[-dropped.models]
	}
} else
	if(!is.null(dropped.models))
{
		estimate.mat=estimate.mat[-dropped.models,]
		se.mat=se.mat[-dropped.models,]
}
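#
# Model average the estimates across models using the model weights and the per-model v-c matrices
#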
	
mavg.res=model.average.list(x=list(estimate=estimate.mat,weight=weight,vcv=mavg.vcv),revised=revised, mata=FALSE, normal.lm=normal.lm, residual.dfs=residual.dfs, alpha=alpha,...)
# if mata, compute tail averaged intervals on the link values and then convert back to reals
if(mata)
{
  mavg.res$lcl=vector("numeric",length=ncol(estimate.mat))
  mavg.res$ucl=vector("numeric",length=ncol(estimate.mat))
  for(j in 1:ncol(estimate.mat))
  {
    link.estimate=NULL
    link.se=NULL
    for (i in 1:nrow(estimate.mat))
    {
       link.list=compute.links.from.reals(estimate.mat[i,j],model.list[[1]],parm.indices=estimates.average$par.index[j],vcv.real= mavg.vcv[[i]][j,j],use.mlogits=FALSE)
       link.estimate=c(link.estimate,link.list$estimates)
       link.se=c(link.se,sqrt(diag(link.list$vcv)))
    }
    interval=mata.wald(theta.hats=link.estimate, se.theta.hats=link.se, model.weights=weight, normal.lm=normal.lm, residual.dfs=residual.dfs, alpha=alpha) 	 
    mavg.res$lcl[j]=inverse.link(interval[1],link.list$links)
    mavg.res$ucl[j]=inverse.link(interval[2],link.list$links)
  }
}
#
#  After processing each model, the model averaged se and v-c matrix is
#  computed.
#
#revised=TRUE
#se.average=rep(0,nreals)
#for (i in 1:dim(model.table)[1])
#{
#   if(i %in% dropped.models) next   
#   if(revised)
#	   se.average=se.average+model.table$weight[as.numeric(row.names(model.table))==i]*
#			   (reals[[i]]$se^2 + (reals[[i]]$estimate-estimates.average$estimate)^2)  
#   else
#	   se.average=se.average+model.table$weight[as.numeric(row.names(model.table))==i]*
#			   sqrt(reals[[i]]$se^2 + (reals[[i]]$estimate-estimates.average$estimate)^2)
#}
#if(revised) se.average=sqrt(se.average)

names(reals)=1:number.of.models
estimates.average$estimate=mavg.res$estimate
estimates.average$se=mavg.res$se
vcv.real=mavg.res$vcv
#vcv.real=cor.average*outer(se.average,se.average,"*")
#vcv.real[is.nan(vcv.real)]=0
#vcv.real[is.infinite(abs(vcv.real))]=0
if(!mata)
{
	link.list=compute.links.from.reals(estimates.average$estimate,model.list[[1]],parm.indices=estimates.average$par.index,vcv.real=vcv.real,use.mlogits=FALSE)
	estimates.average$lcl=link.list$estimates-qnorm(1-alpha)*sqrt(diag(link.list$vcv))
	estimates.average$ucl=link.list$estimates+qnorm(1-alpha)*sqrt(diag(link.list$vcv))
	estimates.average$lcl=apply(data.frame(x=estimates.average$lcl,links=link.list$links),1,function(x){inverse.link(as.numeric(x[1]),x[2])})
	estimates.average$ucl=apply(data.frame(x=estimates.average$ucl,links=link.list$links),1,function(x){inverse.link(as.numeric(x[1]),x[2])})
	estimates.average$lcl[is.na(estimates.average$lcl)]=estimates.average$estimate[is.na(estimates.average$lcl)]
	estimates.average$ucl[is.na(estimates.average$ucl)]=estimates.average$estimate[is.na(estimates.average$ucl)]
}
else
{
	estimates.average$lcl=mavg.res$lcl
	estimates.average$ucl=mavg.res$ucl
}
return(list(estimates=estimates.average,vcv=vcv.real,reals=reals))
}
