R/HybridEquating.R

Defines functions HybridEquate

Documented in HybridEquate

#' Hybrid Levine-PSE equating.
#'
#' This function applies equating using the hybrid method described by von Davier and Chen (2013).
#' Like Levine equating, it attempts to account for measurement error in the anchor test whilst equating.
#' However, unlike Leveine equating, it is also designed to retain the nonlinear nature of equipercentile equating.
#' The last of the examples given below provides a situation where we might expect this method to be superior to
#' both chained and post-stratification equating. 
#'
#' @param dx Data frame with variables "x" and "a" representing scores for individual candidates on form X and on the anchor test.
#' @param dy Data frame with variables "y" and "a" representing scores for individual candidates on form Y and on the anchor test.
#' @param target A character denoting the synthetic population. Can be "x" to denote weighting to the form X population, "y" to denote weighting to the form Y population or left blank (default) to indicate the data should be weighted to the average across all data.
#' @param order Numeric (integer) input denoting the order up to which the anchor distributions should be matched across populations during the PSE step. For example, 2 would indicate that the means and the means of the squared values (related to the standard deviation) should match. An order of 4 essentially indicates that the means, SDs, skewness and kurtosis of anchor test scores should be matched. The default is 5. Setting this to a lower value (2, 3 or 4) may be useful in the event of error messages (often related to non-convergence). Higher values may be valuable if the distributions of anchor test scores follow an extremely different shape on the two populations.
#' @param internal Logical value denoting whether an internal or an external anchor test is being used. The default (FALSE) assumes that it is an external anchor test.
#' @param hX Bandwidth for form X used during the PSE step. By default a plug-in estimator from Andersson and von Davier (2014) is used.
#' @param hY Bandwidth for form Y used during the PSE step. By default a plug-in estimator from Andersson and von Davier (2014) is used.
#'
#' @return The function returns a list with the following elements:
#' \describe{
#'   \item{hybridFunc}{A function that translates any vector of scores on form X into equivalent scores on form Y.}
#'   \item{EqTable}{A data frame combining the sorted unique scores on form X in the data and their equated values on form Y.}
#' }
#' @references
#' Andersson, B., & von Davier, A. A. (2014). Improving the bandwidth selection in kernel equating. 
#' \emph{Journal of Educational Measurement, 51}(3), 223-238.
#'
#' von Davier, A. A., & Chen, H. (2013). The Kernel Levine Equipercentile Observed-Score Equating Function. ETS Research Report Series.
#' 
#' @examples
#' #example using some real data (but no criterion equate to compare to)
#' dx2=data.frame(x=rowSums(mathsdata[1:250,1:35]),a=rowSums(mathsdata[1:250,41:50]))
#' summary(dx2)
#' dy2=data.frame(y=rowSums(mathsdata[251:500,51:90]),a=rowSums(mathsdata[251:500,41:50]))
#' summary(dy2)
#' hybrideq2=HybridEquate(dx2, dy2,target="y")$EqTable
#' chainedeq2=KernelChainedEquate(dx2, dy2)$EqTable
#' plot(chainedeq2$x,chainedeq2$equiyx,type='l')#chained equating function
#' lines(hybrideq2$x,hybrideq2$hybridyx,lty=2)#Hybrid equating estimated equating function
#' 
#' #demonstrate point of hybrid equating on a 30 item test with an internal 5 item anchor
#' #define 30 rasch item difficulties as equally spread
#' itedifs=rep(seq(-2,2,length=5),6)
#' #set up ensures that last 5 items (anchor) are minitest of first 25
#' 
#' #simulate population one item scores (and then form scores)
#' n1=3000
#' t1=rnorm(n1,0.5,1)
#' ites1=0+(plogis(t1%*%t(rep(1,30))-rep(1,length(t1))%*%t(itedifs))>matrix(runif(n1*30),nrow=n1))
#' scoresX1=rowSums(ites1[,1:30])
#' scoresA1=rowSums(ites1[,26:30])
#' #simulate parallel tests in population two
#' n2=3000
#' t2=rnorm(n2,0,1)
#' ites2=0+(plogis(t2%*%t(rep(1,30))-rep(1,length(t2))%*%t(itedifs))>matrix(runif(n2*30),nrow=n2))
#' scoresY2=rowSums(ites2[,1:30])
#' scoresA2=rowSums(ites2[,26:30])
#' 
#' chainedeq3=KernelChainedEquate(data.frame(x=scoresX1,a=scoresA1),data.frame(y=scoresY2,a=scoresA2))$EqTable
#' pseeq3=PSEObservedEquate(data.frame(x=scoresX1,a=scoresA1),data.frame(y=scoresY2,a=scoresA2))$EqTable
#' hybrideq3=HybridEquate(data.frame(x=scoresX1,a=scoresA1),data.frame(y=scoresY2,a=scoresA2),internal=TRUE)$EqTable
#' plot(chainedeq3$x,chainedeq3$equiyx,type='l')#chained equating function
#' lines(pseeq3$x,pseeq3$equiyx,lty=3,col="red")#PSE estimated equating function
#' lines(hybrideq3$x,hybrideq3$hybridyx,lty=2)#Hybrid equating estimated equating function
#' lines(0:30,0:30,col="blue",lty=3,lwd=4)#true equating function (identity as set up to be parallel tests)
#' 
#' @export
HybridEquate=function(dx,dy,target=NA,order=5,internal=FALSE,hX=NA,hY=NA){
ws=c(dim(dx)[1],dim(dy)[1])
ws=ws/sum(ws)
if(!is.na(target)){
	if(target=="y"){ws=c(0,1)}
	if(target=="x"){ws=c(1,0)}
	}

LE=LevineObservedEquate(dx,dy, ws,internal=internal)
PSE=PSEObservedEquate(dx,dy, target = "y",order=order,hX=hX,hY=hY)

EquiFunc=function(scores1){
	xs1star=PSE$muSx+(PSE$sigSx/LE$sigSx)*(scores1-LE$muSx)
	LE$lys(scores1)+(LE$sigSy/PSE$sigSy)*(PSE$KerEquiFunc(xs1star)-PSE$KerLinFunc(xs1star))}

EquiTable=data.frame(x = sort(unique(dx$x)),hybridyx=EquiFunc(sort(unique(dx$x))))
return(list(hybridFunc=EquiFunc,EqTable=EquiTable))
}
CambridgeAssessmentResearch/KernEqWPS documentation built on Feb. 23, 2024, 9:34 p.m.