bandwidth: Automatic selection of the bandwidth parameter 'h'
In SNSequate: Standard and Nonstandard Statistical Models and Methods for Test Equating

Description

This function implements the minimization of the combined penalty function described by Holland and Thayer (1989) and Von Davier et al. (2004). It returns the optimal value of `h` for kernel continuization according to that criterion. Kernels other than the Gaussian are also accepted.

Usage

bandwidth(scores, kert, degree, design, Kp = 1, scores2, degreeXA, degreeYA, 
J, K, L, wx, wy, w, r=NULL)

Arguments

Note that depending on the specified equating design, not all arguments are necessary as detailed below.

`scores`	If the "EG" design is specified, a vector containing the raw sample frequencies coming from one group taking the test. If the "SG" design is specified, a matrix containing the (joint) bivariate sample frequencies for `X` (rows) and `Y` (columns). If the "CB" design is specified, a two-column matrix containing the observed scores of the sample taking test `X` first, followed by test `Y`. The `scores2` argument is then used for the scores of the sample taking test `Y` first, followed by test `X`. If either the "NEAT_CE" or "NEAT_PSE" design is selected, a two-column matrix containing the observed scores on test `X` (first column) and the observed scores on the anchor test `A` (second column). The `scores2` argument is then used for the observed scores on test `Y`.
`kert`	A character string giving the type of kernel to be used for continuization. Current options include "`gauss`", "`logis`", and "`unif`" for the Gaussian, logistic, and uniform kernels, respectively.
`degree`	Either a number or a vector indicating the number of power moments to be fitted to the marginal distributions, or the number of cross moments to be fitted to the joint distributions, respectively. For the "EG" design it will be a number (see Details).
`design`	A character string indicating the equating design (one of "EG", "SG", "CB", "NEAT_CE", or "NEAT_PSE").
`Kp`	A number which acts as a weight for the second term in the combined penalty function used to obtain `h` (see Details).
`scores2`	Only used for the "CB", "NEAT_CE" and "NEAT_PSE" designs. See the description of `scores`.
`degreeXA`	A vector indicating the number of power moments to be fitted to the marginal distributions `X` and `A`, and the number of cross moments to be fitted to the joint distribution `(X,A)` (see Details). Only used for the "NEAT_CE" and "NEAT_PSE" designs.
`degreeYA`	Only used for the "NEAT_CE" and "NEAT_PSE" designs (see the description of `degreeXA`).
`J`	The number of possible `X` scores. Only needed for the "CB", "NEAT_CE" and "NEAT_PSE" designs.
`K`	The number of possible `Y` scores. Only needed for the "CB", "NEAT_CE" and "NEAT_PSE" designs.
`L`	The number of possible `A` scores. Only needed for the "NEAT_CE" and "NEAT_PSE" designs.
`wx`	A number that satisfies `0\leq w_X\leq 1` indicating the weight put on the data that is not subject to order effects. Only used for the "CB" design.
`wy`	A number that satisfies `0\leq w_Y\leq 1` indicating the weight put on the data that is not subject to order effects. Only used for the "CB" design.
`w`	A number that satisfies `0\leq w\leq 1` indicating the weight given to population `P`. Only used for the "NEAT" design.
`r`	Score probabilities.

Details

To automatically select h, the function minimizes

PEN_1(h)+K\times PEN_2(h)

where `PEN_1(h)=\sum_j(\hat{r}_j-\hat{f}_h(x_j))^2` and `PEN_2(h)=\sum_jA_j(1-B_j)`. The terms `A_j` and `B_j` are defined so that `PEN_2` acts as a smoothness penalty that discourages rapid fluctuations in the approximated density (see Chapter 10 in Von Davier, 2011, for more details). The `K` term corresponds to the `Kp` argument of the `bandwidth` function; it is unrelated to the `K` argument, which gives the number of possible `Y` scores. The `\hat{r}` values are assumed to be estimated by polynomial loglinear models of the specified degree, which come from a call to `loglin.smooth`.
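The criterion above can be sketched in base R for the Gaussian kernel. This is an illustration under stated assumptions, not the code used inside `bandwidth`: the helper names `kernel_density` and `combined_penalty` are hypothetical, the score probabilities are a toy binomial distribution rather than the output of `loglin.smooth`, and `A_j` and `B_j` are taken to follow the usual definition in Von Davier et al. (2004), where `A_j = 1` if the derivative of the density is negative slightly to the left of `x_j` and `B_j = 0` if it is positive slightly to the right. The offset `v = 0.25` used for "slightly" is also an assumption.

```r
# Gaussian-kernel continuized density f_h and its derivative, evaluated at
# the points x, for score values xj with probabilities r and bandwidth h.
# (Hypothetical helper; not part of SNSequate.)
kernel_density <- function(x, xj, r, h) {
  mu <- sum(r * xj)                     # mean of the discrete scores
  sigma2 <- sum(r * (xj - mu)^2)        # variance of the discrete scores
  a <- sqrt(sigma2 / (sigma2 + h^2))    # shrinkage that preserves the variance
  # R[i, j]: standardized distance between x[i] and score xj[j]
  R <- outer(x, xj, function(x, xj) (x - a * xj - (1 - a) * mu) / (a * h))
  list(
    f  = as.vector(dnorm(R) %*% r) / (a * h),
    df = -as.vector((R * dnorm(R)) %*% r) / (a * h)^2
  )
}

# Combined penalty PEN_1(h) + Kp * PEN_2(h).
# (Hypothetical helper; v = 0.25 is an assumed offset.)
combined_penalty <- function(h, xj, r, Kp = 1, v = 0.25) {
  # PEN_1: squared distance between the density at the scores and r
  pen1 <- sum((r - kernel_density(xj, xj, r, h)$f)^2)
  # PEN_2: number of scores where the density falls just to the left
  # (A_j = 1) and rises just to the right (B_j = 0)
  falls_left  <- kernel_density(xj - v, xj, r, h)$df < 0
  rises_right <- kernel_density(xj + v, xj, r, h)$df > 0
  pen2 <- sum(falls_left & rises_right)
  pen1 + Kp * pen2
}

# Toy score probabilities on a 0-20 scale (assumed, for illustration only)
xj <- 0:20
r <- dbinom(xj, size = 20, prob = 0.5)

# One-dimensional search for h over an assumed interval
fit <- optimize(combined_penalty, interval = c(0.1, 3), xj = xj, r = r, Kp = 1)
fit$minimum    # bandwidth minimizing the combined penalty on the toy data
```

Raising `Kp` in this sketch gives more weight to the count of local dips at the score points, so bandwidths that produce a wiggly density are penalized more heavily relative to the fit term `PEN_1`.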

Value

The optimal value of `h`, a single number. In the Examples it is extracted from the returned object with `$h`.

Author(s)

Jorge Gonzalez jorge.gonzalez@mat.uc.cl

References

Gonzalez, J. (2014). SNSequate: Standard and Nonstandard Statistical Models and Methods for Test Equating. Journal of Statistical Software, 59(7), 1-30.

Von Davier, A., Holland, P., and Thayer, D. (2004). The Kernel Method of Test Equating. New York, NY: Springer-Verlag.

Von Davier, A. (Ed.) (2011). Statistical Models for Equating, Scaling, and Linking. New York, NY: Springer.

Examples

# Example: the "Standard" column and first two rows of Table 10.1 in
# Chapter 10 of Von Davier (2011)

library(SNSequate)
data(Math20EG)

# Optimal bandwidths for test X (loglinear model of degree 2), one per kernel
hx.logis <- bandwidth(scores = Math20EG[, 1], kert = "logis", degree = 2, design = "EG")$h
hx.unif  <- bandwidth(scores = Math20EG[, 1], kert = "unif",  degree = 2, design = "EG")$h
hx.gauss <- bandwidth(scores = Math20EG[, 1], kert = "gauss", degree = 2, design = "EG")$h

# Optimal bandwidths for test Y (loglinear model of degree 3), one per kernel
hy.logis <- bandwidth(scores = Math20EG[, 2], kert = "logis", degree = 3, design = "EG")$h
hy.unif  <- bandwidth(scores = Math20EG[, 2], kert = "unif",  degree = 3, design = "EG")$h
hy.gauss <- bandwidth(scores = Math20EG[, 2], kert = "gauss", degree = 3, design = "EG")$h

# Collect the results in a 2 x 3 table: rows are tests, columns are kernels
partialTable10.1 <- rbind(c(hx.logis, hx.unif, hx.gauss),
                          c(hy.logis, hy.unif, hy.gauss))

dimnames(partialTable10.1) <- list(c("h.x", "h.y"), c("Logistic", "Uniform", "Gaussian"))
partialTable10.1

SNSequate documentation built on May 29, 2024, 4:55 a.m.