bandwidth: Automatic selection of the bandwidth parameter 'h'

View source: R/bandwidth.R

bandwidth    R Documentation

Automatic selection of the bandwidth parameter h

Description

This function implements the minimization of the combined penalty function described by Holland and Thayer (1989) and Von Davier et al. (2004). It returns the optimal value of h for kernel continuization according to that criterion. Kernels other than the Gaussian are also accepted.

Usage

bandwidth(scores, kert, degree, design, Kp = 1, scores2, degreeXA, degreeYA, 
J, K, L, wx, wy, w)

Arguments

Note that depending on the specified equating design, not all arguments are necessary as detailed below.

scores

If the "EG" design is specified, a vector containing the raw sample frequencies coming from one group taking the test.

If the "SG" design is specified, a matrix containing the (joint) bivariate sample frequencies for X (rows) and Y (columns).

If the "CB" design is specified, a two column matrix containing the observed scores of the sample taking test X first, followed by test Y. The scores2 argument is then used for the scores of the sample taking test Y first followed by test X.

If either the "NEAT_CE" or "NEAT_PSE" design is selected, a two column matrix containing the observed scores on test X (first column) and the observed scores on the anchor test A (second column). The scores2 argument is then used for the observed scores on test Y.

kert

A character string giving the type of kernel to be used for continuization. Current options include "gauss", "logis", and "uniform" for the Gaussian, logistic, and uniform kernels, respectively. Note that the Examples section passes "unif" for the uniform kernel.

degree

Either a number or a vector indicating the number of power moments to be fitted to the marginal distributions, or the number of cross moments to be fitted to the joint distributions, respectively. For the "EG" design it will be a number (see Details).

design

A character string indicating the equating design (one of "EG", "SG", "CB", "NEAT_CE", "NEAT_PSE").

Kp

A number that acts as a weight for the second term in the combined penalty function used to obtain h (see Details).

scores2

Only used for the "CB", "NEAT_CE" and "NEAT_PSE" designs. See the description of scores.

degreeXA

A vector indicating the number of power moments to be fitted to the marginal distributions X and A, and the number of cross moments to be fitted to the joint distribution (X,A) (see Details). Only used for the "NEAT_CE" and "NEAT_PSE" designs.

degreeYA

Only used for the "NEAT_CE" and "NEAT_PSE" designs (see the description of degreeXA).

J

The number of possible X scores. Only needed for the "CB", "NEAT_CE" and "NEAT_PSE" designs.

K

The number of possible Y scores. Only needed for the "CB", "NEAT_CE" and "NEAT_PSE" designs.

L

The number of possible A scores. Only needed for the "NEAT_CE" and "NEAT_PSE" designs.

wx

A number that satisfies 0<=w_x<=1 indicating the weight put on the data that is not subject to order effects. Only used for the "CB" design.

wy

A number that satisfies 0<=w_y<=1 indicating the weight put on the data that is not subject to order effects. Only used for the "CB" design.

w

A number that satisfies 0<=w<=1 indicating the weight given to population P. Only used for the "NEAT" design.
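The input shapes described for the scores argument can be sketched in base R as follows. This is only an illustration under stated assumptions: the raw scores are simulated, and the object names (x_raw, y_raw, scores_eg, scores_sg, scores_cb) are hypothetical and not part of the package.

```r
set.seed(1)

# Hypothetical raw scores on two 5-item tests (possible scores 0..5),
# simulated here only to have something to tabulate.
x_raw <- rbinom(200, size = 5, prob = 0.6)
y_raw <- rbinom(200, size = 5, prob = 0.5)
score_levels <- 0:5

# "EG": a vector of sample frequencies, one entry per possible score.
scores_eg <- as.vector(table(factor(x_raw, levels = score_levels)))

# "SG": a matrix of joint frequencies with X in rows and Y in columns.
joint <- table(factor(x_raw, levels = score_levels),
               factor(y_raw, levels = score_levels))
scores_sg <- matrix(joint, nrow = length(score_levels))

# "CB": a two column matrix of observed scores for the sample that
# took test X first and test Y second.
scores_cb <- cbind(X = x_raw, Y = y_raw)

# Number of possible X and Y scores, as the J and K arguments expect.
J <- length(score_levels)
K <- length(score_levels)
```

Using factor() with explicit levels keeps scores that nobody obtained in the table with a frequency of zero, so the vector or matrix always has one cell per possible score.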

Details

To automatically select h, the function minimizes

PEN_1(h) + K \times PEN_2(h),

where PEN_1(h) = \sum_j (\hat{r}_j - \hat{f}_h(x_j))^2 and PEN_2(h) = \sum_j A_j (1 - B_j). The terms A_j and B_j are such that PEN_2 acts as a smoothness penalty term that avoids rapid fluctuations in the approximated density (see Chapter 10 in Von Davier, 2011 for more details). The K term corresponds to the Kp argument of the bandwidth function. The \hat{r}_j values are assumed to be estimated by polynomial loglinear models of a specific degree, which come from a call to loglin.smooth.
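The first penalty term can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes a Gaussian kernel in the standard kernel equating form, uses made-up frequencies, replaces loglin.smooth with a plain Poisson glm of degree 2, and ignores PEN_2 altogether (as if Kp were 0). The names freq, f_h, pen1 and the search interval c(0.1, 3) are all hypothetical choices.

```r
# Hypothetical frequencies for the scores 0..10 (not package data).
x <- 0:10
freq <- c(2, 5, 9, 14, 20, 24, 19, 13, 8, 4, 2)

# Presmoothing: polynomial log-linear model of degree 2, fitted as a
# Poisson GLM. Fitted counts are rescaled to score probabilities.
fit <- glm(freq ~ poly(x, 2, raw = TRUE), family = poisson)
r_hat <- fitted(fit) / sum(fitted(fit))

# Gaussian kernel continuization of the discrete score distribution,
# evaluated at the points z for bandwidth h.
f_h <- function(z, h, x, r) {
  mu <- sum(x * r)                 # mean of the discrete distribution
  s2 <- sum((x - mu)^2 * r)        # variance of the discrete distribution
  a <- sqrt(s2 / (s2 + h^2))       # shrinkage that preserves the variance
  sapply(z, function(zi) {
    sum(r * dnorm((zi - a * x - (1 - a) * mu) / (a * h))) / (a * h)
  })
}

# PEN_1(h): squared distance between r_hat and the continuized density
# evaluated at the score points.
pen1 <- function(h, x, r) sum((r - f_h(x, h, x, r))^2)

# One-dimensional minimization over an arbitrary search interval.
opt <- optimize(pen1, interval = c(0.1, 3), x = x, r = r_hat)
h_pen1 <- opt$minimum
h_pen1
```

With Kp greater than 0, the full criterion would add the PEN_2 term to pen1 before minimizing. That term depends on derivatives of the continuized density and is left out of this sketch.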

Value

The optimal value of h. In the Examples it is extracted from the returned object with $h.

Author(s)

Jorge Gonzalez jorge.gonzalez@mat.uc.cl

References

Gonzalez, J. (2014). SNSequate: Standard and Nonstandard Statistical Models and Methods for Test Equating. Journal of Statistical Software, 59(7), 1-30.

Von Davier, A., Holland, P., and Thayer, D. (2004). The Kernel Method of Test Equating. New York, NY: Springer-Verlag.

Von Davier, A. (Ed.) (2011). Statistical Models for Equating, Scaling, and Linking. New York, NY: Springer.

See Also

loglin.smooth

Examples

#Example: The "Standard" column and first two rows of Table 10.1 in 
#Chapter 10 of Von Davier (2011)

data(Math20EG)

hx.logis<-bandwidth(scores=Math20EG[,1],kert="logis",degree=2,design="EG")$h
hx.unif<-bandwidth(scores=Math20EG[,1],kert="unif",degree=2,design="EG")$h 
hx.gauss<-bandwidth(scores=Math20EG[,1],kert="gauss",degree=2,design="EG")$h

hy.logis<-bandwidth(scores=Math20EG[,2],kert="logis",degree=3,design="EG")$h
hy.unif<-bandwidth(scores=Math20EG[,2],kert="unif",degree=3,design="EG")$h 
hy.gauss<-bandwidth(scores=Math20EG[,2],kert="gauss",degree=3,design="EG")$h

partialTable10.1<-rbind(c(hx.logis,hx.unif,hx.gauss),
                        c(hy.logis,hy.unif,hy.gauss))

dimnames(partialTable10.1)<-list(c("h.x","h.y"),c("Logistic","Uniform","Gaussian"))
partialTable10.1


SNSequate documentation built on Dec. 28, 2022, 1:35 a.m.