Sbinsmth: Estimate the option probability and surprisal curves.

View source: R/Sbinsmth.R

Sbinsmth    R Documentation

Estimate the option probability and surprisal curves.


Description

For each item, the surprisal curves are fit to the surprisal transforms of the option choice probabilities computed within each of a set of bins of the current score index values in index. The error sum of squares is minimized by the surprisal smoothing function smooth.surp in the fda package. The output is a list vector of length n, the number of items, containing the functional data objects that define the curves.


Usage

  Sbinsmth(index, dataList, indexQnt=seq(0,100, len=2*nbin+1),
           wtvec=matrix(1,n,1), iterlim=20, conv=1e-4, dbglev=0)



Arguments

index: A vector of length N containing the current percentile values of the score index, one per examinee.


dataList: A list that contains the objects needed to analyse the test or rating scale.


indexQnt: A vector of length 2*nbin+1 containing the sequence of bin boundary and bin centre values, where nbin is the number of bins.


wtvec: A vector of length n of weights on observations. Defaults to all ones.


iterlim: The maximum number of iterations used in optimizing the surprisal curves. Defaults to 20.


conv: Convergence tolerance. Defaults to 0.0001.


dbglev: Level of output within Sbinsmth. If 0, there is no output; if 1, the error sum of squares and slope are displayed on each iteration; if 2 or higher, the results for each line search iteration within function lnsrch are also displayed.


Details

The function first bins the data so that the option surprisal curves can be estimated rapidly. The argument indexQnt contains the sequence of bin boundaries interleaved with the bin centres, so that it is of length 2*nbin + 1, where nbin is the number of bins. These values are distributed over the percentile interval [0,100], so that the lowest boundary is 0 and the highest is 100. Before Sbinsmth is called, these boundaries are computed so that roughly equal numbers of index values fall in each bin. The number of bins should be chosen so that each bin contains at least about 25 values.
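One way to build such an indexQnt vector is sketched below in base R. The helper make_indexQnt is hypothetical and not part of TestGardener, and the package may compute its boundaries differently. This sketch takes empirical quantiles of index at 2*nbin + 1 equally spaced probabilities, so the odd-numbered entries act as bin boundaries and the even-numbered entries as bin centres. It then pins the end boundaries to 0 and 100 and defaults nbin to about one bin per 25 values.

```r
# Hypothetical helper (not part of TestGardener): build an indexQnt vector of
# length 2*nbin + 1 whose odd entries are bin boundaries and even entries are
# bin centres, using empirical quantiles so bins hold roughly equal counts.
make_indexQnt <- function(index, nbin = floor(length(index) / 25)) {
  nbin  <- max(1, nbin)                        # at least one bin
  probs <- seq(0, 1, length.out = 2 * nbin + 1)
  qnt   <- quantile(index, probs = probs, names = FALSE)
  qnt[1]            <- 0     # lowest boundary pinned to 0
  qnt[2 * nbin + 1] <- 100   # highest boundary pinned to 100
  qnt
}

set.seed(1)
index    <- runif(1000, 0, 100)        # simulated percentile ranks
indexQnt <- make_indexQnt(index)       # 40 bins for 1000 values
bdry     <- indexQnt[seq(1, length(indexQnt), by = 2)]   # boundaries only
counts   <- table(cut(index, breaks = bdry, include.lowest = TRUE))
```

With 1000 simulated values, each of the 40 bins receives about 25 of them.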

After the values of index are binned, the proportions of examinees in each bin who chose each option of each question are computed. Proportions of zero are set to NA.
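For a single item, the binning and proportion step can be sketched as follows. The helper bin_proportions and its arguments (choice as integer option codes 1 to M, and bdry as the bin boundaries) are hypothetical illustrations and not the package's internals. The example data are made up so that the result can be checked by hand.

```r
# Hypothetical helper: proportions of each option chosen within each bin for
# one item, with zero proportions replaced by NA as described above.
bin_proportions <- function(index, choice, bdry, M) {
  bin  <- cut(index, breaks = bdry, include.lowest = TRUE)
  freq <- table(bin, factor(choice, levels = 1:M))  # nbin by M counts
  Pbin <- unclass(freq) / rowSums(freq)             # divide each row by its total
  Pbin[Pbin == 0] <- NA                             # zero proportions become NA
  Pbin
}

# Six examinees, two bins, three options
index  <- c(10, 20, 30, 60, 70, 80)
choice <- c(1, 1, 2, 2, 3, 3)
Pbin   <- bin_proportions(index, choice, bdry = c(0, 50, 100), M = 3)
```

In the first bin, option 3 is never chosen, and in the second bin, option 1 is never chosen, so those cells are NA.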

The positive proportions are then converted to surprisal values, where surprisal = -log_M(proportion) and log_M is the logarithm with base M, the number of options for the question. Bins with zero proportions are assigned a surprisal value that is appropriately large, in the sense of lying in the range of the larger surprisal values associated with small but positive proportions. This value is usually about 4.
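The surprisal transform can be sketched in a few lines of base R. The helper prop_to_surprisal is hypothetical. Its Szero = 4 is a fixed stand-in based on the remark that this value is usually about 4, whereas the package may choose the value from the data. With M = 4 options, a proportion of 0.25 = 4^(-1) has surprisal 1, and a proportion of 0.5 = 4^(-1/2) has surprisal 0.5.

```r
# Hypothetical helper: surprisal transform S = -log_M(P) of an nbin by M
# matrix of proportions; NA entries (zero proportions) receive a fixed large
# surprisal Szero, an assumed value based on "usually about 4".
prop_to_surprisal <- function(Pbin, M = ncol(Pbin), Szero = 4) {
  Sbin <- -log(Pbin, base = M)
  Sbin[is.na(Sbin)] <- Szero
  Sbin
}

# One bin, M = 4 options, the last option never chosen
Pbin <- matrix(c(0.5, 0.25, 0.25, NA), nrow = 1)
Sbin <- prop_to_surprisal(Pbin)
```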

The next step is to fit the surprisal values for each question with a functional data object that is smooth, passes as closely as possible to each option's surprisal values, and has values consistent with being surprisal values; that is, the corresponding probabilities M^(-S) sum to one at every index value. The function smooth.surp() is used for this purpose. The arc length of the item information curve is also computed.
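The text does not give the formula for the arc length. A common definition, assumed here, treats an item's M surprisal curves as a path in M-dimensional space traced out as the score index moves over its range. The length of that path is the integral of the Euclidean norm of the derivative vector dS/dindex. The hypothetical helper below approximates this integral with the trapezoidal rule from derivative values on a fine mesh. The test path is a straight line, chosen only because its length is known (0.05 per unit of index over 100 units, or 5).

```r
# Hypothetical helper: approximate the arc length of an item's surprisal
# curve from derivative values on a fine mesh, ASSUMING
# length = integral over the index of the Euclidean norm of dS/dindex.
item_arclength <- function(indfine, DSmatfine) {
  speed <- sqrt(rowSums(DSmatfine^2))    # norm of the derivative vector
  h     <- diff(indfine)                 # mesh spacings
  # trapezoidal rule over consecutive mesh points
  sum(h * (speed[-1] + speed[-length(speed)]) / 2)
}

# Straight-line test path: constant derivative (0.03, -0.04), norm 0.05
indfine   <- seq(0, 100, length.out = 101)
DSmatfine <- matrix(c(0.03, -0.04), nrow = 101, ncol = 2, byrow = TRUE)
arclen    <- item_arclength(indfine, DSmatfine)
```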

Finally the curves and other results for each question are saved in object SfdList, a list vector of length n, and the list vector is returned.


Value

A list object SfdList of length n, indexed by item number, that contains the probability and surprisal data and the optimized curves. The 12 objects for each item are as follows:


A surprisal functional data object that is used for plotting. It also contains the coefficient matrix and functional data basis that define the object.


The number of options M, including, if needed, a final option for missing and illegitimate responses.


An nbin by M matrix of the proportions of choice for each option.


An nbin by M matrix of surprisal values for each option.


A fine mesh of 101 equally spaced score index values over the interval [0,100].


A 101 by M matrix of probability values at each of the fine mesh points indfine.


A 101 by M matrix of surprisal values at each of the fine mesh points indfine.


A 101 by M matrix of surprisal first derivative values at each of the fine mesh points indfine.


A 101 by M matrix of surprisal second derivative values at each of the fine mesh points indfine.


The standard errors of the probability values over the fine mesh.


The standard errors of the surprisal values over the fine mesh.


The arc length of the item information curve.
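Because surprisal is defined as -log_M of probability, the probability and surprisal matrices returned for each item should be linked by P = M^(-S), and each row of probabilities should sum to one. The short check below illustrates this relationship with a made-up 2 by 3 probability matrix. The helper surprisal_to_prob is hypothetical.

```r
# Hypothetical helper: invert the surprisal transform S = -log_M(P).
surprisal_to_prob <- function(Smat, M = ncol(Smat)) M^(-Smat)

M     <- 3
Pmat  <- matrix(c(0.2, 0.3, 0.5,
                  0.6, 0.3, 0.1), nrow = 2, byrow = TRUE)  # rows sum to one
Smat  <- -log(Pmat, base = M)        # forward transform
Pback <- surprisal_to_prob(Smat)     # back-transform recovers Pmat
rowSums(Pback)                       # one per row, up to rounding
```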


Author(s)

Juan Li and James Ramsay


References

Ramsay, J. O., Li, J. and Wiberg, M. (2020) Full information optimal scoring. Journal of Educational and Behavioral Statistics, 45, 297-315.

Ramsay, J. O., Li, J. and Wiberg, M. (2020) Better rating scale scores with information-based psychometrics. Psych, 2, 347-360.

See Also

ICC_plot, Sbinsmth


Examples

#  Example 1.  Display the initial probability and surprisal curves for the
#  first item in the short SweSAT multiple choice test with 24 items and
#  1000 examinees.
#  Note: The scope is 0 at this point because it is computed later
#  in the analysis.
dataList <- Quant_13B_problem_dataList
index    <- dataList$percntrnk
#  Carry out the surprisal smoothing operation
SfdResult   <- Sbinsmth(index, dataList)
## Not run: 
#  Extract the list object containing the estimated surprisal curves
SfdList  <- SfdResult$SfdList
#  Bin centres used to plot the binned data points
binctr   <- dataList$binctr
#  The five marker percentile locations (5, 25, 50, 75, 95)
Qvec     <- dataList$PcntMarkers
#  Plot the surprisal and probability curves for the first question
scrfine  <- seq(0, 100, len = 101)
ICC_plot(scrfine, SfdList, dataList, Qvec, binctr,
         data_point = TRUE, plotType = c("S", "P"),
         Srng = c(0, 3), plotindex = 1)
## End(Not run)

TestGardener documentation built on May 29, 2024, 3:31 a.m.