# presmoothing: Frequency Distribution Presmoothing In equate: Observed-Score Linking and Equating

## Description

These functions are used to smooth frequency distributions.

## Usage

```
presmoothing(x, ...)

## Default S3 method:
presmoothing(x, smoothmethod = c("none", "average", "bump", "loglinear"),
  jmin, asfreqtab = TRUE, ...)

## S3 method for class 'formula'
presmoothing(x, data, ...)

loglinear(x, scorefun, degrees = list(4, 2, 2), grid, rmimpossible,
  asfreqtab = TRUE, models, stepup = !missing(models), compare = FALSE,
  choose = FALSE, choosemethod = c("chi", "aic", "bic"), chip,
  verbose = FALSE, ...)

freqbump(x, jmin = 1e-06, asfreqtab = FALSE, ...)

freqavg(x, jmin = 1, asfreqtab = FALSE, ...)
```

## Arguments

• `x`: either an object of class “`freqtab`” specifying a univariate or multivariate score distribution, or a “`formula`” object.

• `smoothmethod`: character string indicating the smoothing method to be used by `presmoothing`. `"none"` returns unsmoothed frequencies, `"bump"` adds a small frequency to each score value, `"average"` imputes small frequencies with average values, and `"loglinear"` fits loglinear models. See below for details.

• `jmin`: for `smoothmethod = "average"`, the minimum frequency, as an integer, below which frequencies will be replaced (default is 1). For `smoothmethod = "bump"`, the value to be added to each score point (as a probability, with default 1e-6).

• `asfreqtab`: logical, with default `TRUE`, indicating whether or not a frequency table should be returned. For `smoothmethod = "average"` and `smoothmethod = "bump"`, the alternative is a vector of frequencies. For `loglinear`, there are other options.

• `data`: an object of class “`freqtab`”.

• `scorefun`: matrix of score functions used in loglinear presmoothing, where each column includes a transformation of the score scale or interactions between score scales. If missing, `degrees` and `xdegree` will be used to construct polynomial score functions.

• `degrees`: list of integer vectors, each one indicating the maximum polynomial score transformations to be computed for each variable at a given order of interactions. Defaults (`degrees = list(4, 2, 2)`) are provided for up to trivariate interactions. `degrees` are ignored if `scorefun` or `grid` are provided. See below for details.

• `grid`: matrix with one column per margin in `x` and one row per term in the model. See below for details.

• `rmimpossible`: integer vector indicating columns in `x` to be used in removing impossible scores before smoothing, assuming internal anchor variables. Impossible scores are kept by default. See below.

• `models`: integer vector indicating which model terms should be grouped together when fitting multiple nested models. E.g., `models = c(1, 1, 2, 3)` will compare three models, with the first two terms in model one, the third term added in model two, and the fourth in model three.

• `stepup`: logical, with default `FALSE`, indicating whether or not multiple nested models should be automatically fit. If `TRUE` and `models` is missing, an attempt will be made to create it using `grid` and/or `degrees`. Otherwise, in the absence of `models`, each column in `scorefun` will define a new sequential model.

• `compare`: logical, with default `FALSE`, indicating whether or not fit for nested models should be compared. If `TRUE`, `stepup` is also set to `TRUE` and only results from the model fit comparison are returned, that is, `verbose` is ignored.

• `choose`: logical, with default `FALSE`, indicating whether or not the best-fitting model should be returned after comparing fit of nested models. Useful for automating model selection in simulations.

• `choosemethod`: string, indicating the method for selecting a best-fitting model when `choose = TRUE`. `"chi"` selects the most complex model with chi-square p-value below the criterion in `chip`. Remaining methods choose the model with lowest value.

• `chip`: proportion specifying the type-I error rate for model selection based on `choosemethod = "chi"`.

• `verbose`: logical, with default `FALSE`, indicating whether or not full `glm` output should be returned.

• `...`: further arguments passed to other methods. For `presmoothing`, these are passed to `loglinear` and include those listed above.

## Details

Loglinear smoothing is a flexible procedure for reducing irregularities in a frequency distribution prior to equating, where the degree of each polynomial term determines the specific moment of the observed distribution that is preserved in the fitted distribution (see below for examples). The `loglinear` function is a wrapper for `glm`, and is used to simplify the creation of polynomial score functions and the fitting and comparing of multiple loglinear models.
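The moment-preservation property can be checked with `glm` alone. The following is a minimal base-R sketch, not the package's implementation: the simulated data and the names `scores`, `counts`, and `raw_moment` are made up for illustration. It fits a Poisson loglinear model with raw polynomial terms up to degree 3 and compares the first three raw moments of the observed and fitted distributions.

```r
# Simulated scores on a hypothetical 0-20 scale (illustrative data only)
set.seed(2010)
scores <- 0:20
x <- rbinom(500, size = 20, prob = 0.6)
counts <- as.vector(table(factor(x, levels = scores)))

# Degree-3 loglinear model: score, score^2, score^3 as score functions
fit <- glm(counts ~ poly(scores, 3, raw = TRUE), family = poisson)
smoothed <- fitted(fit)

# k-th raw moment of a frequency distribution over `scores`
raw_moment <- function(f, k) sum(scores^k * f) / sum(f)
observed_moments <- sapply(1:3, raw_moment, f = counts)
fitted_moments <- sapply(1:3, raw_moment, f = smoothed)

# The two rows should agree up to the convergence tolerance of glm
rbind(observed_moments, fitted_moments)
```

Adding a fourth-degree term would, by the same reasoning, also match the fourth raw moment.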

`scorefun`, if supplied, must contain at least one score function of the scale score values. Specifying a list to `degrees` is an alternative to supplying `scorefun`. Each list element in `degrees` should be a vector equal in length to the number of variables contained in `x`; there should also be one such vector for each possible level of interaction between the variables in `x`.

For example, the default `degrees = list(4, 2, 2)` is recycled to produce `list(c(4, 4, 4), c(2, 2, 2), c(2, 2, 2))`, resulting in polynomials to the fourth power for each univariate distribution, to the second power for each two-way interaction, and to the second power for the three-way interaction.

Terms can also be specified with `grid`, which is a matrix with each row containing integers specifying the powers for each variable at each interaction term, including main effects. For example, the main effect to the first power for the total score in a bivariate distribution would be `c(1, 0)`; the interaction to the second power would be `c(2, 2)`.
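To make the `grid` layout concrete, the sketch below turns a small grid into a matrix of score functions for a bivariate distribution. It assumes that each term is the product of the variables raised to the powers in the corresponding row; whether `loglinear` builds its terms in exactly this way internally is an assumption. The helper `grid_to_scorefun` and the score scales are hypothetical.

```r
# Hypothetical bivariate score scales: total (0-4) and anchor (0-2)
scales <- expand.grid(total = 0:4, anchor = 0:2)

# One row per model term, one column per variable; entries are powers
grid <- rbind(
  c(1, 0),  # total, first power (main effect)
  c(2, 0),  # total, second power (main effect)
  c(0, 1),  # anchor, first power (main effect)
  c(1, 1),  # total by anchor interaction, first power
  c(2, 2)   # total by anchor interaction, second power
)

# Hypothetical helper: multiply the variables raised to each row's powers
grid_to_scorefun <- function(scales, grid) {
  sf <- sapply(seq_len(nrow(grid)), function(i) {
    term <- rep(1, nrow(scales))
    for (j in seq_len(ncol(grid))) {
      term <- term * scales[[j]]^grid[i, j]
    }
    term
  })
  colnames(sf) <- apply(grid, 1, paste, collapse = ".")
  sf
}

scorefun <- grid_to_scorefun(scales, grid)
head(scorefun)
```

Each column of the result is labeled by its powers, so column `"2.2"` holds `total^2 * anchor^2`.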

`stepup` is used to run nested models based on subsets of the columns in `scorefun`. Output will correspond to models based on columns 1 and 2, 1 through 3, 1 through 4, and so on up to 1 through `ncol(scorefun)`. When terms are specified with `degrees`, the resulting list of polynomial terms is used to create a `grid` using `expand.grid`. The `grid` can also be supplied directly, in which case `degrees` will be ignored.

`compare` returns output as an `anova` table, comparing model fit for all the models run with `stepup = TRUE`, or by specifying more than one model in `models`. When `choose = TRUE`, the arguments `choosemethod` and `chip` are used to automatically select the best-fitting model based on the `anova` table from running `compare`.
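The nested-model workflow can be approximated in base R, which may help in reading the output of `compare` and `choose`. This sketch is an illustration under stated assumptions rather than the package's code: the data are simulated, the score scale is standardized for numerical stability (an illustrative choice), and models are built from columns 1, 1 to 2, 1 to 3, and 1 to 4 of a hypothetical `scorefun` matrix.

```r
# Simulated scores on a hypothetical 0-30 scale (illustrative data only)
set.seed(2010)
scores <- 0:30
x <- rbinom(1000, size = 30, prob = 0.5)
counts <- as.vector(table(factor(x, levels = scores)))

# Score functions, one column per polynomial degree
z <- (scores - mean(scores)) / sd(scores)
scorefun <- cbind(z, z^2, z^3, z^4)

# Nested Poisson loglinear models on the first k columns
fits <- lapply(seq_len(ncol(scorefun)), function(k) {
  glm(counts ~ scorefun[, 1:k, drop = FALSE], family = poisson)
})

# Analysis-of-deviance table comparing the nested models
fit_table <- anova(fits[[1]], fits[[2]], fits[[3]], fits[[4]],
  test = "Chisq")

# AIC-based selection: keep the model with the lowest value
aics <- sapply(fits, AIC)
best <- which.min(aics)
smoothed <- fitted(fits[[best]])
```

Swapping `AIC` for `BIC` gives the analogous BIC-based choice; selection by chi-square p-value would instead read the p-value column of `fit_table`.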

The remaining smoothing methods make adjustments to scores with low or zero frequencies. `smoothmethod = "bump"` adds the proportion `jmin` to each score point and then adjusts the probabilities to sum to 1. `smoothmethod = "average"` replaces frequencies falling below the minimum `jmin` with averages of adjacent values.
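The `"bump"` adjustment as described here is simple enough to sketch directly. The helper `bump_probs` and the example counts below are hypothetical, and rescaling the result back to the original sample size is an assumption about the output scale rather than documented behavior. The `"average"` method is not sketched because its averaging rule is not spelled out in this section.

```r
# Hypothetical counts over seven score points, including zeros
counts <- c(0, 3, 12, 20, 9, 0, 1)
jmin <- 1e-6

# Add jmin to each relative frequency, then renormalize to sum to 1
bump_probs <- function(counts, jmin = 1e-6) {
  p <- counts / sum(counts) + jmin
  p / sum(p)
}
p_bumped <- bump_probs(counts, jmin)

# Rescale to the original sample size (assumed output scale)
f_bumped <- p_bumped * sum(counts)
f_bumped
```

After the adjustment every score point has a strictly positive frequency, which is what methods that take logs or ratios of frequencies typically require.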

## Value

When `smoothmethod = "average"` or `smoothmethod = "bump"`, either a smoothed frequency vector or table is returned. Otherwise, `loglinear` returns the following:

• when `compare = TRUE`, an anova table for model fit

• when `asfreqtab = TRUE`, a smoothed frequency table

• when `choose = TRUE`, a smoothed frequency table with attribute "anova" containing the model fit table for all models compared

• when `verbose = TRUE`, full `glm` output, for all nested models when `stepup = TRUE`

• when `stepup = TRUE` and `verbose = FALSE`, a `data.frame` of fitted frequencies, with one column per model

## Methods (by class)

• `default`: Default method for frequency tables.

• `formula`: Method for “`formula`” objects.

## Author(s)

Anthony Albano

## References

Holland, P. W., and Thayer, D. T. (1987). Notes on the use of log-linear models for fitting discrete probability distributions (PSR Technical Rep. No. 87-79; ETS RR-87-31). Princeton, NJ: ETS.

Holland, P. W., and Thayer, D. T. (2000). Univariate and bivariate loglinear models for discrete test score distributions. Journal of Educational and Behavioral Statistics, 25, 133–183.

Moses, T., and Holland, P. W. (2008). Notes on a general framework for observed score equating (ETS Research Rep. No. RR-08-59). Princeton, NJ: ETS.

Moses, T., and Holland, P. W. (2009). Selection strategies for univariate loglinear smoothing models and their effect on equating function accuracy. Journal of Educational Measurement, 46, 159–176.

Wang, T. (2009). Standard errors of equating for the percentile rank-based equipercentile equating with log-linear presmoothing. Journal of Educational and Behavioral Statistics, 34, 7–23.

## See Also

`glm`, `loglin`

## Examples

```
library(equate)

set.seed(2010)
x <- round(rnorm(1000, 100, 15))
xscale <- 50:150
xtab <- freqtab(x, scales = xscale)

# Adjust frequencies
plot(xtab, y = cbind(average = freqavg(xtab),
  bump = freqbump(xtab)))

# Smooth x up to 8 degrees and choose best fitting model
# based on aic minimization
xlog1 <- loglinear(xtab, degrees = 8,
  choose = TRUE, choosemethod = "aic")
plot(xtab, as.data.frame(xlog1)[, 2],
  legendtext = "degree = 3")

# Add "teeth" and "gaps" to x
# Smooth with formula interface
teeth <- c(.5, rep(c(1, 1, 1, 1, .5), 20))
xttab <- as.freqtab(cbind(xscale, c(xtab) * teeth))
xlog2 <- presmoothing(~ poly(total, 3, raw = TRUE),
  xttab, showWarnings = FALSE)

# Smooth xt using score functions that preserve
# the teeth structure (also 3 moments)
teeth2 <- c(1, rep(c(0, 0, 0, 0, 1), 20))
xt.fun <- cbind(xscale, xscale^2, xscale^3)
xt.fun <- cbind(xt.fun, teeth2, xt.fun * teeth2)
xlog3 <- loglinear(xttab, xt.fun, showWarnings = FALSE)

# Plot to compare teeth versus no teeth
op <- par(no.readonly = TRUE)
par(mfrow = c(3, 1))
plot(xttab, main = "unsmoothed", ylim = c(0, 30))
plot(xlog2, main = "ignoring teeth", ylim = c(0, 30))
plot(xlog3, main = "preserving teeth", ylim = c(0, 30))
par(op)

# Bivariate example, preserving first 3 moments of total
# and anchor for x and y, and the covariance
# between anchor and total
# see equated scores in Wang (2009), Table 4
xvtab <- freqtab(KBneat$x, scales = list(0:36, 0:12))
yvtab <- freqtab(KBneat$y, scales = list(0:36, 0:12))
Y <- as.data.frame(yvtab)[, 1]
V <- as.data.frame(yvtab)[, 2]
scorefun <- cbind(Y, Y^2, Y^3, V, V^2, V^3, V*Y)
wang09 <- equate(xvtab, yvtab, type = "equip",
  method = "chained", smooth = "loglin",
  scorefun = scorefun)
wang09$concordance

# Removing impossible scores has essentially no impact
xvlog1 <- loglinear(xvtab, scorefun, asfreqtab = FALSE)
xvlog2 <- loglinear(xvtab, scorefun, rmimpossible = 1:2)
plot(xvtab, cbind(xvlog1,
  xvlog2 = as.data.frame(xvlog2)[, 3]))
```