cNORM

The package cNORM provides methods for generating continuous standard scores, e.g., for psychometric test development, biometrics (e.g., biological and physiological growth curves), and screenings in the medical domain. It is based on the approach suggested by A. Lenhard et al. (2016, 2019). For an in-depth tutorial, please consult the project homepage https://www.psychometrica.de/cNorm_en.html; for an online demonstration, see https://cnorm.shinyapps.io/cNORM/.

Approach

Conventional methods for producing test norms are often plagued with "jumps" or "gaps" (i.e., discontinuities) in norm tables and with low confidence when assessing extreme scores. cNORM addresses these problems and has the added advantage of not requiring assumptions about the distribution of the raw data: standard scores are established by modeling the raw scores as a function of both percentile scores and an explanatory variable (e.g., age) through Taylor polynomials. The method minimizes bias arising from sampling and measurement error while handling marked deviations from normality, such as are commonplace in clinical samples. It also includes procedures for post-stratification of norm samples to overcome bias in data collection and to mitigate violations of representativeness. Because, unlike parametric approaches, it does not rely on distributional assumptions about the initial norm data, it is a very robust approach to generating norm tables.
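As an illustration of how percentile scores can be turned into a location on a norm scale, the following sketch assumes the RankIt estimator for percentiles and a T-score metric (mean 50, SD 10). These are common choices, but the ranking method and scale are configurable in cNORM, so consult the package documentation (e.g., `?rankByGroup`) rather than treating this as the package's exact procedure. For person i with rank r_i within a group of size n, and Φ⁻¹ the inverse of the standard normal cumulative distribution function:

$$
p_i = \frac{r_i - 0.5}{n}, \qquad z_i = \Phi^{-1}(p_i), \qquad L_i = 50 + 10\, z_i
$$

For example, rank 50 in a group of n = 100 gives p = .495, z ≈ −0.013, and L ≈ 49.9.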

The rationale of the approach is to model the relationship between location (i.e., norm score), age, and raw score via multiple regression and thereby fit a three-dimensional hyperplane. This hyperplane is used to close all gaps and to compute continuous norm scores.
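Written out, the fitted surface can be sketched as a bivariate polynomial regression. The notation is illustrative: L is the location (norm score), A the explanatory variable (e.g., age), r̂ the predicted raw score, and k and t the power parameters for location and age (by default k = 5 and t = 3, as explained in the example further down). This shows only the general model form; cNORM retains a selected subset of these terms (e.g., via the terms parameter), not necessarily all of them.

$$
\hat{r}(L, A) = \sum_{i=0}^{k} \sum_{j=0}^{t} \beta_{ij}\, L^{i} A^{j}
$$

Apart from the intercept β₀₀, this yields (k + 1)(t + 1) − 1 candidate predictors; with k = 5 and t = 3, that is 6 · 4 − 1 = 23. Conceptually, the continuous norm score for a given raw score and age is then obtained by solving this equation for L.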

Installation

cNORM can be installed via

```{r example}
install.packages("cNORM", dependencies = TRUE)
```


Additionally, you can [download a precompiled version](https://github.com/WLenhard/cNORM/releases) or install the development version from GitHub via

```{r example}
# devtools is only needed to install packages directly from GitHub
install.packages("devtools")
devtools::install_github("WLenhard/cNORM")
library(cNORM)
```
Please report errors. Suggestions for improvement are always welcome!

Example

Conducting the analysis consists of the following steps:

1. Data preparation and establishing the regression model
2. Validating the model
3. Generating norm tables and plotting the results

cNORM offers functions for selecting the best-fitting model and for generating norm tables.

```{r example}

# In a nutshell:
# Basic example code for modeling the sample dataset
library(cNORM)

# Start the graphical user interface (needs shiny installed).
# The GUI includes the most important functions. For specific cases,
# please use cNORM on the console.
cNORM.GUI()

# Easy start: Conventional norming for one group without continuum over age
# with the inbuilt elfe dataset.
cnorm(raw = elfe$raw)

# Rank the data within each group, compute powers and interactions for the internal
# dataset 'elfe', and fit the model. The resulting object includes the ranked
# data via object$data and the model via object$model.
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)

# Plot R2 of different model solutions as a function of the number of predictors
plot(cnorm.elfe, "subset", type=0) # plot R2
plot(cnorm.elfe, "subset", type=3) # plot MSE

# NOTE! At this point, you usually select a well-fitting model and rerun the process
# with a fixed number of terms, e. g. four. Avoid models with a high number of terms:
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group, terms = 4)

# By default, the power parameters are set to k = 5 and t = 3. You can choose values
# up to 6, but higher values can lead to overfitting. In case of overfitting, please
# reduce these values. If only k is specified, cNORM uses this value for both k and t.
# In the following example, the distribution per age is modeled with power parameter
# k = 3 (= cubic), while age is modeled with only a quadratic trajectory (-> 't = 2').
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group, k = 3, t = 2)

# Visual inspection of the percentile curves of the fitted model
plot(cnorm.elfe, "percentiles")

# Visual inspection of the observed and fitted raw and norm scores
plot(cnorm.elfe, "norm")
plot(cnorm.elfe, "raw")
plot(cnorm.elfe, "raw", group = "group") # show fit per grouping variable

# To check how other models perform, plot a series of percentile plots with an
# ascending number of predictors, in this example up to 14 predictors.
plot(cnorm.elfe, "series", end=14)

# Cross-validation of the number of terms with 20% of the data for validation and
# 80% for training. Due to the computation time, max terms is restricted to 10 in
# this example; 3 repetitions.
cnorm.cv(cnorm.elfe$data, max=10, repetitions=3)

# Cross-validation with pre-specified terms, e. g. of an already existing model
cnorm.cv(cnorm.elfe, repetitions=3)

# Print norm table (for grade 3, 3.2, 3.4, 3.6)
normTable(c(3, 3.2, 3.4, 3.6), cnorm.elfe)

# The other way round: Print raw table (for grade 3) together with 90% confidence
# intervals for a test with a reliability of .94
rawTable(3, cnorm.elfe, CI = .9, reliability = .94)

# Get the predicted norm scores for a vector of raw scores and the explanatory
# variable, e. g. age
predicted <- predictNorm(elfe$raw, elfe$group, cnorm.elfe)

# In case of unbalanced datasets deviating from the census, the norm data
# can be weighted by means of raking / post-stratification. Please generate
# the weights with the computeWeights() function and pass them as the weights
# parameter. For computing the weights, please specify a data.frame with the
# population margins (further information is available in the computeWeights
# function documentation). A demonstration based on sex and migration status in
# vocabulary development (ppvt dataset; Gary et al., 2023a, 2023b):
margins <- data.frame(variables = c("sex", "sex", "migration", "migration"),
                      levels = c(1, 2, 0, 1),
                      share = c(.52, .48, .7, .3))
weights <- computeWeights(ppvt, margins)
model <- cnorm(raw = ppvt$raw, group = ppvt$group, weights = weights)

# Start the vignettes for a complete walk-through
vignette("cNORM-Demo", package = "cNORM")
vignette("WeightedRegression", package = "cNORM")
```

cNORM offers functions to choose the optimal model, both from a visual inspection of the percentiles and by information criteria and model tests.

In this example, a Taylor polynomial with power k = 4 was computed in order to model a sample of the ELFE 1-6 reading comprehension test (sentence completion task; W. Lenhard & Schneider, 2006). The plot shows the share of variance explained by the different models as the number of predictors increases. Adjusted R2, Mallows's Cp (an AIC-like measure), and BIC are used (BIC is available through the option type = 2). The predefined adjusted R2 value of .99 is already reached with the third model, and afterwards adjusted R2 improves only marginally. On the other hand, Cp rapidly declines afterwards, so model 3 seems to be a good candidate in terms of both the relative information content per predictor and the captured information (adjusted R2). It is advisable to choose a model at the "elbow" in order to avoid overfitting, but the solution should also be tested for violations of model assumptions, and the progression of the percentiles should be inspected visually.
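For reference, the criteria named above have standard textbook definitions. The sketch assumes n observations, a candidate model with p predictors plus an intercept and residual sum of squares SSE_p, and σ̂² as the residual mean square of the model containing all candidate predictors. cNORM's plots may display transformed versions of these quantities, and its internal computations may differ in constants, so treat these as orientation rather than the package's exact implementation.

$$
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
$$

$$
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2(p + 1)
$$

$$
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{SSE}_p}{n}\right) + (p + 1)\ln n \quad \text{(up to an additive constant)}
$$

A model with little bias has a C_p close to its number of parameters, p + 1, and lower BIC values indicate a better trade-off between fit and complexity.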

The predicted percentile curves over age are displayed as lines and the manifest data as dots. Only three predictors were necessary to model the norm sample data almost perfectly (adjusted R2 of .99).

Sample Data

The package includes data from two large test norming projects, namely ELFE 1-6 (Lenhard & Schneider, 2006) and the German adaptation of the PPVT-4 (A. Lenhard, Lenhard, Suggate & Seegerer, 2015), which can be used to run the analysis. Furthermore, it includes large samples from the Centers for Disease Control and Prevention (CDC) on growth curves in childhood and adolescence (for computing Body Mass Index 'BMI' curves), as well as data on life expectancy at birth and mortality per country from 1960 to 2017 (available from The World Bank). Type ?elfe, ?ppvt, ?CDC, ?epm, ?mortality or ?life to display information on the data sets.

Terms of use, license and declaration of interest

cNORM is licensed under the GNU Affero General Public License v3 (AGPL-3.0). This means that copyrighted parts of cNORM can be used free of charge for commercial and non-commercial purposes that run under this same license, retain the copyright notice, provide their source code, and correctly cite cNORM. Copyright protection includes, for example, the reproduction and distribution of source code or parts of the source code of cNORM or of graphics created with cNORM. The integration of the package into a server environment in order to access the functionality of the software (e.g., for online delivery of norm scores) is also subject to this license. However, a regression function determined with cNORM is not subject to copyright protection and may be used freely without preconditions. If you want to apply cNORM in a way that is not compatible with the terms of the AGPL-3.0 license, please do not hesitate to contact us to negotiate individual conditions. If you use cNORM for scientific publications, we also ask you to cite the source.

The authors would like to thank WPS (https://www.wpspublish.com/) for providing funding for developing, integrating and evaluating weighting and post stratification in the cNORM package. The research project was conducted in 2022.

References



WLenhard/cNORM documentation built on April 28, 2024, 4:24 a.m.