pcat: Reduction for the Levels of a Factor.
In gamlss: Generalized Additive Models for Location Scale and Shape

View source: R/p-categorical.R

pcat	R Documentation

Reduction for the Levels of a Factor.

Description

The function is trying to merged similar levels of a given factor. Its based on ideas given by Tutz (2013).

Usage

pcat(fac, df = NULL, lambda = NULL, method = c("ML", "GAIC"), start = 0.001, 
         Lp = 0, kappa = 1e-05, iter = 100, c.crit = 1e-04, k = 2)

gamlss.pcat(x, y, w, xeval = NULL, ...)

plotDF(y, factor = NULL, formula = NULL, data, along = seq(0, nlevels(factor)), 
         kappa = 1e-06, Lp = 0, ...)

plotLambda(y, factor = NULL, formula = NULL, data, along = seq(-2, 2, 0.1), 
         kappa = 1e-06, Lp = 0, ...)

Arguments

`fac`, `factor`	a factor to reduce its levels
`df`	the effective degrees of freedom df
`lambda`	the smoothing parameter
`method`	which method is used for the estimation of the smoothing parameter, `"ML"` or `"GAIC"` are allowed.
`start`	starting value for `lambda` if it estimated using `"ML"` or `"GAIC"`
`Lp`	The type of penalty required, `Lp=0` is the default. Use `Lp=1` for lasso type and different values for different required penalty.
`kappa`	a regulation parameters used for the weights in the penalties.
`iter`	the number of internal iteration allowed
`c.crit`	the convergent criterion
`k`	the penalty if `"GAIC"` method is used.
`x`	explanatory factor
`y`	the response or iterative response variable
`w`	iterative weights
`xeval`	indicator whether to predict
`formula`	A formula
`data`	A data frame
`along`	a sequence of values
`...`	for extra variables

Details

The pcat() is used for the fitting of the factor. The function shrinks the levels of the categorical factor (not towards the overall mean as the function random() is doing) but towards each other. This results to a reduction of the number if levels of the factors. Different norms can be used for the shrinkage by specifying the argument Lp.

Value

The function pcat reruns a vector endowed with a number of attributes. The vector itself is used in the construction of the model matrix, while the attributes are needed for the backfitting algorithms additive.fit(). The backfitting is done in gamlss.pcat.

Note

Note that pcat itself does no smoothing; it simply sets things up for gamlss.pcat() to do the smoothing within the backfitting.

Author(s)

Mikis Stasinopoulos, Paul Eilers and Marco Enea

References

Tutz G. (2013) Regularization and Sparsity in Discrete Structures in the Proceedings of the 29th International Workshop on Statistical Modelling, Volume 1, p 29-42, Gottingen, Germany

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

Examples

# Simulate data 1
    n <- 10  # number of levels 
    m <- 200 # number of observations  
set.seed(2016)
level <-  as.factor(floor(runif(m) * n) + 1)
  a0  <-  rnorm(n)
sigma <-  0.4
   mu <-  a0[level]
   y <-  mu + sigma * rnorm(m)
plot(y~level)
points(1:10,a0, col="red")
 da1 <- data.frame(y, level)
#------------------
  mn <- gamlss(y~1,data=da1 ) # null model 
  ms <- gamlss(y~level-1, data=da1) # saturated model 
  m1 <- gamlss(y~pcat(level), data=da1) # calculating lambda ML
AIC(mn, ms, m1)
## Not run: 
m11 <- gamlss(y~pcat(level, method="GAIC", k=log(200)), data=da1) # GAIC
AIC(mn, ms, m1, m11) 
#gettng the fitted object -----------------------------------------------------
getSmo(m1)
coef(getSmo(m1))
fitted(getSmo(m1))[1:10]
plot(getSmo(m1)) # 
# After the fit a new factor is created  this factor has the reduced levels
 levels(getSmo(m1)$factor)
# -----------------------------------------------------------------------------

## End(Not run)

gamlss documentation built on Aug. 21, 2025, 5:56 p.m.