pcat: Reduction for the Levels of a Factor.

View source: R/p-categorical.R

Description

The function tries to merge similar levels of a given factor. It is based on ideas given by Tutz (2013).

Usage

pcat(fac, df = NULL, lambda = NULL, method = c("ML", "GAIC"), start = 0.001, 
         Lp = 0, kappa = 1e-05, iter = 100, c.crit = 1e-04, k = 2)

gamlss.pcat(x, y, w, xeval = NULL, ...)

plotDF(y, factor = NULL, formula = NULL, data, along = seq(0, nlevels(factor)), 
         kappa = 1e-06, Lp = 0, ...)

plotLambda(y, factor = NULL, formula = NULL, data, along = seq(-2, 2, 0.1), 
         kappa = 1e-06, Lp = 0, ...)

Arguments

fac, factor

a factor whose levels are to be reduced

df

the effective degrees of freedom df

lambda

the smoothing parameter

method

the method used to estimate the smoothing parameter; "ML" and "GAIC" are allowed.

start

starting value for lambda if it is estimated using "ML" or "GAIC"

Lp

the type of penalty required; Lp=0 is the default. Use Lp=1 for a lasso-type penalty, or other values for other penalty norms.

kappa

a regularization parameter used for the weights in the penalties.

iter

the number of internal iterations allowed

c.crit

the convergence criterion

k

the penalty applied if the "GAIC" method is used.

x

explanatory factor

y

the response or iterative response variable

w

iterative weights

xeval

indicator whether to predict

formula

A formula

data

A data frame

along

a sequence of values

...

for extra variables
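As an illustration of how the arguments above combine, the following is a minimal sketch rather than an excerpt from the package. It assumes the gamlss package is installed, reuses the simulated data from the Examples section, and picks the values df = 5 and lambda = 1 arbitrarily. The object names (da, m_ml, m_df, m_lam, m_l1) are hypothetical.

```r
# Sketch only: assumes the gamlss package is installed.
library(gamlss)

# Simulated data, mirroring the Examples section of this page
set.seed(2016)
level <- as.factor(floor(runif(200) * 10) + 1)
y     <- rnorm(10)[level] + 0.4 * rnorm(200)
da    <- data.frame(y, level)

# lambda estimated by ML (the default method)
m_ml  <- gamlss(y ~ pcat(level), data = da)
# effective degrees of freedom fixed (value chosen arbitrarily)
m_df  <- gamlss(y ~ pcat(level, df = 5), data = da)
# smoothing parameter fixed (value chosen arbitrarily)
m_lam <- gamlss(y ~ pcat(level, lambda = 1), data = da)
# lasso-type penalty instead of the default Lp = 0
m_l1  <- gamlss(y ~ pcat(level, Lp = 1), data = da)

# Compare the fits
AIC(m_ml, m_df, m_lam, m_l1)
```

Comparing the number of levels in getSmo(m_ml)$factor with nlevels(da$level) shows how many levels were merged by each fit.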

Details

pcat() is used for fitting the factor. The function shrinks the levels of the categorical factor towards each other, not towards the overall mean as the function random() does. This results in a reduction of the number of levels of the factor. Different norms can be used for the shrinkage by specifying the argument Lp.
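To make the idea of shrinking levels "towards each other" concrete, the base-R sketch below evaluates a penalty on all pairwise differences between level effects. It is an illustration under stated assumptions, not the code used inside pcat(): the function name fusion_penalty is hypothetical, and the smoothed form d^2 / (|d|^(2 - Lp) + kappa) is one common quadratic approximation to an Lp norm, which may differ from the weights the package actually uses.

```r
# Hypothetical helper, not part of gamlss: a smoothed Lp penalty on all
# pairwise differences between level effects.
# Assumption: |d|^Lp is approximated by d^2 / (|d|^(2 - Lp) + kappa).
fusion_penalty <- function(beta, Lp = 0, kappa = 1e-05) {
  pairs <- combn(length(beta), 2)               # all pairs of levels
  d     <- beta[pairs[1, ]] - beta[pairs[2, ]]  # pairwise differences
  sum(d^2 / (abs(d)^(2 - Lp) + kappa))
}

beta <- c(0, 0, 2)  # levels 1 and 2 share the same effect

fusion_penalty(beta, Lp = 0)  # roughly counts the non-zero differences
fusion_penalty(beta, Lp = 1)  # roughly sums the absolute differences
fusion_penalty(beta, Lp = 2)  # roughly sums the squared differences
```

For this beta the three calls give approximately 2, 4 and 8. Identical levels contribute nothing to the penalty, which is why a fit penalized this way tends to merge levels with similar effects.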

Value

The function pcat returns a vector endowed with a number of attributes. The vector itself is used in the construction of the model matrix, while the attributes are needed for the backfitting algorithm additive.fit(). The backfitting is done in gamlss.pcat().

Note

Note that pcat itself does no smoothing; it simply sets things up for gamlss.pcat() to do the smoothing within the backfitting.

Author(s)

Mikis Stasinopoulos d.stasinopoulos@londonmet.ac.uk, Paul Eilers and Marco Enea

References

Tutz, G. (2013) Regularization and Sparsity in Discrete Structures. In Proceedings of the 29th International Workshop on Statistical Modelling, Volume 1, pp. 29-42, Gottingen, Germany.

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos, D. M. and Rigby, R. A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

See Also

random

Examples

library(gamlss)

# Simulate data 1
n <- 10   # number of levels
m <- 200  # number of observations
set.seed(2016)
level <- as.factor(floor(runif(m) * n) + 1)
a0    <- rnorm(n)                 # true level effects
sigma <- 0.4
mu    <- a0[level]
y     <- mu + sigma * rnorm(m)
plot(y ~ level)
points(1:10, a0, col = "red")     # overlay the true level effects
da1 <- data.frame(y, level)
#------------------
mn <- gamlss(y ~ 1, data = da1)           # null model
ms <- gamlss(y ~ level - 1, data = da1)   # saturated model
m1 <- gamlss(y ~ pcat(level), data = da1) # lambda estimated by ML
AIC(mn, ms, m1)
## Not run: 
m11 <- gamlss(y ~ pcat(level, method = "GAIC", k = log(200)), data = da1) # GAIC
AIC(mn, ms, m1, m11)
# getting the fitted object ----------------------------------------------------
getSmo(m1)
coef(getSmo(m1))
fitted(getSmo(m1))[1:10]
plot(getSmo(m1))
# After the fit a new factor is created; this factor has the reduced levels
levels(getSmo(m1)$factor)
# -----------------------------------------------------------------------------

## End(Not run)

Example output

Loading required package: splines
Loading required package: gamlss.data
Loading required package: gamlss.dist
Loading required package: MASS
Loading required package: nlme
Loading required package: parallel
 **********   GAMLSS Version 5.0-2  ********** 
For more on GAMLSS look at http://www.gamlss.org/
Type gamlssNews() to see new features/changes/bug fixes.

GAMLSS-RS iteration 1: Global Deviance = 665.6181 
GAMLSS-RS iteration 2: Global Deviance = 665.6181 
GAMLSS-RS iteration 1: Global Deviance = 217.2095 
GAMLSS-RS iteration 2: Global Deviance = 217.2095 
GAMLSS-RS iteration 1: Global Deviance = 217.7802 
GAMLSS-RS iteration 2: Global Deviance = 217.7796 
          df      AIC
m1  9.453897 236.6874
ms 11.000000 239.2095
mn  2.000000 669.6181
GAMLSS-RS iteration 1: Global Deviance = 285.8586 
GAMLSS-RS iteration 2: Global Deviance = 233.2993 
GAMLSS-RS iteration 3: Global Deviance = 221.612 
GAMLSS-RS iteration 4: Global Deviance = 221.5368 
GAMLSS-RS iteration 5: Global Deviance = 221.5368 
           df      AIC
m1   9.453897 236.6874
m11  7.669404 236.8756
ms  11.000000 239.2095
mn   2.000000 669.6181
Randon effects fit using the gamlss function pcat() 
Degrees of Freedom for the fit : 8.453897 
Random effect parameter sigma_b: 5.2047 
Smoothing parameter lambda     : 0.200613 
 [1] -0.1809184 -1.9487830 -1.7143269  0.9427130 -0.1809184  0.3431122
 [7]  1.4974744  1.7843534 -0.4720086  0.4491312
 [1] -1.9487830 -1.9487830 -0.4720086 -1.9487830 -0.1809184 -1.9487830
 [7]  1.4974744 -0.4720086 -0.1809184 -0.1809184
[1] "-1.949" "-1.714" "-0.472" "-0.181" "0.343"  "0.449"  "0.943"  "1.497" 
[9] "1.784" 

gamlss documentation built on March 31, 2021, 5:10 p.m.