yjt_dist: Scaled t Distribution with Yeo-Johnson and Box-Cox...
In mdmb: Model Based Treatment of Missing Data

yjt_dist

R Documentation

Scaled `t` Distribution with Yeo-Johnson and Box-Cox Transformations

Description

Collection of functions for the Yeo-Johnson transformation (Yeo & Johnson, 2000) and for the corresponding family of scaled t distributions, with and without the Yeo-Johnson transformation (see Details). The Yeo-Johnson transformation can also be applied to bounded variables on (0,1) by combining it with a probit transformation (see Details; argument probit).

The Box-Cox transformation (bc; Sakia, 1992) can be applied to variables with positive values.

Usage

# Yeo-Johnson transformation and its inverse transformation
yj_trafo(y, lambda, use_rcpp=TRUE, probit=FALSE)
yj_antitrafo(y, lambda, probit=FALSE)

#---- scaled t distribution with Yeo-Johnson transformation
dyjt_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, probit=FALSE)
ryjt_scaled(n, location=0, shape=1, lambda=1, df=Inf, probit=FALSE)

fit_yjt_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL, probit=FALSE)
## S3 method for class 'fit_yjt_scaled'
coef(object, ...)
## S3 method for class 'fit_yjt_scaled'
logLik(object, ...)
## S3 method for class 'fit_yjt_scaled'
summary(object, digits=4, file=NULL, ...)
## S3 method for class 'fit_yjt_scaled'
vcov(object, ...)

# Box-Cox transformation and its inverse transformation
bc_trafo(y, lambda)
bc_antitrafo(y, lambda)

#---- scaled t distribution with Box-Cox transformation
dbct_scaled(x, location=0, shape=1, lambda=1, df=Inf, log=FALSE, check_zero=TRUE)
rbct_scaled(n, location=0, shape=1, lambda=1, df=Inf)

fit_bct_scaled(x, df=Inf, par_init=NULL, lambda_fixed=NULL, weights=NULL)
## S3 method for class 'fit_bct_scaled'
coef(object, ...)
## S3 method for class 'fit_bct_scaled'
logLik(object, ...)
## S3 method for class 'fit_bct_scaled'
summary(object, digits=4, file=NULL, ...)
## S3 method for class 'fit_bct_scaled'
vcov(object, ...)

#---- scaled t distribution
dt_scaled(x, location=0, shape=1, df=Inf, log=FALSE)
rt_scaled(n, location=0, shape=1, df=Inf)

fit_t_scaled(x, df=Inf, par_init=NULL, weights=NULL)
## S3 method for class 'fit_t_scaled'
coef(object, ...)
## S3 method for class 'fit_t_scaled'
logLik(object, ...)
## S3 method for class 'fit_t_scaled'
summary(object, digits=4, file=NULL, ...)
## S3 method for class 'fit_t_scaled'
vcov(object, ...)

Arguments

`y`	Numeric vector
`lambda`	Transformation parameter `\lambda` for the Yeo-Johnson or Box-Cox transformation
`use_rcpp`	Logical indicating whether Rcpp package should be used
`probit`	Logical indicating whether probit transformation should be applied for bounded variables on `(0,1)`
`x`	Numeric vector
`location`	Location parameter of (transformed) scaled `t` distribution
`shape`	Shape parameter of (transformed) scaled `t` distribution
`df`	Degrees of freedom of (transformed) scaled `t` distribution
`log`	Logical indicating whether logarithm of the density should be computed
`check_zero`	Logical indicating whether check for inadmissible values should be conducted
`n`	Number of observations to be simulated
`par_init`	Optional vector of initial parameters
`lambda_fixed`	Optional value for fixed `\lambda` parameter
`weights`	Optional vector of sampling weights
`object`	Object of class `fit_yjt_scaled`, `fit_bct_scaled` or `fit_t_scaled`
`digits`	Number of digits used for rounding in `summary`
`file`	File name for the `summary` to be sunk into
`...`	Further arguments to be passed

Details

Let g_\lambda be the Yeo-Johnson transformation. A random variable X follows a scaled t distribution with Yeo-Johnson transformation, with location \mu, scale \sigma (argument shape) and transformation parameter \lambda, iff X=g_\lambda ( \mu + \sigma Z ), where Z is t distributed with df degrees of freedom.
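
The untransformed building block \mu + \sigma Z can be illustrated in base R. The following is a minimal sketch, assuming the usual location-scale convention; `dt_scaled_sketch` is a hypothetical helper written for illustration and is not the package's `dt_scaled`.

```r
# Hypothetical helper (not part of mdmb): density of mu + sigma * Z,
# where Z follows a t distribution with df degrees of freedom.
dt_scaled_sketch <- function(x, location = 0, shape = 1, df = Inf, log = FALSE) {
  z <- (x - location) / shape
  # change of variables: subtract log(shape) on the log scale
  logdens <- stats::dt(z, df = df, log = TRUE) - log(shape)
  if (log) logdens else exp(logdens)
}

# The density should integrate to (approximately) 1.
area <- stats::integrate(
  function(x) dt_scaled_sketch(x, location = 0.3, shape = 1.5, df = 10),
  lower = -Inf, upper = Inf
)$value

# With df = Inf the scaled t density coincides with the normal density.
x <- seq(-3, 3, length.out = 100)
gap <- max(abs(dt_scaled_sketch(x, location = 0.3, shape = 1.5, df = Inf) -
               stats::dnorm(x, mean = 0.3, sd = 1.5)))
```

Working on the log scale and exponentiating at the end is a common choice for numerical stability in the tails; whether the package does the same is not stated here.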

For a bounded variable X on (0,1), the probit transformation \Phi is applied such that X=\Phi( g_\lambda ( \mu + \sigma Z ) ) with a t distributed variable Z.

For \lambda=1, both the Yeo-Johnson normal distribution and the Box-Cox normal distribution reduce to a normal distribution.
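
The role of \lambda=1 can be checked with the textbook forms of both transformations. The sketch below uses base R only; `yj_sketch`, `yj_inv_sketch` and `bc_sketch` are hypothetical helpers for illustration, not the package's `yj_trafo`, `yj_antitrafo` or `bc_trafo`, and they assume the standard definitions from Yeo & Johnson (2000) and the Box-Cox literature.

```r
# Hypothetical helper: textbook Yeo-Johnson transformation
yj_sketch <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  if (abs(lambda) > 1e-10) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log(y[pos] + 1)
  }
  if (abs(lambda - 2) > 1e-10) {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log(1 - y[!pos])
  }
  out
}

# Hypothetical helper: inverse of yj_sketch
# (assumes z lies in the range of the transformation)
yj_inv_sketch <- function(z, lambda) {
  out <- numeric(length(z))
  pos <- z >= 0
  if (abs(lambda) > 1e-10) {
    out[pos] <- (lambda * z[pos] + 1)^(1 / lambda) - 1
  } else {
    out[pos] <- exp(z[pos]) - 1
  }
  if (abs(lambda - 2) > 1e-10) {
    out[!pos] <- 1 - (1 - (2 - lambda) * z[!pos])^(1 / (2 - lambda))
  } else {
    out[!pos] <- 1 - exp(-z[!pos])
  }
  out
}

# Hypothetical helper: textbook Box-Cox transformation for positive y
bc_sketch <- function(y, lambda) {
  if (abs(lambda) > 1e-10) (y^lambda - 1) / lambda else log(y)
}

y <- seq(-3, 3, length.out = 13)
# lambda = 1: Yeo-Johnson is the identity
yj_identity_gap <- max(abs(yj_sketch(y, lambda = 1) - y))
# transformation followed by its inverse recovers y
yj_roundtrip_gap <- max(abs(yj_inv_sketch(yj_sketch(y, lambda = 0.5), lambda = 0.5) - y))
# lambda = 1: Box-Cox is a shift by -1, so normality is preserved
ypos <- seq(0.5, 5, length.out = 10)
bc_shift_gap <- max(abs(bc_sketch(ypos, lambda = 1) - (ypos - 1)))
```

Under these definitions the Yeo-Johnson transformation with \lambda=1 leaves values unchanged, while the Box-Cox transformation with \lambda=1 only shifts them by a constant.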

Value

A numeric vector (transformation, density and simulation functions) or an object containing the fitted distribution (`fit_*` functions), depending on the function called

References

Sakia, S. M. (1992). The Box-Cox transformation technique: A review. The Statistician, 41(2), 169-178. doi:10.2307/2348250

Yeo, I.-K., & Johnson, R. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954-959. doi:10.1093/biomet/87.4.954

Examples

#############################################################################
# EXAMPLE 1: Transforming values according to Yeo-Johnson transformation
#############################################################################

# vector of y values
y <- seq(-3,3, len=100)

# non-negative lambda values
plot( y, mdmb::yj_trafo( y, lambda=1 ), type="l", ylim=8*c(-1,1),
           ylab=expression( g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

# non-positive lambda values
plot( y, mdmb::yj_trafo( y, lambda=-1 ), type="l", ylim=8*c(-1,1),
           ylab=expression(g[lambda] (y) ) )
lines( y, mdmb::yj_trafo( y, lambda=-2 ), lty=2 )
lines( y, mdmb::yj_trafo( y, lambda=-.5 ), lty=3 )
lines( y, mdmb::yj_trafo( y, lambda=0 ), lty=4 )

## Not run: 
#############################################################################
# EXAMPLE 2: Density of scaled t distribution
#############################################################################

# define location and scale parameters
m0 <- 0.3
sig <- 1.5
#-- compare density of scaled t distribution with large degrees of freedom
#   with normal distribution
y1 <- mdmb::dt_scaled( y, location=m0, shape=sig, df=100 )
y2 <- stats::dnorm( y, mean=m0, sd=sig )
max(abs(y1-y2))

#############################################################################
# EXAMPLE 3: Simulating and fitting the scaled t distribution
#############################################################################

#-- simulate data with 10 degrees of freedom
set.seed(987)
df0 <- 10    # define degrees of freedom
x <- mdmb::rt_scaled( n=1E4, location=m0, shape=sig, df=df0 )
#** fit data with df=10 degrees of freedom
fit1 <- mdmb::fit_t_scaled(x=x, df=df0 )
#** compare with fit from normal distribution
fit2 <- mdmb::fit_t_scaled(x=x, df=Inf )  # df=Inf is the default

#-- some comparisons
coef(fit1)
summary(fit1)
logLik(fit1)
AIC(fit1)
AIC(fit2)

#############################################################################
# EXAMPLE 4: Simulation and fitting of scaled t distribution with
#            Yeo-Johnson transformation
#############################################################################

# define parameters of transformed scaled t distribution
m0 <- .5
sig <- 1.5
lam <- .5

# evaluate density
x <- seq( -5, 5, len=100 )
y <- mdmb::dyjt_scaled( x, location=m0, shape=sig, lambda=lam )
graphics::plot( x, y, type="l")

# transform original values
mdmb::yj_trafo( y=x, lambda=lam )

#** simulate data
set.seed(987)
x <- mdmb::ryjt_scaled(n=3000, location=m0, shape=sig, lambda=lam )
graphics::hist(x, breaks=30)

#*** Model 1: Fit data with lambda to be estimated
fit1 <- mdmb::fit_yjt_scaled(x=x)
summary(fit1)
coef(fit1)

#*** Model 2: Fit data with lambda fixed to simulated lambda
fit2 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=lam)
summary(fit2)
coef(fit2)

#*** Model 3: Fit data with lambda fixed to 1
fit3 <- mdmb::fit_yjt_scaled(x=x, lambda_fixed=1)

#-- compare log-likelihood values
logLik(fit1)
logLik(fit2)
logLik(fit3)

#############################################################################
# EXAMPLE 5: Approximating the chi square distribution
#            with yjt and bct distribution
#############################################################################

#-- simulate data
set.seed(987)
n <- 3000
df0 <- 5
x <- stats::rchisq( n=n, df=df0 )

#-- plot data
graphics::hist(x, breaks=30)

#-- fit data with yjt distribution
fit1 <- mdmb::fit_yjt_scaled(x)
summary(fit1)
c1 <- coef(fit1)

#-- fit data with bct distribution
fit2 <- mdmb::fit_bct_scaled(x)
summary(fit2)
c2 <- coef(fit2)
# compare log-likelihood values
logLik(fit1)
logLik(fit2)

#-- plot chi square distribution and approximating yjt distribution
y <- seq( .01, 3*df0, len=100 )
dy <- stats::dchisq( y, df=df0 )
graphics::plot( y, dy, type="l", ylim=c(0, max(dy) )*1.1 )
# approximation with scaled t distribution and Yeo-Johnson transformation
graphics::lines( y, mdmb::dyjt_scaled(y, location=c1[1], shape=c1[2], lambda=c1[3]),
                     lty=2)
# approximation with scaled t distribution and Box-Cox transformation
graphics::lines( y, mdmb::dbct_scaled(y, location=c2[1], shape=c2[2], lambda=c2[3]),
                     lty=3)
# approximating normal distribution
graphics::lines( y, stats::dnorm( y, mean=df0, sd=sqrt(2*df0) ), lty=4)
graphics::legend( .6*max(y), .9*max(dy), c("chi square", "yjt", "bct", "norm"),
                     lty=1:4)

#############################################################################
# EXAMPLE 6: Bounded variable on (0,1) with Probit Yeo-Johnson transformation
#############################################################################

set.seed(876)
n <- 1000
x <- stats::rnorm(n)
y <- stats::pnorm( 1*x + stats::rnorm(n, sd=sqrt(.5) ) )
dat <- data.frame( y=y, x=x )

#*** fit Probit Yeo-Johnson distribution
mod1 <- mdmb::fit_yjt_scaled(x=y, probit=TRUE)
summary(mod1)

#*** estimation using regression model
mod2 <- mdmb::yjt_regression( y ~ x, data=dat, probit=TRUE )
summary(mod2)

## End(Not run)

mdmb documentation built on Sept. 11, 2024, 5:23 p.m.