library(qqtest)
Suppose $X \sim \chi^2_m$ is a Chi-squared random variate on $m$ degrees of freedom. Then the distribution of
\[ Y = \sqrt{\frac{X}{m}} \]
is the Kay distribution on $m$ degrees of freedom, written as
\[ Y \sim K_m. \]
Its density is
\[ f(y) = \left\{ \begin{array}{lcl} \frac{m^{\frac{m}{2}} y^{m-1} e^{-\frac{1}{2} m y^2}}{2^{\frac{m}{2}-1} \Gamma(\frac{m}{2})} &~~~& \text{for} ~~ 0 \le y < \infty \\ && \\ 0 && \text{otherwise}. \end{array} \right. \]
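As a quick numerical check of this density, the sketch below codes the closed form directly in base R and compares it with the change-of-variables density $f_X(m y^2)\, 2 m y$ built from `dchisq`. The helper names `dkay_formula` and `dkay_chisq` are ours, for illustration only; they are not qqtest functions.

```r
# Closed-form K_m density, evaluated on the log scale for numerical stability.
# dkay_formula is a hypothetical helper, not part of qqtest.
# Points y <= 0 are returned as 0 (only y = 0 with m = 1 is affected, a single point).
dkay_formula <- function(y, m) {
  out <- numeric(length(y))
  pos <- y > 0
  yp <- y[pos]
  out[pos] <- exp((m / 2) * log(m) + (m - 1) * log(yp) - m * yp^2 / 2 -
                    (m / 2 - 1) * log(2) - lgamma(m / 2))
  out
}

# Density of Y = sqrt(X / m) by change of variables: x = m y^2, dx/dy = 2 m y
dkay_chisq <- function(y, m) dchisq(m * y^2, df = m) * 2 * m * y

y <- seq(0.1, 2.5, by = 0.1)
all.equal(dkay_formula(y, 10), dkay_chisq(y, 10))  # the two forms agree
integrate(dkay_formula, 0, Inf, m = 10)$value      # total probability, close to 1
```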
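The $K_m$ density has some very attractive features over the $\chi^2_m$ density. As the degrees of freedom $m$ increase, the $\chi^2_m$ density drifts to the right and spreads out (its mean is $m$ and its variance $2m$), whereas the $K_m$ density stays centred near 1 and becomes more concentrated, as the following side-by-side plots show: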
# compare K density to that of chi-squared as degrees of freedom increase
op <- par(mfrow=c(1,2))
p <- seq(0.001, .999, 0.001)
#
# First get all the chi-square densities and plot them
xchi5 <- qchisq(p,5)
dchi5 <- dchisq(xchi5,5)
xchi10 <- qchisq(p,10)
dchi10 <- dchisq(xchi10,10)
xchi20 <- qchisq(p,20)
dchi20 <- dchisq(xchi20,20)
xchi30 <- qchisq(p,30)
dchi30 <- dchisq(xchi30,30)
xlim <- range(xchi5, xchi10, xchi20, xchi30)
ylim <- range(dchi5, dchi10, dchi20, dchi30)
plot(xchi5, dchi5, type="l", xlab="x", ylab="density", xlim=xlim, ylim=ylim, main="chi-squared densities", col="steelblue")
lines(xchi10, dchi10, lty=2, col="steelblue")
lines(xchi20, dchi20, lty=3, col="steelblue")
lines(xchi30, dchi30, lty=4, col="steelblue")
legend("topright", legend=c("df = 5", "df = 10", "df = 20", "df = 30"), lty=c(1,2,3,4), title="degrees of freedom", cex=0.75, bty="n", col="steelblue")
#
# Now get all the K densities and plot them
xkay5 <- qkay(p,5)
dkay5 <- dkay(xkay5,5)
xkay10 <- qkay(p,10)
dkay10 <- dkay(xkay10,10)
xkay20 <- qkay(p,20)
dkay20 <- dkay(xkay20,20)
xkay30 <- qkay(p,30)
dkay30 <- dkay(xkay30,30)
xlim <- range(xkay5, xkay10, xkay20, xkay30)
ylim <- range(dkay5, dkay10, dkay20, dkay30)
plot(xkay5, dkay5, type="l", xlab="y", ylab="density", xlim=xlim, ylim=ylim, main="K densities", col="steelblue")
lines(xkay10, dkay10, lty=2, col="steelblue")
lines(xkay20, dkay20, lty=3, col="steelblue")
lines(xkay30, dkay30, lty=4, col="steelblue")
# reference line at y = 1, where the K densities concentrate
abline(v=1, col="grey", lty=5)
legend("topright", legend=c("df = 5", "df = 10", "df = 20", "df = 30"), lty=c(1,2,3,4), title="degrees of freedom", cex=0.75, bty="n", col="steelblue")
par(op)
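The contrast between the two panels can also be put in numbers. This sketch uses base R only, taking the $K_m$ quantiles as $\sqrt{\chi^2_{m,p}/m}$ (which follows from the definition, since the square root is increasing), and computes the width of the central $90\%$ interval of each family for the same degrees of freedom as the plots.

```r
m <- c(5, 10, 20, 30)
# Central 90% width of chi-squared_m: grows with m
chi_width <- qchisq(0.95, m) - qchisq(0.05, m)
# Central 90% width of K_m, using quantiles sqrt(qchisq(p, m) / m): shrinks with m
kay_width <- sqrt(qchisq(0.95, m) / m) - sqrt(qchisq(0.05, m) / m)
data.frame(df = m, chi_width = round(chi_width, 2), kay_width = round(kay_width, 3))
```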
These values were calculated using the `dkay(...)` density function. For example, `dkay(1.0, df=10)` = `r dkay(1.0, df=10)`.
Perhaps the most obvious relation between a normal random variate and a $K_m$ is that if $Z \sim N(0,1)$, then $|Z|\sim K_1$, the half-normal.
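The half-normal case is easy to confirm numerically: with $m = 1$ the density above reduces to $\sqrt{2/\pi}\, e^{-y^2/2} = 2\phi(y)$, and $P(|Z| \le y) = 2\Phi(y) - 1$. The sketch compares both with the chi-squared route, using only base R.

```r
y <- seq(0.05, 3, by = 0.05)
# K_1 density via the chi-squared change of variables (m = 1)
dk1 <- dchisq(y^2, df = 1) * 2 * y
# K_1 distribution function: P(Y <= y) = P(X <= y^2)
pk1 <- pchisq(y^2, df = 1)
all.equal(dk1, 2 * dnorm(y))      # half-normal density
all.equal(pk1, 2 * pnorm(y) - 1)  # half-normal distribution function
```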
More important in applications is that the distribution of the estimator of the standard deviation is proportional to a $K_m$. To be precise, if $Y_1, \ldots, Y_n$ are independent and identically distributed as $N(\mu, \sigma^2)$ random variates, with realizations $y_1, \ldots, y_n$ and the usual estimates $\widehat{\mu} = \sum y_i /n$ and $\widehat{\sigma} = \sqrt{\sum (y_i - \widehat{\mu})^2/(n-1)}$, then the corresponding estimators $\widetilde{\mu}$ and $\widetilde{\sigma}$ are independently distributed as
\[ \widetilde{\mu} \sim N\left(\mu, \frac{\sigma^2}{n}\right) ~~~~~\text{and} ~~~~~ \frac{\widetilde{\sigma}}{\sigma} \sim K_{n-1}. \]
The latter shows that $K_m$ (here with $m = n-1$) is used for inference (e.g. tests and confidence intervals) about $\sigma$.
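A small simulation makes this concrete. The sketch draws many normal samples of size $n$, forms $\widehat{\sigma}/\sigma$ for each, and compares a few empirical quantiles with $K_{n-1}$ quantiles computed as $\sqrt{\chi^2_{n-1,p}/(n-1)}$. The values of $n$, $\mu$, $\sigma$, the seed and the number of replications are arbitrary choices for illustration.

```r
set.seed(2024)
n <- 8; mu <- 3; sigma <- 2; reps <- 20000
# sd() uses the n - 1 divisor, matching sigma-hat above
ratio <- replicate(reps, sd(rnorm(n, mean = mu, sd = sigma)) / sigma)
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)
empirical <- quantile(ratio, p, names = FALSE)
theoretical <- sqrt(qchisq(p, df = n - 1) / (n - 1))  # K_{n-1} quantiles
round(rbind(empirical, theoretical), 3)
```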
This is handy because the $K_m$ quantiles vary much less than do those of $\chi^2_m$. For example, consider the following table of $K_m$ quantiles at probabilities $0.05$, $0.5$ and $0.95$.
df <- c(1:10, seq(15, 40, 5))
p <- c(0.05, 0.5, 0.95)
# quantiles of K_m for every df, one column per probability
fun <- function(p) qkay(p, df)
table <- as.data.frame(cbind(df, sapply(p, fun)))
colnames(table) <- c("df", paste0("p=", p))
knitr::kable(table)
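Unlike those of the $\chi^2_m$ distribution, the quantiles in this table stabilize as the degrees of freedom increase. For the larger degrees of freedom in the table, $1 \pm 0.20$ is not a bad rule of thumb for the interval containing the central $90\%$ of the distribution of the ratio $\widetilde{\sigma}/\sigma$.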
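The rule of thumb can be checked directly. The sketch below (base R; $K_m$ quantiles again computed as $\sqrt{\chi^2_{m,p}/m}$) lists the $5\%$ and $95\%$ quantiles of both distributions for a few of the larger degrees of freedom: the chi-squared quantiles keep moving with $m$, while the $K_m$ quantiles stay near $0.8$ and $1.2$.

```r
df <- c(30, 35, 40)
p <- c(0.05, 0.95)
# chi-squared quantiles: one row per df, one column per probability
chi <- t(sapply(df, function(m) qchisq(p, m)))
# K_m quantiles: each row divided by its own df before the square root
kay <- sqrt(chi / df)
dimnames(chi) <- dimnames(kay) <- list(paste0("df=", df), paste0("p=", p))
round(chi, 2)
round(kay, 3)
```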
These values were calculated using the `qkay(...)` quantile function. For example, `qkay(0.05, df=5)` = `r qkay(0.05, df=5)`. These would be used to construct interval estimates for $\sigma$.
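Because $\widetilde{\sigma}/\sigma \sim K_{n-1}$, inverting $P(k_{0.05} \le \widetilde{\sigma}/\sigma \le k_{0.95}) = 0.90$ gives the $90\%$ interval $[\widehat{\sigma}/k_{0.95},\ \widehat{\sigma}/k_{0.05}]$, where $k_p$ is the $p$ quantile of $K_{n-1}$. The sketch applies this to a made-up sample (the data are hypothetical), computes $k_p$ from `qchisq` so that it runs without qqtest, and checks the result against the familiar chi-squared form of the interval.

```r
y <- c(4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8)  # hypothetical data
n <- length(y)
sigma_hat <- sd(y)
k <- sqrt(qchisq(c(0.05, 0.95), df = n - 1) / (n - 1))  # K_{n-1} quantiles
# divide by the upper quantile for the lower limit, and vice versa
ci_kay <- sigma_hat / rev(k)
# the usual chi-squared form: sqrt((n - 1) s^2 / chi-squared quantile)
ci_chisq <- sqrt((n - 1) * sigma_hat^2 / qchisq(c(0.95, 0.05), df = n - 1))
ci_kay
all.equal(ci_kay, ci_chisq)
```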
To get observed significance levels, the cumulative distribution function `pkay(...)` would be used. For example, SL = 1 - `pkay(1.4, df=10)` = 1 - `r pkay(1.4, df=10)` = `r 1 - pkay(1.4, df=10)`.
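If $1.4$ is read as an observed ratio $\widehat{\sigma}/\sigma_0$ on 10 degrees of freedom, with large ratios counting as evidence against $\sigma = \sigma_0$, then, since $P(Y > y) = P(X > m y^2)$ for $Y = \sqrt{X/m}$, the same significance level can be computed from `pchisq`. A base R sketch:

```r
m <- 10
ratio <- 1.4
# P(K_m > ratio) = P(chi^2_m > m * ratio^2)
sl <- pchisq(m * ratio^2, df = m, lower.tail = FALSE)
sl
```

Using `lower.tail = FALSE` rather than `1 - pchisq(...)` avoids losing precision when the significance level is very small.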
In standard normal theory, the Student $t_m$ distribution can be defined as follows. If $Z \sim N(0,1)$ and $Y \sim K_m$ is distributed independently of $Z$, then the ratio
\[ T = \frac{Z}{Y} = \frac{N(0,1)}{K_m} = t_m, \]
which is fairly easy to remember.
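The representation $t_m = N(0,1)/K_m$ can be checked by simulation: draw independent $Z$ and $Y = \sqrt{X/m}$ with $X \sim \chi^2_m$ (via `rchisq`), form $Z/Y$, and compare its quantiles with `qt`. The seed, $m$ and sample size are arbitrary choices.

```r
set.seed(1)
m <- 5; N <- 1e5
z <- rnorm(N)
y <- sqrt(rchisq(N, df = m) / m)  # K_m draws, independent of z
t_sim <- z / y
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)
round(rbind(simulated = quantile(t_sim, p, names = FALSE),
            exact = qt(p, df = m)), 3)
```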
For the estimators from the above model,
\[ \frac{\widetilde{\mu} - \mu}{\widetilde{\sigma} / \sqrt{n}} = \frac{\frac{\widetilde{\mu}-\mu}{\sigma/\sqrt{n}}}{\frac{\widetilde{\sigma}}{\sigma}} = \frac{N(0,1)}{K_{n-1}} = t_{n-1}, \]
which is used to construct interval estimates and tests for the value of the parameter $\mu$.
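Inverting this pivot gives the usual $90\%$ interval $\widehat{\mu} \pm t_{n-1,\,0.95}\,\widehat{\sigma}/\sqrt{n}$. The sketch computes it by hand on hypothetical data and compares it with the interval reported by base R's `t.test`.

```r
y <- c(4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8)  # hypothetical data
n <- length(y)
mu_hat <- mean(y)
se <- sd(y) / sqrt(n)  # sigma-hat / sqrt(n)
ci_manual <- mu_hat + c(-1, 1) * qt(0.95, df = n - 1) * se
ci_ttest <- t.test(y, conf.level = 0.90)$conf.int
# as.numeric drops the conf.level attribute before comparing
all.equal(ci_manual, as.numeric(ci_ttest))
```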
As with every other distribution in R, four functions are provided for the $K_m$ distribution:

- `dkay(x, df=m, ...)` evaluates the density of $K_m$ at $x$,
- `pkay(x, df=m, ...)` evaluates the distribution function of $K_m$ at $x$,
- `qkay(p, df=m, ...)` evaluates the quantile of $K_m$ at the proportion $p$,
- `rkay(n, df=m, ...)` generates $n$ pseudo-random realizations from $K_m$.

The parameters in the ellipsis include a non-centrality parameter. All functions rely on the corresponding $\chi^2_m$ functions in base R.
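Since $Y = \sqrt{X/m}$ is an increasing transformation of $X \sim \chi^2_m$, each of the four functions can be written in a line or two on top of `dchisq`, `pchisq`, `qchisq` and `rchisq`. The sketch below does this; the `_sketch` names are hypothetical and this is not qqtest's own code. It assumes that extra arguments in the ellipsis (such as `ncp`) can simply be forwarded to the chi-squared functions, which may not match how qqtest parameterizes its non-centrality parameter.

```r
# Hypothetical re-implementations of the K_m functions on top of base R.
# Extra arguments (e.g. ncp) are forwarded to the chi-squared functions (an assumption).
dkay_sketch <- function(x, df, ...) {
  # f_Y(y) = f_X(df * y^2) * 2 * df * y for y > 0; returns 0 for y <= 0
  # (so the value at y = 0 is wrong for df = 1, a single point)
  out <- numeric(length(x))
  pos <- x > 0
  out[pos] <- dchisq(df * x[pos]^2, df = df, ...) * 2 * df * x[pos]
  out
}
pkay_sketch <- function(q, df, ...) {
  # P(Y <= q) = P(X <= df * q^2); negative q gives probability 0
  pchisq(df * pmax(q, 0)^2, df = df, ...)
}
qkay_sketch <- function(p, df, ...) {
  # the square root is increasing, so quantiles transform directly
  sqrt(qchisq(p, df = df, ...) / df)
}
rkay_sketch <- function(n, df, ...) {
  sqrt(rchisq(n, df = df, ...) / df)
}
```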
We briefly illustrate each below.
dkay(x, df, ...)
x <- seq(0,2,0.01)
plot(x, dkay(x, df=10), type="l", col="steelblue", main="Density", xlab="x", ylab="f(x)")
abline(v=1.0, lty=2, col="grey")
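The density in this plot peaks slightly to the left of the grey line at 1. Setting the derivative of $\log f(y) = \text{const} + (m-1)\log y - \tfrac{1}{2} m y^2$ to zero gives the mode $\sqrt{(m-1)/m}$, about $0.949$ for $m = 10$. A base R check, with the density written via `dchisq` so it runs without qqtest:

```r
m <- 10
# K_m density via change of variables from the chi-squared density
dens <- function(y) dchisq(m * y^2, df = m) * 2 * m * y
mode_numeric <- optimize(dens, interval = c(0.01, 2), maximum = TRUE)$maximum
mode_exact <- sqrt((m - 1) / m)
c(numeric = mode_numeric, exact = mode_exact)
```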
pkay(x, df, ...)
x <- seq(0,2,0.01)
plot(x, pkay(x, df=10), type="l", col="steelblue", main="Distribution", xlab="x", ylab="F(x)")
abline(v=1.0, lty=2, col="grey")
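At the grey line, $F(1) = P(X \le m)$ for $X \sim \chi^2_m$. Because the chi-squared distribution is right-skewed, its median lies a little below its mean $m$, so $F(1)$ is slightly above one half and moves toward it as $m$ grows. A quick base R check over a few arbitrary degrees of freedom:

```r
m <- c(1, 2, 5, 10, 30, 100)
# F(1) for K_m equals P(chi^2_m <= m)
F1 <- pchisq(m, df = m)
round(setNames(F1, paste0("df=", m)), 4)
```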
qkay(p, df, ...)
x <- seq(0,2,0.01)
p <- pkay(x, df=10)
plot(p, qkay(p, df=10), type="l", col="steelblue", main="Quantile function", xlab="p", ylab="Q(p)")
abline(h=1.0, lty=2, col="grey")
rkay(n, df, ...)
x <- rkay(1000, df=10)
hist(x, col="steelblue", main="Pseudo-random numbers", xlab="x")
abline(v=1.0, lty=2, col="grey")
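The histogram should be centred just below 1. The mean of $K_m$ is $\sqrt{2/m}\,\Gamma(\tfrac{m+1}{2})/\Gamma(\tfrac{m}{2})$ (the chi-distribution mean scaled by $1/\sqrt{m}$), roughly $0.975$ for $m = 10$, and the sample mean of simulated values should be close to it. The sketch draws via `rchisq` so it does not depend on qqtest; the seed and sample size are arbitrary.

```r
set.seed(42)
m <- 10
x <- sqrt(rchisq(10000, df = m) / m)  # K_m draws via the chi-squared definition
# exact mean, using lgamma to avoid overflow for large m
mean_exact <- sqrt(2 / m) * exp(lgamma((m + 1) / 2) - lgamma(m / 2))
c(sample = mean(x), exact = mean_exact)
```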