knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This documents is an introduction to the package emGauss
. The aim of this package is to find the cutoff point (ECOFF) between resistant and non-resistant antibiotics.
Therefore, first load the package emGauss
.
library(emGauss)
The real data (Zonedata) from the EUCAST website is contained in package EUCASTData
. This will be loaded in the next lines of code
data("ZD", package = "EUCASTData")
In order to get an example dataset, we have to extract the nessercary columns from the ZD dataset. The plot shows that the data may have two components.
# 65, 855, 1091 example datasets zd <- as.numeric(ZD[1091, 4:48]) zd.data <- data.frame(bin = 6:50, nrObs = zd) barplot(zd.data$nrObs, names.arg = c(zd.data$bin))
In this section we use the function em.gauss.opti.groups
to find the optimum number of components based on the BIC.
k <- 5 y <- as.numeric(zd) result <-em.gauss.opti.groups(y = y, k = k, alpha = 3, beta= 1, method = "quantile", epsilon=0.0001)
As a result we get a list with all parameters of the fitted distribution and the AIC and BIC values. The next line of code shows the best number of components based on the BIC. As we can see is the optimum number of components 2.
plot(result$BIC, ylab = 'BIC', xlab = 'Number of components') abline(v=which.min(result$BIC), lty= 2, col=2)
In order to find start values for the mean and the variance we use the function createCluster
.
k <- 2 start.musigma2 <- createCluster(as.matrix(zd.data), k, method = 'quantile') start.musigma2
After finding the start values we can use the function em.gauss
to fit the mixing distribution of normals to the data. As a result we get the mean, variance, mixing proportions and loglikelihood of the fitted distribution. Furthermore the function provides an estimate of the ECOFF value.
y <- as.numeric(zd) em.result <- em.gauss(y = y, mu = start.musigma2$mu, sigma2 = start.musigma2$sigma2, pi = rep(1/k, k), alpha = 3, beta = 1, epsilon = 0.0001, max.iter = 650)
Now we use the function plot.fct to plot the fitted distribution and the ECOFF.
plot_fct(y, mu = em.result$mu, sigma2 = em.result$sigma2, pi = em.result$pi, ecoff = em.result$ecoff)
For the second dataset the steps to find the optimal ECOFF Value is similar to the first dataset.
Firstly the data gets plotted
zd <- as.numeric(ZD[65, 4:48]) barplot(zd, names.arg = 6:50)
Secondly, the function em.gauss.opti.groups is used to find the optimal number of components. Because it seems to be only one component, the maximum number of components is set to 3.
result <-em.gauss.opti.groups(y = zd, k = 3, alpha = 3, beta= 1, method = "quantile", epsilon=0.0001) plot(result$BIC, ylab = 'BIC', xlab = 'Number of components') abline(v=which.min(result$BIC), lty= 2, col=2)
In the last step the function plot_fct is used to plot the distribution, which has the minimum BIC. Moreover the ECOFF value is shown as the vertical line.
em.result <- result[[which.min(result$BIC)]] plot_fct(as.numeric(zd), mu = em.result$mu, sigma2 = em.result$sigma2, pi = em.result$pi, ecoff = em.result$ecoff)
The third dataset looks like the following figure
zd <- as.numeric(ZD[855, 4:48]) barplot(zd, names.arg = 6:50)
As in the examples before the function em.gauss.opti.groups is used to find the optimal number of components based on the BIC.
result <-em.gauss.opti.groups(y = zd, k = 5, alpha = 3, beta= 1, method = "quantile", epsilon=0.0001) plot(result$BIC, ylab = 'BIC', xlab = 'Number of components') abline(v=which.min(result$BIC), lty= 2, col=2)
In the last step the optimal distribution and its ECOFF value is visualized by the function plot_fct.
em.result <- result[[which.min(result$BIC)]] plot_fct(as.numeric(zd), mu = em.result$mu, sigma2 = em.result$sigma2, pi = em.result$pi, ecoff = em.result$ecoff)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.