knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This documents is an introduction to the package emGauss. The aim of this package is to find the cutoff point (ECOFF) between resistant and non-resistant antibiotics.

Setup

Therefore, first load the package emGauss.

library(emGauss)

Data

The real data (Zonedata) from the EUCAST website is contained in package EUCASTData. This will be loaded in the next lines of code

data("ZD", package = "EUCASTData")

Example Dataset 1

In order to get an example dataset, we have to extract the nessercary columns from the ZD dataset. The plot shows that the data may have two components.

# 65, 855, 1091 example datasets 
zd <- as.numeric(ZD[1091, 4:48])
zd.data <- data.frame(bin = 6:50, nrObs = zd)
barplot(zd.data$nrObs, names.arg = c(zd.data$bin))

Find optimum number of components

In this section we use the function em.gauss.opti.groups to find the optimum number of components based on the BIC.

k <- 5
y <- as.numeric(zd)

result <-em.gauss.opti.groups(y = y,
                     k = k,
                     alpha = 3,
                     beta= 1,
                     method = "quantile",
                     epsilon=0.0001)

As a result we get a list with all parameters of the fitted distribution and the AIC and BIC values. The next line of code shows the best number of components based on the BIC. As we can see is the optimum number of components 2.

plot(result$BIC, ylab = 'BIC',
     xlab = 'Number of components')
abline(v=which.min(result$BIC), lty= 2, col=2)

Find start values

In order to find start values for the mean and the variance we use the function createCluster.

k <- 2
start.musigma2 <- createCluster(as.matrix(zd.data), k, 
                                method = 'quantile')
start.musigma2

Fitting distribution and find ECOFF

After finding the start values we can use the function em.gauss to fit the mixing distribution of normals to the data. As a result we get the mean, variance, mixing proportions and loglikelihood of the fitted distribution. Furthermore the function provides an estimate of the ECOFF value.

y <- as.numeric(zd)

em.result <- em.gauss(y = y,
                      mu = start.musigma2$mu,
                      sigma2 = start.musigma2$sigma2,
                      pi = rep(1/k, k),
                      alpha = 3,
                      beta = 1,
                      epsilon = 0.0001,
                      max.iter = 650)

Plot result

Now we use the function plot.fct to plot the fitted distribution and the ECOFF.

plot_fct(y, 
          mu = em.result$mu, 
          sigma2 = em.result$sigma2, 
          pi = em.result$pi, 
          ecoff = em.result$ecoff)

Example Dataset 2

For the second dataset the steps to find the optimal ECOFF Value is similar to the first dataset.

Firstly the data gets plotted

zd <- as.numeric(ZD[65, 4:48])
barplot(zd, names.arg = 6:50)

Find optimum number of components

Secondly, the function em.gauss.opti.groups is used to find the optimal number of components. Because it seems to be only one component, the maximum number of components is set to 3.

result <-em.gauss.opti.groups(y = zd,
                     k = 3,
                     alpha = 3,
                     beta= 1,
                     method = "quantile",
                     epsilon=0.0001)
plot(result$BIC, ylab = 'BIC',
     xlab = 'Number of components')
abline(v=which.min(result$BIC), lty= 2, col=2)

Plot result

In the last step the function plot_fct is used to plot the distribution, which has the minimum BIC. Moreover the ECOFF value is shown as the vertical line.

em.result <- result[[which.min(result$BIC)]]              
plot_fct(as.numeric(zd), 
         mu = em.result$mu, 
         sigma2 = em.result$sigma2, 
         pi = em.result$pi, 
         ecoff = em.result$ecoff)

Example Dataset 3

The third dataset looks like the following figure

zd <- as.numeric(ZD[855, 4:48])
barplot(zd, names.arg = 6:50)

Find optimum number of components

As in the examples before the function em.gauss.opti.groups is used to find the optimal number of components based on the BIC.

result <-em.gauss.opti.groups(y = zd,
                     k = 5,
                     alpha = 3,
                     beta= 1,
                     method = "quantile",
                     epsilon=0.0001)
plot(result$BIC, ylab = 'BIC',
     xlab = 'Number of components')
abline(v=which.min(result$BIC), lty= 2, col=2)

Plot result

In the last step the optimal distribution and its ECOFF value is visualized by the function plot_fct.

em.result <- result[[which.min(result$BIC)]]              
plot_fct(as.numeric(zd), 
         mu = em.result$mu, 
         sigma2 = em.result$sigma2, 
         pi = em.result$pi, 
         ecoff = em.result$ecoff)


sp2019-antibiotics/emGauss documentation built on Nov. 5, 2019, 9:14 a.m.