Detecting Skin Diseases


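The UCI data set \texttt{dermatitis} [@altay] consists of clinical attributes, histopathological attributes and an age variable, together with a class variable, ES, that records which of six skin diseases a patient has.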

Many of the classical machine learning algorithms have been applied to the dataset \texttt{dermatitis} [@liu2015fast]. They all achieve a prediction accuracy above $95\%$ and some even above $99\%$. But...:


Given a new patient $y$, we want to test the hypotheses

\begin{align}
H_1: & \ y \text{ has psoriasis} \\
H_2: & \ y \text{ has seborrheic dermatitis} \\
H_3: & \ y \text{ has lichen planus} \\
H_4: & \ y \text{ has pityriasis rosea} \\
H_5: & \ y \text{ has chronic dermatitis} \\
H_6: & \ y \text{ has pityriasis rubra pilaris}
\end{align}

Since the hypotheses are mutually exclusive, we do not correct for multiple testing. The user can do so by adjusting the significance level accordingly.
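One simple way to "set the significance level accordingly" is a Bonferroni-style correction. The base-R sketch below assumes Bonferroni as the rule; it is our choice for illustration and is not something the package prescribes. The p-values in it are hypothetical placeholders, not results for the dermatology data.

```r
# Bonferroni-style correction for K simultaneous hypotheses
# (assumption: Bonferroni is the chosen rule; molic does not prescribe one).
alpha <- 0.05  # overall significance level
K     <- 6     # number of hypotheses H_1, ..., H_6

# Per-hypothesis level: reject H_k only if its p-value is <= alpha_adj
alpha_adj <- alpha / K
print(alpha_adj)

# Same rule on the p-value scale: multiply each p-value by K (capped at 1).
# These p-values are hypothetical placeholders, not results for derma.
p_raw  <- c(0.20, 0.004, 0.03)
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = K)
print(p_bonf)
```

Comparing `p_bonf` with `alpha` gives the same decisions as comparing `p_raw` with `alpha_adj`. The argument `n = K` matters here: by default `p.adjust()` would multiply by the number of p-values passed in, not by the number of hypotheses.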

Modeling Attributes of Psoriasis

We first show how to test $H_1$. To begin, we take patient 80 as the new observation $y$ and extract the psoriasis patients:

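library(dplyr)  # %>%, filter() and select()
library(molic)  # derma data and the outlier-model functions
y     <- unlist(derma[80, -35]) # a patient with seborrheic dermatitis
psor  <- derma %>%
  filter(ES == "psoriasis") %>%
  select(-ES)                   # drop the class variable ES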

Next, we fit the interaction graph for the psoriasis patients:

g <- fit_graph(psor, q = 0, trace = FALSE)

We can color the nodes corresponding to clinical attributes (red), histopathological attributes (green) and the age variable (gray):

vs   <- names(adj_lst(g))
vcol <- structure(vector("character", length(vs)), names = vs)
vcol[grepl("c", vs)] <- "tomato"  # clinical attributes
vcol[grepl("h", vs)] <- "#98FB98" # histopathological attributes
vcol["age"]          <- "gray"    # age variable
plot(g, vcol, vertex.size = 10, vertex.label = NA)

The take-home message is that we cannot assume independence between the attributes of the psoriasis patients: the interaction graph shows many associations.
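A crude way to see the same point without the graph is to screen every pair of attributes for marginal association. The sketch below uses a hypothetical helper, `pairwise_assoc()`, that is not part of molic. It applies Pearson's chi-squared test from base R to each pair of columns. This is a swapped-in, marginal check and not how `fit_graph` builds the interaction graph. The toy data are made up for illustration.

```r
# Hypothetical helper (not part of molic): flag pairs of categorical columns
# whose marginal chi-squared test of independence rejects at level `alpha`.
pairwise_assoc <- function(df, alpha = 0.05) {
  pairs <- combn(names(df), 2, simplify = FALSE)
  res <- lapply(pairs, function(p) {
    tab  <- table(df[[p[1]]], df[[p[2]]])
    # Small expected counts trigger a warning; this is only a rough screen
    pval <- suppressWarnings(chisq.test(tab)$p.value)
    data.frame(var1 = p[1], var2 = p[2], p_value = pval,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, res)
  out[out$p_value < alpha, , drop = FALSE]
}

# Toy data: b copies a (strongly associated), c is balanced against a and b
toy <- data.frame(
  a = rep(c("x", "y"), each = 100),
  b = rep(c("x", "y"), each = 100),
  c = rep(c("u", "v"), times = 100),
  stringsAsFactors = FALSE
)
pairwise_assoc(toy)  # only the pair (a, b) is flagged
```

Applied to data like the psoriasis patients, many flagged pairs would point the same way as the interaction graph. A graph-based model also accounts for conditional (in)dependence, which a pairwise screen cannot.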

Outlier Model for Psoriasis Patients

m <- fit_outlier(psor, g, y)
print(m)  # summary of the fitted outlier model

Notice that the number of observations is $112$, even though we have only observed $111$ psoriasis patients. This is because, under the hypothesis $H_1$, the new observation $y$ has psoriasis and is therefore included in the model. The other summary statistics are self-explanatory.
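The bookkeeping behind the count of $112$ amounts to appending $y$ to the reference data before fitting. The minimal base-R illustration below uses a hypothetical helper, `augment_with_candidate()`, and toy data. It shows the idea only and is not molic's internal code.

```r
# Hypothetical helper (not molic's code): append a candidate observation `y`
# (a named character vector) to the reference data `df`, as happens under the
# hypothesis that y belongs to the same class as the rows of df.
augment_with_candidate <- function(df, y) {
  stopifnot(setequal(names(y), names(df)))       # same attributes required
  y_row <- as.data.frame(as.list(y[names(df)]),  # reorder to match df
                         stringsAsFactors = FALSE)
  rbind(df, y_row)
}

# Toy reference data with two attributes and three patients
toy <- data.frame(c1 = c("0", "1", "2"), h1 = c("1", "1", "0"),
                  stringsAsFactors = FALSE)
y   <- c(h1 = "2", c1 = "0")  # candidate, attributes in a different order
aug <- augment_with_candidate(toy, y)
nrow(aug)  # 3 reference rows + 1 candidate = 4
```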

Plotting the Approximated Density of the Test Statistic

plot(m)

The red area is the critical region (here at the 5% significance level) and the dotted line marks the observed test statistic (the deviance) of $y$. Since the dotted line lies outside the critical region, we cannot reject the hypothesis that $y$ has psoriasis.
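The decision rule behind the plot can be written out directly. This sketch assumes the density of the test statistic is approximated by a sample of deviances simulated under the hypothesis. Both the simulation approach and all names below are our assumptions. The chi-squared draws and the observed deviance are hypothetical placeholders, not molic output.

```r
# Sketch of a one-sided test based on an approximated null distribution.
# `sim_dev` stands in for deviances simulated under the hypothesis; the
# chi-squared draws are a hypothetical placeholder, not molic output.
set.seed(1)
sim_dev <- rchisq(10000, df = 5)  # hypothetical null sample
alpha   <- 0.05

# Critical value: deviances above it fall in the (red) critical region
crit <- unname(quantile(sim_dev, probs = 1 - alpha))

# Hypothetical observed deviance of the new observation y
obs_dev <- 4.2

# Monte Carlo p-value: share of null deviances at least as large as obs_dev
p_value <- mean(sim_dev >= obs_dev)

# Reject only if the observed deviance lies in the critical region
reject <- obs_dev > crit
cat("critical value:", round(crit, 2), " p-value:", round(p_value, 3),
    " reject:", reject, "\n")
```

Large deviances indicate that $y$ is unusual relative to the class, which is why the critical region sits in the upper tail.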

Testing all Hypotheses Simultaneously

We can use the fit_multiple_models function to test all six hypotheses as follows:

mm <- fit_multiple_models(derma, y, "ES", q = 0, trace = FALSE)

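Thus, we cannot reject any of the hypotheses that $y$ has psoriasis ($H_1$), seborrheic dermatitis ($H_2$) or pityriasis rosea ($H_4$). Rather than committing to a single class, as a classifier would, the procedure returns every class that is compatible with $y$. This is more conservative than the classification methods and hence a little safer. The medical expert should continue the investigation from here.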
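Reading off the answer amounts to keeping every class whose hypothesis is not rejected. The sketch below uses a hypothetical helper, `non_rejected()`, which operates on a named vector of p-values. The p-values and class labels are placeholders, not the results for the dermatology data.

```r
# Hypothetical helper: return the classes whose hypothesis H_k is not
# rejected at level `alpha` (H_k is rejected when its p-value <= alpha).
# An empty result means y would be rejected under every class.
non_rejected <- function(pvals, alpha = 0.05) {
  stopifnot(!is.null(names(pvals)), all(pvals >= 0 & pvals <= 1))
  names(pvals)[pvals > alpha]
}

# Placeholder p-values for three generic classes
pv <- c(classA = 0.41, classB = 0.003, classC = 0.12)
non_rejected(pv)               # "classA" "classC"
non_rejected(pv, alpha = 0.5)  # character(0): rejected under every class
```

Returning a set rather than a single label is what makes the procedure conservative. The set can contain several classes, or none at all when $y$ looks like an outlier for every class.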


molic documentation built on June 2, 2021, 5:07 p.m.