drug.consumption: Drug Consumption
In fairml: Fair Models in Machine Learning

Description

Predict drug consumption based on psychological scores and demographics.

Usage

data(drug.consumption)

Format

The data contain 1885 observations and 31 variables. See the UCI Machine Learning Repository for details.

Note

The data set has been minimally pre-processed following the instructions on the UCI Machine Learning Repository to re-encode the variables. Categorical variables are stored as factors and the psychological scores are stored as numeric variables on their original scales.

Any of the drug use variables can be used as the response variable (each has 7 levels); Age, Gender and Race are the sensitive attributes. The remaining variables are used as predictors.

The data contain the following variables:

Age, a factor with six 10-year age brackets;
Gender, a factor;
Education, a factor with 9 levels from "Left school before 16" to "Doctorate degree";
Country, a factor with 7 levels: "USA", "New Zealand", "Other", "Australia", "Republic of Ireland", "Canada" and "UK";
Race, a factor with 7 levels, including mixed backgrounds;
Nscore, Escore, Oscore, Ascore, Cscore, numeric scores from the five-factor model for personality traits;
Impulsive, a numeric score for impulsivity;
SS, a numeric score for sensation seeking;
Alcohol, Amphet, Amyl, Benzos, Caff, Cannabis, Choc, Coke, Crack, Ecstasy, Heroin, Ketamine, Legalh, LSD, Meth, Mushrooms, Nicotine, Semer and VSA, factors with 7 levels ranging from "Never Used" to "Used in Last Day".
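
The split into response, sensitive attributes and predictors described in this note can be sketched by column name. This is a minimal illustration, not part of fairml: make.roles() is a hypothetical helper, and the toy data frame holds dummy values under the column names documented here so that the code runs without the package. Taking "the remaining variables" literally, the sketch keeps Country among the predictors; leaving it out is a modelling choice.

```r
# Column names as listed in this documentation page.
sensitive.names = c("Age", "Gender", "Race")
other.names = c("Education", "Country", "Nscore", "Escore", "Oscore",
                "Ascore", "Cscore", "Impulsive", "SS")
drug.names = c("Alcohol", "Amphet", "Amyl", "Benzos", "Caff", "Cannabis",
               "Choc", "Coke", "Crack", "Ecstasy", "Heroin", "Ketamine",
               "Legalh", "LSD", "Meth", "Mushrooms", "Nicotine", "Semer",
               "VSA")
all.names = c(sensitive.names, other.names, drug.names)

# Hypothetical helper (not a fairml function): pick one drug use variable as
# the response and treat every non-drug, non-sensitive column as a predictor.
make.roles = function(data, response) {
  stopifnot(response %in% drug.names, all(all.names %in% names(data)))
  list(r = data[, response],
       s = data[, sensitive.names],
       p = data[, setdiff(names(data), c(drug.names, sensitive.names))])
}

# Placeholder data frame with dummy values, used only to exercise the helper.
toy = as.data.frame(setNames(replicate(length(all.names), 1:3,
                                       simplify = FALSE), all.names))
roles = make.roles(toy, response = "Meth")
names(roles$p)
```

With fairml loaded, the same call would be make.roles(drug.consumption, "Meth"), assuming the column names in the data match the list above.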

References

UCI Machine Learning Repository.
https://archive-beta.ics.uci.edu/dataset/373/

Examples

data(drug.consumption)

# short-hand variable names.
r = drug.consumption[, "Meth"]
s = drug.consumption[, c("Age", "Gender", "Race")]
p = drug.consumption[, c("Education", "Nscore", "Escore", "Oscore", "Ascore",
                         "Cscore", "Impulsive", "SS")]

# collapse levels with low observed frequencies.
levels(p$Education) =
  c("at.most.18y", "at.most.18y", "at.most.18y", "at.most.18y", "university",
    "diploma", "bachelor", "master", "phd")

## Not run: 
m = fgrrm(response = r, sensitive = s, predictors = p,
          family = "multinomial", unfairness = 0.05)
summary(m)

HH = drug.consumption$Heroin
levels(HH) = c("Never Used", "Used", "Used", "Used", "Used Recently",
               "Used Recently", "Used Recently")

m = fgrrm(response = HH, sensitive = s, predictors = p,
          family = "multinomial", unfairness = 0.05)
summary(m)

## End(Not run)
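
The examples collapse Education and Heroin by assigning a vector of new level names in which some names are repeated. A small sketch of that base R behaviour on a toy factor follows; the levels "a" to "d" and the labels "low", "mid" and "high" are made up for illustration and do not come from the data set.

```r
# Toy factor with four hypothetical levels.
x = factor(c("a", "b", "c", "d", "b", "a"), levels = c("a", "b", "c", "d"))

# The replacement vector has one entry per existing level, in level order.
# Repeating a label merges the corresponding levels: "a" and "b" both become
# "low", "c" becomes "mid" and "d" becomes "high".
levels(x) = c("low", "low", "mid", "high")

nlevels(x)   # 3 levels remain after merging
table(x)     # observations are re-counted under the merged labels
```

The replacement vector must follow the order of levels(x), so it is worth printing the original levels before collapsing them.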

fairml documentation built on June 8, 2025, 11:38 a.m.