tobacco: Tobacco Use and Health - Simulated Dataset



A simulated dataset of 1,000 subjects, stored as a data frame with 1,000 rows and the following 9 variables:


  • gender Factor with 2 levels (“F” and “M”), with roughly 500 subjects in each.

  • age Numerical.

  • Factor with 4 age categories.

  • BMI Body Mass Index (numerical).

  • smoker Factor (“Yes” / “No”).

  • Number of cigarettes smoked per day (numerical).

  • diseased Factor (“Yes” / “No”).

  • disease Character.

  • samp.wgts Sampling weights (numerical).
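The samp.wgts column lets estimates be weighted rather than counting every subject equally. As a minimal sketch using base R only, the block below contrasts an unweighted and a weighted share of smokers on a small hypothetical data frame, `toy`, whose `smoker` and `samp.wgts` columns merely mimic the real ones. The values are made up for illustration and say nothing about the tobacco data itself.

```r
# Minimal sketch: unweighted vs. weighted proportion of smokers.
# `toy` is hypothetical illustrative data; its columns only mimic the
# `smoker` and `samp.wgts` variables of the tobacco dataset.
toy <- data.frame(
  smoker    = factor(c("Yes", "No", "No", "Yes")),
  samp.wgts = c(2, 1, 1, 4)
)

# Unweighted share: every row counts once (2 smokers out of 4 rows)
unweighted <- mean(toy$smoker == "Yes")

# Weighted share: each row counts by its sampling weight,
# i.e. (2 + 4) / (2 + 1 + 1 + 4) = 6 / 8
weighted <- weighted.mean(toy$smoker == "Yes", w = toy$samp.wgts)

print(c(unweighted = unweighted, weighted = weighted))
```

On the actual data, the same idea would be `weighted.mean(tobacco$smoker == "Yes", w = tobacco$samp.wgts, na.rm = TRUE)` after loading it with `data(tobacco, package = "summarytools")`; summarytools' `freq()` also accepts a `weights` argument for weighted frequency tables.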

A note on the simulation: the probability that an individual falls into the “diseased” category is determined by an arbitrary function of age, BMI and the number of cigarettes smoked per day.

A copy of this dataset is also available in French under the name “tabagisme”.

summarytools documentation built on May 20, 2022, 9:06 a.m.