knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.pos = 'H'
)
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(magrittr))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(testthat))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(hseclean))

Introduction

Alcohol consumption data in the Health Survey for England (HSE) is recorded in four main forms:

Both adults and children have data on whether they drink alcohol or not, and on the frequency of drinking. The main difference between the recording of data for adults and children is that adults have detailed data on how much and what they drink, whereas children only have data on the amount drunk in the last week.

The recording of data varies among years of the HSE. We consider years from 2001 onwards. The main features of these changes in recording are:

Due to the variability in recording, we only consider data on the amount drunk by adults and children from 2011 onwards.

We analyse beverage-specific alcohol consumption in terms of beer (combining normal and strong beer), wine (combining wine and sherry), spirits, and alcopops.

Reading the HSE data files

There are separate functions in the hseclean package to read each year of HSE data, e.g. read_2016(). These functions link to where the data is stored in the project folder PR_Consumption_TA. They read in all variables related to alcohol, plus selected socioeconomic and other descriptor variables.

# First test that each year of data can be read successfully

# If on uni system set the root directory as
root_dir <- "X:/"

# Each function has the file path to each year of data added to it as a default

test_2001 <- read_2001(root = root_dir)
test_2002 <- read_2002(root = root_dir)
test_2003 <- read_2003(root = root_dir)
test_2004 <- read_2004(root = root_dir)
test_2005 <- read_2005(root = root_dir)
test_2006 <- read_2006(root = root_dir)
test_2007 <- read_2007(root = root_dir)
test_2008 <- read_2008(root = root_dir)
test_2009 <- read_2009(root = root_dir)
test_2010 <- read_2010(root = root_dir)
test_2011 <- read_2011(root = root_dir)
test_2012 <- read_2012(root = root_dir)
test_2013 <- read_2013(root = root_dir)
test_2014 <- read_2014(root = root_dir)
test_2015 <- read_2015(root = root_dir)
test_2016 <- read_2016(root = root_dir)
test_2017 <- read_2017(root = root_dir)

Processing socioeconomic variables

There are separate functions to process each socioeconomic variable. Detailed descriptions of what these functions do are given in vignette("covariate_data").

# Test each cleaning function on one year of data

temp <- read_2017(root = root_dir) %>%
  clean_age %>%
  clean_demographic %>% 
  clean_education %>%
  clean_economic_status %>%
  clean_family %>%
  clean_income %>%
  clean_health_and_bio

Whether someone drinks and frequency of drinking

Whether someone drinks, and how often, is calculated for adults (aged 16 years or older) and children (aged 8 to 15 years) by the function alc_drink_now_allages(). We combine the information on drinking frequency from adults and children into a single variable.

We calculate the variable drinks_now, which classes someone as either a drinker or a non-drinker. Adults are classed as drinkers if they reported drinking at all in the last 12 months, even if they reported having only 1-2 drinks a year (according to the variable dnoft). Note that the definition of a non-drinker can vary among surveys, e.g. some surveys class someone who has only 1-2 drinks a year as a non-drinker, and this could lead to variation in estimates of the number of non-drinkers.

We calculate the variable drink_freq_7d, which is a numerical variable that describes drinking frequency. Adult drinking frequency is also inferred from the variable dnoft: the function alc_drink_freq() converts the categorical responses into the expected number of days in a week that someone drinks.
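
The idea of converting a categorical frequency response into an expected number of drinking days per week can be sketched as a simple lookup. This is only an illustration: the response labels and the numeric values below are assumptions made for the example (midpoints such as 1.5 for "once or twice"), not the categories of dnoft or the values used inside alc_drink_freq().

```r
# Illustrative lookup only: labels and values are assumptions,
# not the mapping used by hseclean::alc_drink_freq().
days_per_week <- c(
  almost_every_day      = 7,
  once_or_twice_a_week  = 1.5,
  once_or_twice_a_month = 1.5 * 12 / 52,  # 1.5 days a month spread over the year
  once_or_twice_a_year  = 1.5 / 52,
  not_at_all            = 0
)

# Hypothetical helper: returns NA for missing or unrecognised responses
freq_to_days <- function(response) {
  unname(days_per_week[response])
}

freq_to_days(c("once_or_twice_a_week", "not_at_all", NA))
```

A named vector keeps the mapping in one place, so changing an assumption means editing a single value.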

Missing data on whether or not someone currently drinks (drinks_now) is supplemented by responses on whether they currently drink or have always been a non-drinker (the variables dnnow, dnany and dnevr).

For children (aged 8-15 years) we infer whether someone drinks or not (drinks_now) from the variable adrinkof. Someone is a non-drinker if they responded "never" to adrinkof. The categorical responses are converted into the expected number of days in a week that someone drinks as follows

Missing data on whether or not a child currently drinks (drinks_now) is supplemented by responses to when they last had an alcoholic drink (adrlast): if the last drink was less than six months ago, then we classify them as a drinker; if the last drink was six months or more ago, then we classify them as a non-drinker.
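
The six-month rule above can be sketched as follows. The helper name, the logical input last_drink_under_6m (standing in for a recoded adrlast) and the label "non_drinker" are assumptions made for this example; only the label "drinker" is taken from the code in this vignette.

```r
# Hypothetical helper: fills missing drinks_now for children using
# whether their last drink was less than six months ago.
fill_child_drinks_now <- function(drinks_now, last_drink_under_6m) {
  fill <- ifelse(last_drink_under_6m, "drinker", "non_drinker")
  # Only replace values that are missing; observed values are kept
  ifelse(is.na(drinks_now), fill, drinks_now)
}

fill_child_drinks_now(
  drinks_now          = c("drinker", NA, NA, NA),
  last_drink_under_6m = c(FALSE, TRUE, FALSE, NA)
)
# "drinker" "drinker" "non_drinker" NA
```

Children with no information on either variable remain missing.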

# Number of sampled drinkers and non-drinkers in 2017
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  filter(age < 90, age >= 8) %>%
  group_by(imd_quintile) %>% 
  count(drinks_now) %>% 
  ggplot(aes(x = drinks_now, y = n, fill = imd_quintile)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  ylab("number of observations")

# Frequency of drinking in 2017 among drinkers
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  filter(age < 90, age >= 8, drinks_now == "drinker") %>%
  group_by(imd_quintile, age_cat) %>% 
  summarise(av_freq = mean(drink_freq_7d, na.rm = TRUE)) %>%
  ggplot(aes(x = imd_quintile, y = av_freq, fill = age_cat)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  ylab("average number of days drink in a week")

Average amount of alcohol consumed

Assumptions about serving size and alcohol content

Some standard assumptions are made about the volume and alcohol content of the beverages that are reported to be drunk. The values that we use for these assumptions are based on those used by NatCen to create the derived variables for units of alcohol consumed in the HSE. We have adjusted these values based on further information from market research data and figures from academic publications.

Alcohol content assumptions are the expected percentages of alcohol that each beverage contains (alcohol by volume, ABV). We use separate values for normal beer (4.4%), strong beer (8.4%), spirits (38%), sherry (17%), wine (12.5%), and alcopops (also known as "ready to drink" or RTD) (4.5%).

Beverage volume assumptions are the expected volumes (ml) of different beverage containers / serving sizes. We use separate values for normal and strong beer (half pint 284ml, small can 330ml, large can 440ml, bottle 330ml), spirits (serving 25ml), sherry (serving 50ml), wine (small glass 125ml, standard glass 175ml, large glass 250ml, bottle 750ml), and alcopops (small can 250ml, small bottle 275ml, large bottle 700ml).

# These data are stored within the hseclean package for easy use
# they can be accessed by typing 

hseclean::abv_data

hseclean::alc_volume_data
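
As a worked illustration of how a serving volume and an ABV combine into UK standard units (1 unit = 10ml of pure ethanol), the sketch below uses the values quoted in the text above. The helper units_of_alcohol() is hypothetical and written for this example; it is not a function of the hseclean package.

```r
# Hypothetical helper: units = servings x volume (ml) x ABV fraction / 10,
# because one UK unit is 10ml of pure ethanol.
units_of_alcohol <- function(volume_ml, abv_percent, n_servings = 1) {
  n_servings * volume_ml * (abv_percent / 100) / 10
}

units_of_alcohol(284, 4.4)    # half pint of normal beer: about 1.25 units
units_of_alcohol(175, 12.5)   # standard glass of wine: about 2.19 units
units_of_alcohol(25, 38)      # single serving of spirits: 0.95 units
units_of_alcohol(284, 4.4, 4) # four half pints (two pints): about 5 units
```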

Adult average weekly consumption in the last 12 months

We estimate the average amount drunk in a week (weekmean) in terms of UK standard units of alcohol (1 unit = 10ml or 8g pure ethanol). The average amount drunk is then categorised as follows:

Separate variables are produced describing the average weekly units in four beverage categories: beer_units (including cider), wine_units (including sherry), spirit_units, and rtd_units (alcopops). Further variables on beverage preference are produced that:

The processing is done by the function alc_weekmean_adult(). The calculation has the following steps:

# Average weekly units drunk in 2017
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  alc_weekmean_adult %>%
  filter(age < 90, age >= 16) %>%
  group_by(imd_quintile, age_cat) %>% 
  summarise(av_amount = mean(weekmean, na.rm = TRUE)) %>%
  ggplot(aes(x = imd_quintile, y = av_amount, fill = age_cat)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  ylab("average number of units drunk in a week")

# Number of sampled people in each drinker category in 2017
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  alc_weekmean_adult %>%
  filter(age < 90, age >= 16) %>%
  group_by(imd_quintile) %>% 
  count(drinker_cat) %>% 
  mutate(drinker_cat = factor(drinker_cat, 
    levels = c("abstainer", "lower_risk", "increasing_risk", "higher_risk"))) %>%
  ggplot(aes(x = drinker_cat, y = n, fill = imd_quintile)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  ylab("number of observations")

Adult consumption on the heaviest drinking day in the last week

The function alc_sevenday_adult() processes the information from the questions on adult (16 or more years old) drinking in the last seven days:

We estimate the number of UK standard units of alcohol drunk on the heaviest drinking day (peakday) by using the data on how many of what size measures of different beverages were drunk, and combining this with our standard assumptions about beverage volume and alcohol content. We further estimate the total units drunk of each beverage type on the heaviest drinking day (d7nbeer_units, d7sbeer_units, d7spirits_units, d7sherry_units, d7wine_units, d7pops_units).

Binge drinking status is then categorised into the variable binge_cat, with levels did_not_drink, binge and no_binge, where a binge day is defined as males drinking over 8 units and females drinking over 6 units.
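
The binge thresholds can be expressed as a small classification function. This is a minimal sketch rather than the code inside alc_sevenday_adult(): the helper name, the argument drank_last_week and the sex labels "Male" and "Female" are assumptions made for the example.

```r
# Hypothetical helper: classifies the heaviest drinking day using the
# thresholds in the text (over 8 units for males, over 6 for females).
classify_binge <- function(peakday_units, sex, drank_last_week) {
  threshold <- ifelse(sex == "Male", 8, 6)
  ifelse(!drank_last_week, "did_not_drink",
         ifelse(peakday_units > threshold, "binge", "no_binge"))
}

classify_binge(
  peakday_units   = c(0, 8, 8.5, 6, 6.5),
  sex             = c("Male", "Male", "Male", "Female", "Female"),
  drank_last_week = c(FALSE, TRUE, TRUE, TRUE, TRUE)
)
# "did_not_drink" "no_binge" "binge" "no_binge" "binge"
```

Note that drinking exactly 8 (males) or 6 (females) units does not count as a binge, because the definition is "over" the threshold.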

Note that in 2007 new questions were added asking which glass size was used when wine was consumed. Therefore the post HSE 2007 unit calculations are not directly comparable to previous years’ data.

Missing data is imputed using the means of people who did drink in the last seven days, stratified by year, sex, IMD quintile and age category (0-1, 2-4, 5-7, 8-10, 11-12, 13-15, 16-17, 18-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90+).
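
Stratified mean imputation of this kind can be sketched in base R as below. The helper impute_stratified_mean() and the toy data are hypothetical, and the sketch assumes that every non-missing value in the data belongs to someone who drank in the last seven days, so the stratum mean is a mean among drinkers.

```r
# Hypothetical helper: replaces missing values of `value` with the mean of
# the observed values in the same stratum (combination of `strata` columns).
impute_stratified_mean <- function(data, value, strata) {
  key <- interaction(data[strata], drop = TRUE)
  observed <- data[[value]]
  stratum_means <- tapply(observed, key, mean, na.rm = TRUE)
  fill <- as.numeric(stratum_means[as.character(key)])
  data[[value]] <- ifelse(is.na(observed), fill, observed)
  data
}

# Toy data for illustration only
survey <- data.frame(
  sex     = c("Male", "Male", "Male", "Female", "Female"),
  age_cat = c("16-17", "16-17", "16-17", "18-19", "18-19"),
  peakday = c(4, 6, NA, 3, NA)
)

imputed <- impute_stratified_mean(survey, "peakday", c("sex", "age_cat"))
imputed$peakday
# 4 6 5 3 3
```

A stratum with no observed values has no mean to borrow, so its missing values stay missing in this sketch.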

# drinking in last 7 days in 2017
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  alc_weekmean_adult %>%
  alc_sevenday_adult %>%
  filter(age < 90, age >= 16) %>%
  group_by(imd_quintile, age_cat, sex) %>% 
  summarise(n_days7 = mean(n_days_drink, na.rm = TRUE),
            amount7 = mean(peakday, na.rm = TRUE)) %>%
  ggplot(aes(x = n_days7, y = amount7, colour = age_cat, shape = sex)) +
  geom_point(size = 3, alpha = .5) +
  facet_wrap(~ imd_quintile, nrow = 1) +
  theme_minimal() +
  ylab("average amount drunk on heaviest drinking day") +
  xlab("average number of days drunk on in last 7")

# Number of sampled people in each binge drinker category in 2017
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  alc_weekmean_adult %>%
  alc_sevenday_adult %>%
  filter(age < 90, age >= 16) %>%
  group_by(imd_quintile) %>% 
  count(binge_cat) %>% 
  mutate(binge_cat = factor(binge_cat, 
    levels = c("did_not_drink", "no_binge", "binge"))) %>%
  ggplot(aes(x = binge_cat, y = n, fill = imd_quintile)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  ylab("number of observations")

Children's consumption in the last week

The function alc_sevenday_child() processes the information on drinking by children (ages 13-15) in the last seven days. The data on children's drinking comes in the form of survey questions on whether or not they have drunk each beverage type in the last week, and if so, how much of each was drunk. The main output is the variable total_units7_ch - the total units drunk in the last seven days.

We estimate the number of UK standard units of alcohol drunk in the last 7 days by using the data on how many of what size measures of different beverages were drunk, and combining this with our standard assumptions about beverage volume and alcohol content.

The information from this question is also used to update the drinks_now variable, so that it describes whether or not both adults and children drink.

Due to high missingness in this variable, we assume that anyone who has missing data for this variable does not drink. This means that we are likely to under-estimate the number of children who drink.

# drinking by age in 2017
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  alc_weekmean_adult %>%
  alc_sevenday_adult %>%
  alc_sevenday_child %>%
  filter(age < 90, age >= 13) %>%
  group_by(age_cat, sex) %>% 
  count(drinks_now) %>% 
  filter(drinks_now == "drinker") %>%
  ggplot(aes(x = age_cat, y = n, shape = sex, colour = sex)) +
  geom_point(size = 3, alpha = .5) +
  facet_wrap(~ sex, nrow = 1) +
  theme_minimal() +
  ylab("number of observations")

# Average units drunk in a week by age in 2017
read_2017(root = "X:/") %>%
  clean_age %>%
  clean_demographic %>%
  alc_drink_now_allages %>%
  alc_weekmean_adult %>%
  alc_sevenday_adult %>%
  alc_sevenday_child %>%
  filter(age < 90, age >= 13) %>%
  mutate(weekamount = ifelse(age %in% 13:15, total_units7_ch, weekmean)) %>%
  group_by(age_cat, sex) %>% 
  summarise(av_amount = mean(weekamount, na.rm = TRUE)) %>%
  ggplot(aes(x = age_cat, y = av_amount, colour = sex, shape = sex)) +
  geom_point(size = 3, alpha = .5) +
  facet_wrap(~ sex, nrow = 1) +
  theme_minimal() +
  ylab("expected number of units drunk in a week")


dosgillespie/hseclean documentation built on May 2, 2020, 1:15 a.m.