knitr::opts_chunk$set(echo = FALSE) # by default turn off code echo
# Set start time ----
startTime <- proc.time()

# Local parameters ----

b2Kb <- 1024 #http://whatsabyte.com/P1/byteconverter.htm
b2Mb <- 1048576
plotLoc <- paste0(ggrParams$repoLoc, "/docs/plots/") # where to put the plots


# Local functions ----

\newpage

About

Report circulation:

License


Citation


History


Support


\newpage

Introduction


This report provides an overview of the GREEN Grid project [@stephenson_smart_2017] research data.

Data Package

Version r params$version of the data package contains:

Study recruitment

hhDT <- data.table::as.data.table(readr::read_csv(ggrParams$hhAttributes))
gsStatDT <- data.table::as.data.table(readr::read_csv(paste0(ggrParams$statsLoc, "hhStatsByDate.csv")))

The project research sample comprises r nrow(hhDT) households who were recruited via the local power lines companies in two areas: Taranaki starting in May 2014 and Hawkes Bay starting in November 2014.

Recruitment was via a non-random sampling method and a number of households were intentionally selected for their 'complex' electricity consumption (and embedded generation) patterns and appliances [@dianaThesis2015; @stephenson_smart_2017; @jack_minimal_2018; @suomalainen_detailed_2019].

The lines companies invited their own employees and those of other local companies to participate in the research and ~80 interested potential participants completed short or long forms of the Energy Cultures 2 household survey [@ec2Survey2015]. Households were then selected from this pool by the project team based on selection criteria relevant to the GREEN Grid project. These included:

After informed consent was obtained from each household, an electrician contracted by the two lines companies completed an appliance survey to record detailed information about the appliances in each house. This survey contained information about the number of appliances owned, brand, model number, efficiency and age. The electrician also installed the GridSpy units which recorded electricity power demand at a circuit level. The GridSpy units automatically upload the monitoring data to the GridSpy company's secure database from where it was downloaded by the GREEN Grid research team.

As a result of this process the sample cannot be assumed to represent the population of customers (or employees) of any of the companies involved, nor the populations in each location [@stephenson_smart_2017].

Table \@ref(tab:sampleTable) shows the number in each sample.

t <- hhDT[, .('n Households' = .N), keyby = .(Location)]

kableExtra::kable(t, caption = "Sample location") %>%
  kable_styling()

Table \@ref(tab:sampleSurveys) shows the number for whom valid appliance and survey data is available in this data package. Note that even those which appear to lack appliance data may have sufficient survey data to deduce appliance ownership (see question numbers Q19_* and Q40_*).

t <- hhDT[, .(`n Households` = .N), keyby = .(Location, hasShortSurvey, hasLongSurvey, hasApplianceSummary)]

kableExtra::kable(t, caption = "Sample information") %>%
  kable_styling()

Data collection duration

Figure \@ref(fig:liveDataHouseholds) shows the total number of households for whom GridSpy data exists on a given date by sample. The plot includes any data, including partial data and suggests that for analytic purposes the period from April 2015 to March 2016 (indicated) would offer the maximum number of households.

setkey(gsStatDT, linkID)
dt <- merge(gsStatDT, hhDT, allow.cartesian=TRUE)

plotDT <- dt[, .(nHH = uniqueN(hhID)), keyby = .(date, Location)]

# point plot ----
myCaption <- paste0("Source: GREEN Grid Project data ", min(dt$date), " to ", max(dt$date))
rectAlpha <- 0.3
vLineAlpha <- 0.8
vLineCol <- "#0072B2" # http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette

yMin <- min(plotDT$nHH)
yMax <- 2 * max(plotDT$nHH)

ggplot2::ggplot(plotDT, aes( x = date, y = nHH, fill = Location)) +
  geom_col() +
  scale_x_date(date_labels = "%Y %b", date_breaks = "6 months") +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5, 
                                   hjust = 0.5)) + 
  labs(title = "N live households per day for all clean GridSpy data",
       caption = paste0(myCaption, "\nShaded area indicates 12 month period with largest number of households"),
       y = "Number of households"

  ) + 
  annotate("rect", xmin = as.Date("2015-04-01"),
             xmax = as.Date("2016-04-01"), 
             ymin = yMax, ymax = yMin, alpha = rectAlpha, fill = vLineCol)

Key attributes

Table \@ref(tab:allHhData) shows key attributes for the recruited sample. Note that two GridSpy monitors were re-used and so require new hhIDs to be set from the date of re-use using the linkID variable. This is explained in more detail in the GridSpy processing report. Linkage between the survey and GridSpy data should therefore always use linkID to avoid errors.

keyCols <- c("hhID", "linkID", "Location", "surveyStartDate", "nAdults", "nChildren0_12","nTeenagers13_18",
                           "notes", "r_stopDate", "hasApplianceSummary")
kableExtra::kable(hhDT[, ..keyCols ][order(linkID)], 
             caption = "Sample details") %>%
  kable_styling()

Code examples

We have provided a number of code examples for suggestions on how to load, further process and analyse the data.

Known issues

We maintain a known data issues list via our GitHub repository. If you think there is a data issue please check the repo list first and then add a new one if appropriate.

Runtime

t <- proc.time() - startTime

elapsed <- t[[3]]

Analysis completed in r round(elapsed,2) seconds ( r round(elapsed/60,2) minutes) using knitr in RStudio with r R.version.string running on r R.version$platform.

R environment

R packages used

Session info

sessionInfo()

References



CfSOtago/GREENGridData documentation built on June 10, 2022, 8:03 p.m.