In CfSOtago/GREENGridData: Processing NZ GREEN Grid project data to create a 'safe' version for data archiving and re-use

knitr::opts_chunk$set(echo = FALSE) # by default turn off code echo

# Set start time ----
startTime <- proc.time()

# Local parameters ----

b2Kb <- 1024 #http://whatsabyte.com/P1/byteconverter.htm
b2Mb <- 1048576
plotLoc <- paste0(ggrParams$repoLoc, "/docs/plots/") # where to put the plots


# Local functions ----

\newpage

About

Report circulation:

Public – this report is intended to accompany the data release.

License

Citation

History

this report's edit history

Support

\newpage

Introduction

This report provides an overview of the GREEN Grid project [@stephenson_smart_2017] research data.

Data Package

Version `r params$version` of the data package contains:

powerData.zip: 1 minute power demand data for each circuit in each household. One file per household;
ggHouseholdAttributesSafe.csv.zip: anonymised household attribute data;
checkPlots.zip:
- simple line charts of mean power per month per year for each circuit monitored for each household. These are a useful check;
- tile plots (heat maps/carpet plots) of the number of observations per hour per day. These are a useful check on data coverage.
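
The two kinds of check summarised above can be sketched with data.table on a toy table. This is a minimal illustration only: the column names (`r_dateTime`, `circuit`, `powerW`) and all values are assumptions made for the example and are not taken from the data package.

```r
library(data.table)

# Toy 1-minute power data for one household (illustrative values only).
# Column names are assumed for this sketch, not read from the package.
times <- seq(as.POSIXct("2015-01-31 23:00:00", tz = "UTC"),
             by = "1 min", length.out = 120)
toyPowerDT <- data.table(
  r_dateTime = rep(times, times = 2),
  circuit = rep(c("Heat Pump", "Hot Water"), each = length(times)),
  powerW = c(rep(500, 60), rep(700, 60), rep(1000, 60), rep(3000, 60))
)

# Mean power per circuit per month per year (the line chart check)
toyPowerDT[, obsYear := year(r_dateTime)]
toyPowerDT[, obsMonth := month(r_dateTime)]
monthlyDT <- toyPowerDT[, .(meanW = mean(powerW), nObs = .N),
                        keyby = .(circuit, obsYear, obsMonth)]

# Number of observations per circuit per hour per day (the tile plot check)
hourlyDT <- toyPowerDT[, .(nObs = .N),
                       keyby = .(circuit,
                                 obsDate = as.IDate(r_dateTime),
                                 obsHour = hour(r_dateTime))]

print(monthlyDT)
print(hourlyDT)
```

With complete 1-minute data every hour should hold 60 observations per circuit, so lower counts in `hourlyDT` point to gaps.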

Study recruitment

hhDT <- data.table::as.data.table(readr::read_csv(ggrParams$hhAttributes))

gsStatDT <- data.table::as.data.table(readr::read_csv(paste0(ggrParams$statsLoc, "hhStatsByDate.csv")))

The project research sample comprises `r nrow(hhDT)` households that were recruited via the local power lines companies in two areas: Taranaki, starting in May 2014, and Hawke's Bay, starting in November 2014.

Recruitment was via a non-random sampling method and a number of households were intentionally selected for their 'complex' electricity consumption (and embedded generation) patterns and appliances [@dianaThesis2015; @stephenson_smart_2017; @jack_minimal_2018; @suomalainen_detailed_2019].

The lines companies invited their own employees and those of other local companies to participate in the research and ~80 interested potential participants completed short or long forms of the Energy Cultures 2 household survey [@ec2Survey2015]. Households were then selected from this pool by the project team based on selection criteria relevant to the GREEN Grid project. These included:

having the majority of their energy supply from electricity (i.e. not gas heating);
household size;
types of appliances owned.

After informed consent was obtained from each household, an electrician contracted by the two lines companies completed an appliance survey to record detailed information about the appliances in each house. This survey recorded the number of appliances owned and their brand, model number, efficiency and age. The electrician also installed the GridSpy units, which recorded electrical power demand at circuit level. The GridSpy units automatically uploaded the monitoring data to the GridSpy company's secure database, from which it was downloaded by the GREEN Grid research team.

As a result of this process the sample cannot be assumed to represent the population of customers (or employees) of any of the companies involved, nor the populations in each location [@stephenson_smart_2017].

Table \@ref(tab:sampleTable) shows the number of households in each location.

t <- hhDT[, .('n Households' = .N), keyby = .(Location)]

kableExtra::kable(t, caption = "Sample location") %>%
  kable_styling()

Table \@ref(tab:sampleSurveys) shows the number of households for which valid appliance and survey data are available in this data package. Note that even households which appear to lack appliance data may have sufficient survey data to deduce appliance ownership (see question numbers Q19_* and Q40_*).

t <- hhDT[, .(`n Households` = .N), keyby = .(Location, hasShortSurvey, hasLongSurvey, hasApplianceSummary)]

kableExtra::kable(t, caption = "Sample information") %>%
  kable_styling()

Data collection duration

Figure \@ref(fig:liveDataHouseholds) shows the total number of households for which GridSpy data exists on a given date, by location. The plot counts any data, including partial data, and suggests that for analytic purposes the period from April 2015 to March 2016 (shaded) would offer the maximum number of households.

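data.table::setkey(gsStatDT, linkID)
# merge on linkID explicitly rather than relying on the key of gsStatDT
dt <- merge(gsStatDT, hhDT, by = "linkID", allow.cartesian = TRUE)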

plotDT <- dt[, .(nHH = uniqueN(hhID)), keyby = .(date, Location)]

# column plot ----
myCaption <- paste0("Source: GREEN Grid Project data ", min(dt$date), " to ", max(dt$date))
rectAlpha <- 0.3
vLineAlpha <- 0.8
vLineCol <- "#0072B2" # http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette

yMin <- min(plotDT$nHH)
yMax <- 2 * max(plotDT$nHH)

ggplot2::ggplot(plotDT, aes( x = date, y = nHH, fill = Location)) +
  geom_col() +
  scale_x_date(date_labels = "%Y %b", date_breaks = "6 months") +
  theme(axis.text.x = element_text(angle = 90, 
                                   vjust = 0.5, 
                                   hjust = 0.5)) + 
  labs(title = "N live households per day for all clean GridSpy data",
       caption = paste0(myCaption, "\nShaded area indicates 12 month period with largest number of households"),
       y = "Number of households") + 
  # shade the 12 months from April 2015 to March 2016
  annotate("rect", xmin = as.Date("2015-04-01"),
           xmax = as.Date("2016-04-01"), 
           ymin = yMin, ymax = yMax, alpha = rectAlpha, fill = vLineCol)
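
Restricting an analysis to the April 2015 to March 2016 period could look something like the sketch below. It uses a toy per-day table; the table name, IDs and dates are made up for illustration, and only the `linkID` and `date` column names follow the code above.

```r
library(data.table)

# Toy table with one row per household per day with data (illustrative only)
toyStatDT <- data.table(
  linkID = c("hh_01", "hh_01", "hh_02", "hh_02", "hh_03"),
  date = as.Date(c("2015-03-31", "2015-04-01", "2015-04-01",
                   "2016-03-31", "2016-04-01"))
)

windowStart <- as.Date("2015-04-01")
windowEnd <- as.Date("2016-04-01") # exclusive upper bound: last day kept is 2016-03-31

# Keep only observations inside the 12 month window
inWindowDT <- toyStatDT[date >= windowStart & date < windowEnd]

# Number of distinct households with any data on each day in the window
nPerDayDT <- inWindowDT[, .(nHH = uniqueN(linkID)), keyby = .(date)]

# Households with at least one day of data in the window
windowHH <- unique(inWindowDT$linkID)

print(nPerDayDT)
print(windowHH)
```

As with the plot, this counts a household if it has any data on a day, so a household in `windowHH` may still have only partial coverage of the window.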

Key attributes

Table \@ref(tab:allHhData) shows key attributes for the recruited sample. Note that two GridSpy monitors were re-used, so a new household identifier has to be set from the date of re-use; the linkID variable does this. This is explained in more detail in the GridSpy processing report. Linkage between the survey and GridSpy data should therefore always use linkID, not hhID, to avoid errors.

keyCols <- c("hhID", "linkID", "Location", "surveyStartDate", "nAdults", "nChildren0_12","nTeenagers13_18",
                           "notes", "r_stopDate", "hasApplianceSummary")
kableExtra::kable(hhDT[, ..keyCols ][order(linkID)], 
             caption = "Sample details") %>%
  kable_styling()
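
The reason for linking on linkID can be shown with a small toy example. The tables, IDs and values below are invented for illustration and do not come from the data package; the sketch only assumes that a re-used monitor keeps one hhID but gets a distinct linkID for each period of use.

```r
library(data.table)

# Toy household attributes: monitor "hh_B" was re-used, so it has two linkIDs
toyHhDT <- data.table(
  hhID = c("hh_A", "hh_B", "hh_B"),
  linkID = c("hh_A", "hh_Ba", "hh_Bb"),
  Location = c("Area 1", "Area 2", "Area 2")
)

# Toy GridSpy summary rows (illustrative values only)
toyGsDT <- data.table(
  hhID = c("hh_A", "hh_B", "hh_B"),
  linkID = c("hh_A", "hh_Ba", "hh_Bb"),
  date = as.Date(c("2015-01-01", "2015-01-01", "2016-01-01")),
  meanW = c(400, 650, 520)
)

# Linking on linkID: each GridSpy row matches exactly one household record
byLinkDT <- merge(toyGsDT, toyHhDT[, .(linkID, Location)], by = "linkID")

# Linking on hhID: rows for the re-used monitor match both household records
byHhDT <- merge(toyGsDT[, .(hhID, date, meanW)],
                toyHhDT[, .(hhID, Location)],
                by = "hhID", allow.cartesian = TRUE)

nrow(byLinkDT) # same as nrow(toyGsDT)
nrow(byHhDT)   # larger, because rows for hh_B are duplicated
```

Comparing the row counts before and after a merge is a quick way to confirm that no observations have been duplicated.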

Code examples

We have provided a number of code examples that suggest how to load, further process and analyse the data.

Known issues

We maintain a list of known data issues via our GitHub repository. If you think you have found a data issue, please check that list first and then add a new issue if appropriate.

Runtime

t <- proc.time() - startTime

elapsed <- t[[3]]

Analysis completed in `r round(elapsed,2)` seconds (`r round(elapsed/60,2)` minutes) using knitr in RStudio with `r R.version.string` running on `r R.version$platform`.

R environment

R packages used

base R [@baseR]
bookdown [@bookdown]
GREENGridData [@GREENGridData] which depends on:
- data.table [@data.table]
- dplyr [@dplyr]
- hms [@hms]
- lubridate [@lubridate]
- progress [@progress]
- readr [@readr]
- readxl [@readxl]
- reshape2 [@reshape2]
ggplot2 [@ggplot2]
kableExtra [@kableExtra]
knitr [@knitr]
rmarkdown [@rmarkdown]

Session info

sessionInfo()

References

CfSOtago/GREENGridData documentation built on June 10, 2022, 8:03 p.m.
