# Note: to compile this file to README.md, run the following:
# rmarkdown::render('README.Rmd', output_format = 'md_document')
knitr::opts_chunk$set(
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  error = FALSE
)
First Statistical Release
This is a prototype and is subject to constant development.
This package provides functions used in the creation of a Reproducible Analytical Pipeline (RAP) for the Economic Estimates for DCMS sectors publication.
See the eesectorsmarkdown repository for an example of implementing these functions in the context of a Statistical First Release (SFR).
The package can be installed using `devtools::install_github('ukgovdatascience/eesectors')`.
Some users may not be able to use the `devtools::install_github()` command as a result of network security settings. If this is the case, `eesectors` can be installed by downloading the zip of the repository and installing the package locally using `devtools::install_local(<path to zip file>)`.
This package provides functions to recreate Chapter three, Gross Value Added (GVA), of the Economic Estimates for DCMS Sectors publication.
The data are supplied to DCMS as spreadsheets by the Office for National Statistics (ONS). Hence, the first set of functions in the package is designed to extract the data from these spreadsheets and combine them into a single dataset, ready to be checked and converted into tables and figures.
There are five `extract_` functions:
- `extract_ABS_data`
- `extract_DCMS_sectors`
- `extract_GVA_data`
- `extract_SIC91_data`
- `extract_tourism_data`
Note that, with the exception of `extract_DCMS_sectors`, the data extracted by these functions are potentially disclosive, and should therefore be handled with care and treated as OFFICIAL-SENSITIVE. Steps must be taken to prevent the accidental disclosure of these data.
These should include (but not be limited to):
The extract functions return a `data.frame`, and can be called as follows (see the individual function documentation for more information about each of the arguments):
# Where working_file.xlsm is the spreadsheet containing the underlying data
input <- 'working_file.xlsm'
extract_ABS_data(input)
The various datasets used in the GVA chapter can be combined with the `combine_GVA()` function, which returns a `data.frame` of the combined data.
combine_GVA(
  ABS = extract_ABS_data(input),
  GVA = extract_GVA_data(input),
  SIC91 = extract_SIC91_data(input),
  tourism = extract_tourism_data(input)
)
The GVA chapter is built around the `year_sector_data` class. To create a `year_sector_data` object, pass it a `data.frame` containing all the data required to produce the tables and charts in Chapter three.
An example of the required layout is bundled with the package as `GVA_by_sector_2016`. These data were extracted directly from the 2016 SFR, which is in the public domain, and provide a test case for evaluating the data.
library(eesectors)
GVA_by_sector_2016
When an object is instantiated into the `year_sector_data` class, a number of checks are run on the data passed as the first argument. These are explained in more detail in the help page, `?year_sector_data`.
gva <- year_sector_data(GVA_by_sector_2016)
Any failed checks are raised as warnings, not errors, and so the user is able to continue.
However, it is also possible to log these warnings as GitHub issues by setting `log_issues = TRUE`.
This is a prototype feature that needs additional work to make the logged issues more useful. See below for details of the environment variables required for this functionality to work.
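Because failed checks surface as warnings rather than errors, they can also be collected programmatically with base R's `withCallingHandlers()`. The sketch below is illustrative only: `run_checks()` is a hypothetical stand-in that mimics a constructor which warns on bad data, and is not part of this package. In practice the wrapped call would be the `year_sector_data()` call itself.

```r
# Hypothetical stand-in for a constructor that warns when a check fails.
run_checks <- function(x) {
  if (anyNA(x)) warning("Missing values found")
  if (any(x < 0, na.rm = TRUE)) warning("Negative values found")
  x
}

collected <- character(0)

result <- withCallingHandlers(
  run_checks(c(1, NA, -3)),
  warning = function(w) {
    # Record the message, then muffle the warning so execution continues.
    collected <<- c(collected, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# One entry per failed check, in the order the checks ran.
print(collected)
```

Collecting the messages this way keeps the returned object usable while leaving a record of every check that failed.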
Tables and charts for Chapter three can be reproduced simply by running the relevant functions:
year_sector_table(gva)
figure3.1(gva)
Note that the figures produced remain `ggplot2` objects, and can therefore be edited in the following way:
p <- figure3.2(gva)
p
Titles and other layers can then be added simply:
library(ggplot2)
p + ggtitle('Figure 3.2: Indexed growth in GVA (2010 =100)\n in DCMS sectors and UK: 2010-2015')
Note that figures make use of the govstyle package. See the vignette for more information on how to use this package.
To log failed checks as GitHub issues (`log_issues = TRUE`), the following three environment variables must be set:
| Name | Example | Description |
|---|---|---|
| GITHUB_PAT | _ | A GitHub personal access token with the necessary permissions. |
| LOG_REPO | RAP-demo-md | The name of a GitHub repository where data issues can be logged. |
| LOG_OWNER | ukgovdatascience | The owner of the repository referred to in LOG_REPO. |
Environment variables can be set interactively using `Sys.setenv()`, or more permanently by setting the variables in an `.Renviron` file, which will be read when the project is loaded (assuming you are using projects within RStudio).
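As a minimal sketch of the interactive route, the three variables can be set for the current session and then checked using base R's `Sys.setenv()` and `Sys.getenv()`. The token value below is a placeholder assumed for illustration, not a real token. The repository and owner values are the examples given in the table.

```r
# Set the three variables for the current R session only.
# "<your-token>" is a placeholder; substitute a real personal access token.
Sys.setenv(
  GITHUB_PAT = "<your-token>",
  LOG_REPO   = "RAP-demo-md",
  LOG_OWNER  = "ukgovdatascience"
)

# Check that all three variables are now visible to R.
required <- c("GITHUB_PAT", "LOG_REPO", "LOG_OWNER")
missing_vars <- required[Sys.getenv(required) == ""]

if (length(missing_vars) > 0) {
  warning("Unset environment variables: ", paste(missing_vars, collapse = ", "))
}
```

For the permanent route, the same three names would go in the `.Renviron` file as `NAME=value` lines, one per line. That file should be kept out of version control, since it contains the token.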