Description Usage Arguments Details Value Examples
The data which underlies the Economic Sectors for DCMS sectors data is typically provided to DCMS as a spreadsheet from the Office for National Statistics. This function extracts the ABS data from that spreadsheet, and saves it to .Rds format.
IT IS HIGHLY ADVISEABLE TO ENSURE THAT THE DATA WHICH ARE CREATED BY THIS FUNCTION ARE NOT STORED IN A FOLDER WHICH IS A GITHUB REPOSITORY TO MITIGATE AGAINST ACCIDENTAL COMMITTING OF OFFICIAL DATA TO GITHUB. TOOLS TO FURTHER HELP MITIGATE THIS RISK ARE AVAILABLE AT https://github.com/ukgovdatascience/dotfiles.
1 | extract_ABS_data(path, sheet = "New ABS Data", format = 2015)
|
path |
Location of the input spreadsheet file. Named something like "working_file_dcms_VXX.xlsm". |
sheet |
The name of the spreadsheet in which the data are stored.
Defaults to |
format |
Specify which years format the data is in |
The best way to understand what happens when you run this function is to look at the source code, which is available at https://github.com/ukgovdatascience/eesectors/blob/master/R/. The code is relatively transparent and well documented. A brief explanation of what the function does here:
1. The function calls readxl::read_excel
to load the appropriate
page from the underlying spreadsheet.
2. In the 2016 version of the working file spreadhseet, there were a number of replicated columns. These are removed, however it is at present a relatively dumb exercise. The function simply looks for columns with teh same name, and retains only the first one.
3. The column names are cleaned to make selection of variables easier using
make.names
.
4. The data are pivoted into long form using tidyr::gather_
. This
converts the data from a wide dataframe with year as column headers, into a
long dataframe with year included in a year column. This makes the data
much easier to subset.
5. All the ABS values are combined into a column called ABS
. In the
2016 spreadsheet there were a number of full stops (.
) in the
ABS
column, which will be coerced to NA
when the the column
is converted to numeric using as.numeric
(the next step). The
internal function eesectors::integrity_check
runs a quick check to
make sure that the only NAs creeping into the ABS
column are from
full stops in the original data. The full stops are then converted to
zeros.
6. The internal function eesectors::clean_sic
is run on the
DOMVAL
column to ensure that all 3 and 4 digit SIC codes are
formatted properly.
7. The data are printed to console, and can be saved out using the normal
methods, for instance saveRDS
, or write.csv
.
The function returns nothing, but saves the extracted dataset to
file.path(output_path, 'OFFICIAL_ABS.Rds')
. This is an R data
object, which retains the column types which would be lost if converted to
a flat format like CSV.
1 2 3 4 5 6 7 8 | ## Not run:
library(eesectors)
extract_ABS_data(
path = 'OFFICIAL_working_file_dcms_V13.xlsm',
sheet_name = 'New ABS Data'
)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.