knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
shsannualreport is an R package to prepare the results of the Scottish Household Survey(SHS) for publication as an R Shiny app. The package allows users to prepare source data for publication, and to then build an R Shiny app based on the processed data.
You can install the development version of shsannualreport from GitHub with:
# install.packages("devtools") devtools::install_github("thomascrines/shsannualreport")
If you are working on SCOTS, or if the above does not work, you can install manually:
install.packages("C:/DownloadDirectory/shsannualreport-master/shsannualreport-master", repos = NULL, type="source", lib = "C:/YourLibraryPath")
Library path can be seen by running .libPaths()
The package depends on having source data which is currently stored in a secure folder on Objective Connect.
If this data has not yet been published, it won't be available outside the SHS team.
If the data has been published, please contact SHS@gov.scot to request access to the source data.
To prepare source data prior to running the functions in the package:
There are three public functions in the package:
shs_create_names_workbooks
- Creates two Excel workbooks, column_names.xlsx
and variable_names.xlsx
, with names as they appear in the source data and values to rename them to as they will appear in the final app.shs_create_app_data
- Extracts all data and metadata from Excel sources, processes the data, and saves in Rds format suitable for use in the SHS Data Explorer app.shs_create_app
- Copies or creates all necessary files for the SHS Data Explorer app, including the data output by shs_create_app_data. The resulting app can then be run locally or deployed.shs_create_names_workbooks
outputs two files, column_names.xlsx
and variable_names.xlsx
, to the specified destination_directory
. Both sheets will have the same stucture, with two columns: source_name
and display_name
. The source_name
column lists all names as they appear in the source data, the display_name
column can be populated with the desired names to appear in the app.
In column_names.xlsx
, source_name
will be populated with all column names in all files in the source_dataset_directory
, minus any names specified in columns_to_remove
.
In variable_names.xlsx
, source_name
will be populated with all values in the second column on all files in the source_dataset_directory
. Here is important that the columns_to_remove
values are correct, as the source data can contain unnecessary columns which can result in the output being populated with values from these columns rather than the desired columns.
In both output files, the display_name
value can be auto-populated with previous values, if a path to an existing template file is provided to the argument existing_column_names_path
and/or existing_variable_names_path
.
In total, shs_create_names_workbooks
accepts five arguments:
destination_directory
: The path of the directory to write the column names and variables names Excel files to. (Must be an existing directory.)source_dataset_directory
: The path of the directory containing source dataset in Excel format. (This should be the sub-directory named dataset
following the structure described above.)columns_to_remove
: A list of unnecessary columns to remove from the dataset.existing_column_names_path
(optional): A path to an existing Excel file with the same structure. Any 'display_name' values will be copied to the new output file.existing_variable_names_path
(optional): A path to an existing Excel file with the same structure. Any 'display_name' values will be copied to the new output file.Once the output files are created, the display_name
columns of both output files can be manually edited to the desired values.
NOTE: some special characters can run locally, but break when deployed due to encoding differences, e.g £
. There is currently no fix in the package for these characters, so they will need to be changed to something else when deploying to shinyapps.io.
shs_create_app_data
takes a source dataset and metadata in Excel format, processes the data to make it suitable for the SHS Data Explorer app, and saves the output files in Rds format.
The processing done in shs_create_app_data
includes:
source_data_directory
columns_to_remove
from the datasetcolumn_names_workbook_path
and variable_names_workbook_path
respectivelyThere are also a number of other processing functions which are described in more detail in the vignette (to do), including:
In total shs_create_app_data
accepts five arguments:
destination_directory
: The path of the directory to write the output dataset to. (Must be a pre-existing directory.)source_data_directory
: The path of the directory containing source data in Excel format. (This should be the top level directory which contains sub-directories dataset
and metadata
following the structure described above.)columns_to_remove
: A list of unnecessary columns to remove from the dataset.column_names_workbook_path
: A path to an Excel workbook with details of the source names and display names of columns in the source dataset. If you don't have a suitable file, shs_create_names_workbook
can be used to generate this file. variable_names_workbook_path
: A path to an Excel workbook with details of the source names and display names of variables in the source dataset. If you don't have a suitable file, shs_create_names_workbook
can be used to generate this file. When the data has been output to the destination_directory
it can then be used to create a version of the SHS Data Explorer app using shs_create_app
.
shs_create_app
will take data output by shs_create_app_data
and generate a Shiny app based on the contained data. The app will be written to the specified destination_directory
, from where it can be run locally, deployed etc.
In total shs_create_app_data
accepts five arguments:
destination_directory
: The path of the directory to create the final app in.data_directory
: The path of the directory containing all data and metadata created by shs_create_app_data
.reports_start_year
: The first year to allow users to generate reports for in the 'Generate Report' section of the app.reports_end_year
: The final year to allow users to generate reports for in the 'Generate Report' section of the app.The following example is based on usage in SCOTS, with the source data downloaded and set up according to the instructions in 'prerequisites' in the 'Downloads' folder.
To use the examples on other computers, all that needs changed are the folder paths, which will vary by user.
As the destination_directory
paths for both shs_create_names_workbooks
and shs_create_app_data
must exist before running, directories for the example I created directories 'names' and 'appdata' in 'Downloads'.
NOTE: Running functions with shsannualreport::
prefix alone seems not to load dependencies, running library(shsannualreport)
before running any functions will prevent errors.
library(shsannualreport) destination_directory <- "C:/Users/[user's U number]/Downloads/names" source_dataset_directory <- "C:/Users/[user's U number]/Downloads/data/dataset" columns_to_remove <- c("sort", "_LABEL_", "var", "LABEL") existing_column_names_path <- "C:/Users/[user's U number]/Downloads/old_column_names.xlsx" existing_variable_names_path <- "C:/Users/[user's U number]/Downloads/old_variable_names.xlsx" shs_create_names_workbooks(destination_directory = destination_directory, source_dataset_directory = source_dataset_directory, columns_to_remove = columns_to_remove, existing_column_names_path = existing_column_names_path, existing_variable_names_path = existing_variable_names_path)
NOTE: Although shs_create_names_workbooks
can be used to generate column and variable names re-naming spreadsheets, it doesn't have to be run if you already have these sheets. The outputs can be saved off and the paths passed to column_names_workbook_path
and variable_names_workbook_path
. This example uses the sheets output in the shs_create_names_workbooks
example above.
library(shsannualreport) destination_directory <- "C:/Users/[user's U number]/Downloads/appdata" source_data_directory <- "C:/Users/[user's U number]/Downloads/data" columns_to_remove <- c("sort", "_LABEL_", "var", "LABEL") column_names_workbook_path <- "C:/Users/[user's U number]/Downloads/names/column_names.xlsx" variable_names_workbook_path <- "C:/Users/[user's U number]/Downloads/names/variable_names.xlsx" shs_create_app_data(destination_directory = destination_directory, source_data_directory = source_data_directory, columns_to_remove = columns_to_remove, column_names_workbook_path = column_names_workbook_path, variable_names_workbook_path = variable_names_workbook_path)
NOTE: As above, if data has been previously generated by shs_create_app_data
and the source data hasn't changed, shs_create_app
can be run without re-running shs_create_app_data
.
library(shsannualreport) destination_directory <- "C:/Users/[user's U number]/Downloads" data_directory <- "C:/Users/[user's U number]/Downloads/appdata/data" reports_start_year <- 2013 reports_end_year <- 2019 shs_create_app(destination_directory = destination_directory, data_directory = data_directory, reports_start_year = reports_start_year, reports_end_year = reports_end_year)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.