This R package is a lightweight solution to the following problem: you want to sync code to a git remote, but not the actual data. It lists the contents of a data subfolder in a .csv file that can be synced to git remotes, and provides a function to check for mismatches between this .csv and the actual contents of the data folder.
datafolder_update() creates a list of the file names and md5 hashes of all files in a data folder (including all subfolders) and writes it to a .csv file (the default is docs/data_folder_content.csv). This .csv file can then be synced to git remotes. Although the data itself is not synced, the data files expected by the code at each commit are documented.
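The idea behind such a list can be sketched in base R. This is a minimal illustration, not the package's own implementation: the helper name sketch_manifest and the column names file and md5 are assumptions made for this example, and the demo runs in a temporary directory.

```r
# Hypothetical helper (not part of the package): build a table of relative
# file paths and md5 hashes for every file under `data_dir`, subfolders included.
sketch_manifest <- function(data_dir) {
  files <- list.files(data_dir, recursive = TRUE)
  hashes <- tools::md5sum(file.path(data_dir, files))
  data.frame(file = files, md5 = unname(hashes), stringsAsFactors = FALSE)
}

# Demo in a temporary project so that no real data is touched.
project <- tempfile("project_")
dir.create(file.path(project, "data", "raw"), recursive = TRUE)
dir.create(file.path(project, "docs"))
writeLines("a,b\n1,2", file.path(project, "data", "raw", "example.csv"))

# Write the list where it could be committed alongside the code.
manifest <- sketch_manifest(file.path(project, "data"))
write.csv(manifest, file.path(project, "docs", "data_folder_content.csv"),
          row.names = FALSE)
```

Only the small .csv would need to be committed; the files under data/ stay out of the repository.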
When pulling from the remote, you can check that the actual contents of your local data folder match this list. datafolder_check() prints any mismatches in some detail (which files are missing, which files appear in data/ but not in the .csv list, which have been renamed, and so on) so that you can manually copy the data to match what the code expects through some other secure channel. By default a mismatch raises an error; datafolder_check(stop_on_error = FALSE) raises a warning instead.
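The kinds of mismatch described above can be illustrated with a short base R sketch. It is an assumption-laden example rather than the package's code: sketch_check, the file and md5 column names, and the wording of the message are all hypothetical. Only the stop_on_error argument name is taken from the package.

```r
# Hypothetical helper (not part of the package): compare a list of expected
# files (data frame with columns `file` and `md5`) against a data folder.
sketch_check <- function(manifest, data_dir, stop_on_error = TRUE) {
  files <- list.files(data_dir, recursive = TRUE)
  actual <- data.frame(file = files,
                       md5 = unname(tools::md5sum(file.path(data_dir, files))),
                       stringsAsFactors = FALSE)

  missing <- setdiff(manifest$file, actual$file)  # listed, but not on disk
  extra <- setdiff(actual$file, manifest$file)    # on disk, but not listed

  # Files present in both places whose contents differ.
  both <- merge(manifest, actual, by = "file",
                suffixes = c(".listed", ".actual"))
  changed <- both$file[both$md5.listed != both$md5.actual]

  # A missing file whose hash reappears under an unlisted name was likely renamed.
  missing_hash <- manifest$md5[match(missing, manifest$file)]
  extra_hash <- actual$md5[match(extra, actual$file)]
  renamed_to <- extra[match(missing_hash, extra_hash)]
  renamed <- data.frame(from = missing, to = renamed_to,
                        stringsAsFactors = FALSE)[!is.na(renamed_to), ]

  result <- list(missing = missing, extra = extra,
                 changed = changed, renamed = renamed)
  n_problems <- length(missing) + length(extra) + length(changed)
  if (n_problems > 0) {
    print(result)  # show the details before signalling
    msg <- sprintf("%d mismatch(es) between the list and %s", n_problems, data_dir)
    if (stop_on_error) stop(msg) else warning(msg)
  }
  invisible(result)
}

# Demo in a temporary folder: record two files, then rename one and edit the other.
data_dir <- file.path(tempfile("project_"), "data")
dir.create(data_dir, recursive = TRUE)
writeLines("1,2,3", file.path(data_dir, "kept.csv"))
writeLines("4,5,6", file.path(data_dir, "old_name.csv"))
files <- list.files(data_dir)
manifest <- data.frame(file = files,
                       md5 = unname(tools::md5sum(file.path(data_dir, files))),
                       stringsAsFactors = FALSE)

file.rename(file.path(data_dir, "old_name.csv"), file.path(data_dir, "new_name.csv"))
writeLines("7,8,9", file.path(data_dir, "kept.csv"))

# Warns instead of stopping, and returns the details for inspection.
report <- sketch_check(manifest, data_dir, stop_on_error = FALSE)
```

In this sketch a renamed file is reported both as missing and extra and in the renamed table, since the hash match is only a hint that the two files are the same.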
This doesn't work very well for projects where the data is updated frequently; in that case an alternative solution is probably more appropriate. These functions assume that your R project is structured with a main project working directory, with all data located in a specific subfolder of the project (e.g. data/) and another folder for general documentation / configuration (e.g. docs/). For an example structure see http://projecttemplate.net/architecture.html
datafolder_update(): Write docs/data_folder_content.csv listing the files in data/ and their md5 hashes
datafolder_check(): Check docs/data_folder_content.csv against the actual data/ files and list any mismatches
Run datafolder_update() to generate docs/data_folder_content.csv
Run datafolder_check() to make sure everything is up to date, and use the printed output to manually copy missing or changed data files
Run datafolder_update() if you've added new data files to be tracked
https://github.com/bosefalk/datafolder
devtools::install_github("bosefalk/datafolder")
Maintainer: Bose Falk falk@dkms.de