knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
{manypkgs}
has package related functions that help developers contribute to the ecosystem of
packages by helping with the set-up of packages consistent with {manydata}
.
The function setup_package()
, for example, establishes many of the required files and folder
structures for a {manydata}
-consistent data package.
It requires that the package name starts with the prefix 'many' as well as that
the name of one author is declared.
For author's name, please declare surname(s) first and then given name(s) separated by commas.
The function accepts ORCID numbers instead of author names as well.
manypkgs::setup_package("manystates", name = "Hollway, James")
For adding on package contributors or authors to packages later on,
use the add_author()
function.
This function also accepts ORCID numbers instead of author names.
manypkgs::add_author(name = "Sposito, Henrique")
For automatically updating packages for changes in templates, documentation,
and file structure, contributors can use the update_package()
function.
manypkgs::update_package("manystates")
For data collection and loading into a new package with {manypkgs}
,
the function import_data()
creates a data-raw folder into the package,
copies the raw data into the folder and provides a script to facilitate data loading.
The function requires that a name for the dataset - two dimensional data sheet being
imported - is declared, in upper case letters, as well as the name of the issue-domain,
in lower case letters, the dataset belongs.
Note that for transparency and citation purposes, contributors need to provide
a .bib
file alongside their data in the corresponding data-raw subfolder.
The domain issue name will be used to connect two-dimensional datasets into
three-dimensional databases that resembles a data cube (a list object where entries are tibbles).
manypkgs::import_data("COW", "states")
Once a dataset is collected with import_data()
, it needs to be corrected so that it can be
connected into a database. {manydata}
contains several functions geared towards data wrangling.
These include, for example, the transmutate()
function which is between {dplyr}
's
transmute and mutate but that returns only the mutated variables and none of the variables
used in the mutations.
Furthermore, standardise_title()
function capitalises all words in a string and removes
unnecessary symbols to enable comparison. Finally, {manydata}
data preparation workflows also
leverage {messydates}
to correct uncertain dates.
These include, for example, the messydates::make_messydate()
function which creates nested
vectors of dates for vague date inputs into a range of dates.
COW <- manydata::transmutate(ID = stateabb, Beg = messydates::make_messydate(styear, stmonth, stday), End = messydates::make_messydate(endyear, endmonth, endday), Label = manypkgs::standardise_titles(statenme), COW_Nr = manypkgs::standardise_titles(as.character(ccode)))
Contributors should make sure the variables names in the datasets
are in line with {manydata}
ecosystem.
Datasets part of the same database should have similar variables
which allows to build up the 'datacube' for comparison.
Once the dataset has been corrected in {manydata}
,
the cleaned dataset can be loaded with {manypkgs}
into the package using
the export_data()
function which take as inputs the dataset name and
the database name as a string.
Corrected datasets can be connected to form a three dimensional database
that resembles a data cube.
The website source of the data should be indicated in the third argument of the function.
The function creates a database in a data folder to store corrected datasets,
while providing a citation template and a test template.
The citation template helps cite the source of the raw data and and describe
the variables in the corrected dataset.
The test template makes sure the corrected dataset is consistent with {manypkgs}
guidelines.
manypkgs::export_data(COW, "states", URL = "https://correlatesofwar.org/data-sets/state-system-membership")
The test template is meant to ensure consistency for the corrected dataset.
If these tests fail, users can go back to correcting the data and then re-run export_data()
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.