If you have never worked on primer.data before, start by skimming R Packages, paying close attention to the data section. Also, make sure to install the usethis and pkgdown packages.
The basic process of creating new data sets is always the same. We get a raw data set, clean it up, and finally create a help page with details for users. Follow these steps:
Get a raw data set and put it into /data-raw. If it is too large to push to Github, compress the file.
Create a .R file in /data-raw and name it "make_" + the name of the final data set. This will be the script we use to prepare the raw data. Look at make_nominate.R to see what these files should look like. The general structure is:
name_of_final_dataset <- x
usethis::use_data(name_of_final_dataset, overwrite = TRUE)
After running the script, the clean data set should be saved in /data. Check that you see an .rda file with the name of your final data set.
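The steps above can be sketched as a small script. This is a minimal sketch, not the actual make_nominate.R: the data set name `example_data`, the raw columns, and the cleaning steps are all hypothetical, and a small in-memory data frame stands in for a file that would normally be read from /data-raw.

```r
# Hypothetical script: data-raw/make_example_data.R
# A tiny inline data frame stands in for the raw file in /data-raw.
raw <- data.frame(
  State = c("MA", "NY", NA),
  Votes = c("10", "25", "7"),
  stringsAsFactors = FALSE
)

# Clean: lower-case the column names, parse numbers, drop incomplete rows.
x <- raw
names(x) <- tolower(names(x))
x$votes <- as.integer(x$votes)
x <- x[!is.na(x$state), , drop = FALSE]

example_data <- x

# Save to /data as example_data.rda. This call only makes sense from inside
# the package directory, so it is guarded here.
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
  usethis::use_data(example_data, overwrite = TRUE)
}
```

Keeping the final assignment and the use_data() call at the very end mirrors the general structure shown above, so every make_ script ends the same way.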
A few things to consider when following the above steps:
The name of the final data set should appear in the name of both key files you're creating. For nominate, we would have make_nominate.R (in /data-raw) and nominate.R (in /R). Any article associated with the data is placed in inst/papers and named the same way, e.g. "nominate.pdf".
Think hard about variable type.
No ordered factors. If something has no natural ordering, make it character. If it does have an ordering, make it a factor and give the levels the ordering you want.
Binary variables should generally be parsed as character variables, e.g. "Yes"/"No". Only make it 0/1 if we expect people to use it as an outcome variable.
Document your reasoning in your make_dataset.R script. Examine the choices made in make_mail.R.
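The variable type rules above might look like this in a make_ script. This is only a sketch: the columns `party`, `education`, and `voted` are hypothetical, and it reads "no ordered factors" as meaning a regular factor() with explicitly ordered levels rather than ordered(), which is an assumption about the intent.

```r
# Hypothetical columns, used only to illustrate the rules above.
x <- data.frame(
  party     = c("Democrat", "Republican", "Democrat"),  # no natural ordering
  education = c("College", "High school", "Graduate"),  # has an ordering
  voted     = c(1, 0, 1),                               # binary
  stringsAsFactors = FALSE
)

# No natural ordering: keep as character.
x$party <- as.character(x$party)

# Natural ordering: a regular factor with the levels in the order we want
# (factor(), not ordered()).
x$education <- factor(x$education,
                      levels = c("High school", "College", "Graduate"))

# Binary and not expected to be used as an outcome: character "Yes"/"No".
x$voted <- ifelse(x$voted == 1, "Yes", "No")
```

A short comment next to each conversion, as here, is one way to document the reasoning in the script itself.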
Github allows you to create web pages with a username.github.io/repo URL. As an example, the website for this package is accessible at ppbds.github.io/primer.data. (Our "user" here is an organisation called PPBDS.)
The first step is to create a branch for your repo named "gh-pages". This cannot be done manually, but Github will do it for us if we activate automated checking with Github Actions. To do this, create a folder named ".github" in your root directory, and within this folder, create another one named "workflows". Go to "workflows" and create a file named "R-CMD-check.yaml". Go to https://github.com/ppbds/primer.data/blob/master/.github/workflows/R-CMD-check.yaml, copy the content you see, and paste it into the empty .yaml file you just created. Save, commit, and push to Github. If everything went well, Github Actions is now activated, and Github has automatically created a branch named "gh-pages" in your repo.
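The folder and file can also be created from the R console instead of by hand. The sketch below uses only base R; `pkg_root` is a stand-in that points at a temporary directory so the example does not touch a real repo. In practice it would be the root directory of the package.

```r
# Sketch: create .github/workflows/R-CMD-check.yaml.
# `pkg_root` is a stand-in; replace it with the package root (e.g. ".").
pkg_root <- tempfile("pkg")
workflow_dir <- file.path(pkg_root, ".github", "workflows")
dir.create(workflow_dir, recursive = TRUE)

# Create the empty file. Its content is still pasted in by hand from the
# primer.data repo, as described above.
workflow_file <- file.path(workflow_dir, "R-CMD-check.yaml")
file.create(workflow_file)
```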
We can now activate Github Pages. First, go to your repo on Github. Go to "Settings" and scroll down to the "Github Pages" section. Click on the button that says "None" and select "gh-pages". Optionally, you can then select a folder within that branch. The options are either /(root), which is simply the root directory, or /docs, where Github will look for a folder called "docs" within the selected branch. If you don't already have a "docs" folder, just stick with /(root). Once you have selected a branch and folder, click the "Save" button. You should then see a grey bar in the Github Pages section with the message "Your site is ready to be published at https://username.github.io/repo". This means that the site has not yet been published. Wait a few minutes, refresh the page, and check again. The bar should now be green with the message "Your site is published at https://username.github.io/repo". The page can now be accessed by anyone. Whenever we change files in the repo that affect the web page's appearance, Github Pages automatically updates the page. Until the page has been updated, which can take some time, the settings will again show a grey bar with "Your site is ready to be published at https://username.github.io/repo". Once the bar turns green, the website will show the new version.
Setting up Github Pages is only the first step. Since our webpage can only show what is in the repo, there are a few things to consider when trying to make changes.
The webpage consists of multiple tabs, each of which is generated from a different file. The homepage is created from README.md. If you want to update it, you must change something in README.Rmd and then knit the file to generate the corresponding .md file. The "References" tab is generated from the .Rd files that create the help pages. Any changes in the help pages are also reflected in "References". The "Changelog" tab comes from NEWS.md, and unlike for README.md, there is no .Rmd file behind it. While we could add one, there is not much content in there, so directly editing the .md is usually faster. There are a number of other files that affect how different parts of the page look, but we typically do not edit them. The only exception is the package version, which can be changed through the DESCRIPTION file. This process is explained in the next section.
Note that there is a function, pkgdown::build_site(), which uses the files in your local repo to build the complete webpage as it would appear online. This is only to confirm that everything looks OK before you push it; it is not necessary for the new page to show up on Github Pages. (Earlier, we thought it was.)
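A local preview could be wrapped in a small helper like the one below. This is a sketch under assumptions: `preview_site()` is a hypothetical name, and using rmarkdown::render() to knit README.Rmd is one possible way to regenerate README.md, not necessarily the one used for this package. It has to be run from the package root.

```r
# Hypothetical helper; run from the package root directory.
preview_site <- function() {
  # Regenerate README.md from README.Rmd (the source of the homepage).
  if (file.exists("README.Rmd")) {
    rmarkdown::render("README.Rmd", quiet = TRUE)
  }
  # Build the complete site locally to check how it looks before pushing.
  pkgdown::build_site()
}
```

Calling preview_site() only affects the local copy; the published page still updates from what is pushed to Github.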
This section provides an overview of how to name package versions (see also https://r-pkgs.org/release.html#release-version). Each development state of a package has a unique version number such as "Version 1.2.3.9000". The version name usually consists of three or four digits, each of which stands for a specific type of change:
"Major releases" are indicated by the first digit, or "1" in our example above. This number is incremented if we make changes to the package that are not backward compatible and may affect many people. An example would be if we delete an entire data set or remove variables from an existing data set. In other words, these are changes that cause major problems in the code of someone who is already using our package. Try to avoid such changes whenever possible!
"Minor releases" are indicated by the second digit, or "2" in our example. This number is incremented when we make bug fixes, add new features or make backward-compatible changes. This is the most common type of change, for instance when we add new data sets.
"Patches" are indicated by the third digit, or "3" in our example. This number is incremented if we fix bugs without adding relevant new features, e.g. when making changes to help files.
In-development packages, i.e. packages which are not yet released on CRAN (like ours), also have a fourth digit. The default value is usually set to 9000, as in the example above. This number is only incremented when we add important features that another in-development package should depend on. Since this is unlikely to be the case, we can leave the fourth digit at 9000 as long as primer.data has not been published on CRAN (from then on only the first three digits will be used).
One remaining question does not concern any particular digit: when do we change from 0.a.b to 1.0.0? The first digit should be set from 0 to 1 when our package is "feature complete". This could be the case if our package contains all necessary data sets a student needs in GOV 50. It could also mean that the package is in a state to be released on CRAN (or both). In the end, it's up to us to answer this question. Furthermore, note that as long as the first digit is still 0, non-backward compatible changes will be recorded by incrementing the second digit (and not the first, as described above).
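The numbering scheme can be illustrated with a small helper. `bump_version()` is a hypothetical function written for this example, not part of any package. It also assumes the common convention that the digits to the right of the incremented one are reset to 0, which the text above does not state explicitly; the fourth digit is left untouched.

```r
# Hypothetical helper: increment one digit of a version string.
bump_version <- function(version, which = c("major", "minor", "patch")) {
  which <- match.arg(which)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  i <- match(which, c("major", "minor", "patch"))
  parts[i] <- parts[i] + 1L
  # Assumption: digits to the right of the bumped one reset to 0.
  # The fourth (development) digit, if present, is not changed.
  if (i < 3) parts[(i + 1):3] <- 0L
  paste(parts, collapse = ".")
}

bump_version("1.2.3.9000", "minor")  # "1.3.0.9000"
bump_version("0.4.1.9000", "patch")  # "0.4.2.9000"
```

While the first digit is still 0, a non-backward compatible change would be recorded with which = "minor" rather than "major", as described above.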
Once we have made some changes and agreed on a proper name for the new version, we need to change the old name. First, open the DESCRIPTION file and manually enter the correct version. Save the changes and run build_site() in the console to see how it looks. The new version name should now show at the top of the homepage. Next, update the changelog, which is generated from the NEWS.md file. Simply open the file and use the existing content as a guide for adding a new entry. If everything looks right, save the changes and run build_site() again to see the result. While there is no rule on this, most packages mention the fourth digit only in the DESCRIPTION file and not in NEWS.md. (I follow this convention as well.) Whenever new changes are made, simply repeat the described process.
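For illustration, the version field can also be changed from R with base R's read.dcf() and write.dcf(). This is a sketch, not the recommended workflow: a temporary file with two made-up fields stands in for the real DESCRIPTION, and the version numbers are hypothetical.

```r
# Sketch: a temporary file stands in for the real DESCRIPTION.
desc_file <- tempfile("DESCRIPTION")
writeLines(c("Package: primer.data", "Version: 0.1.0.9000"), desc_file)

desc <- read.dcf(desc_file)          # character matrix, one column per field
desc[1, "Version"] <- "0.2.0.9000"   # hypothetical new version
write.dcf(desc, desc_file)

# Confirm the change.
read.dcf(desc_file, fields = "Version")
```

On a real DESCRIPTION, write.dcf() may rewrap long fields, so editing the file by hand as described above remains the simpler option.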