{css, setup, echo = FALSE, include = FALSE}
.info {
display: block;
width: 100%;
font-weight: bold;
background: #e6ffff;
margin-left: auto;
margin-right: auto;
padding: 20px 25px;
border-width: 3px;
border-style: solid;
border-color: #1695A3;
border-radius: 20px;
/*box-shadow: 10px 10px 20px -5px gray;*/
}
The broad idea of the deployment strategy is encompassed in three rules:
- No knowledge beyond markdown and github is required to create a lesson website
- Any elements of the lesson that need to be rendered will be rendered via continuous integration
- All changes to the final site are purposeful
The process for adding an episode to a lesson should take three steps, total:
episodes/
folderconfig.yaml
filesandpaper::build_lesson()
We have decided on using the R Markdown format because it is a VERY powerful tool that allows you to write code and text without having to copy and paste output/images/tables. It has support for several languages including python, BASH, and SQL. Most importantly: the code rendering features of it are purely optional; you can strip away all R-associated content and be left with a markdown document.
This is a system that has already been used for our R lessons for several years and works fairly well.
To preview the lessons, we build the static files locally, without any servers. While it is possible to go directly from R Markdown to HTML, we use a two-step process:
{alt='diagram of three folders. The first folder, "episodes/", labelled as RMarkdown, has an arrow (labelled as hash episodes) pointing to "site/built/", labelled as Markdown. The Markdown folder has an arrow (labelled as "apply template") pointing to "site/docs/", labelled as "HTML". The first folder is labelled in pale yellow, indicating that it is the only one tracked by git.'}
Now, the first reaction you may have is, "Why do we need a two step process when we can generate HTML directly from R Markdown?" The answer is three-fold.
First, if there is R code in the R Markdown document, it takes time to execute the code and add it in to the output document. By having an intermediate markdown document that contains a hash of the original document, we can make sure that we only build the files that have changed in content with minimal overhead in a manner that is compatible with R Markdown's chunk caching feature.
Second, if any generated element of your page creates invalid HTML, it usually causes a cascading effect that can be hard to track down if you are looking at raw HTML output. Having a markdown intermediate allows us to more easily single out any potential problems. This particular reason is important in the next section for scheduled deployments where we avoid problems raised explicitly by checking the output.
Third, having the rendered markdown documents makes them easy to use with other HTML or pandoc templates not provided by The Carpentries.
You might also be wondering "why aren't the rendered markdown and HTML files being tracked by git?" The answer is that, they will be, just not on your computer.
{alt='Diagrammatic representation of the GitHub deployment cycle showing four branches, gh-pages, md-outputs, main, and my-edit. The my-edit branch is a direct descendent of the main branch, while the gh-pages and md-outputs branches are orphans. Each commit of the main branch has a process represented by a dashed arrow that builds a commit of the subsequent orphan branches'}
Because some lesson content will be auto-generated from an ecosystem of software that is constantly evolving, it's important to make sure a few things are true:
Because we have two sources of entropy (software environment needed to build the lesson content and CSS/HTML/JS needed to create the website), it's important to separate these steps because if there is a failure, having them separate allows us to audit the failure and fix the issue on the appropriate end. Thus, we use the same two-step system we presented earlier in local rendering, with weekly and monthly checks:
Zooming in on the Pull request workflow we can see that there is only one point of entry. The box elements represent user interactions, the ellipses are GitHub workflows. The color indicates permission levels for the workflows. Lavender workflows only have the ability to check out code and compare it. Wheat/yellow colored workflows have full permissions to push to the repository:
Because we only track the source files of the lesson and not the output from
the maintainer's computer, we need to rely on Continuous Integration to rebuild
the lesson and deploy it to the cloud. We still use the two-step process here,
but the difference is that we don't want to make any extra commits to the main
branch, so instead of creating the commits in two extra directories, we create
them in orphan branches called md-sources
and gh-pages
. The latter is
familiar to most lesson maintainers, and the former serves as a staging and
evaluation area for changes in generated content.
Takeaway from this diagram: Yes, this is complex, but its modularity ensures that we can replace components without needing to significantly modify any of the other components.
The diagram above describes the dependency graph of the lesson template. Each component in the graph represents either a file (beige files/folders/documents) or tools (boxes). The edges (lines connecting the components) are categorized with the following labels:
depends ~ (e.g. a -> b means component a depends on component b). This is usually reserved for lesson artifacts such as generated markdown and HTML. If a component is dependent on another component, that means that it can not exist without the upstream component. Changes in the upstream component can happen independently to the downstream component, but these changes will affect the state of the downstream component.
uses ~ (e.g. a -> b means component a uses aspects of component b). In this relationship, each of these components are independent and they can change independent of one another without necessarily affecting the state of the other.
contains ~ (e.g. a -> b means component a contains component b). A component that contains another component must change when the contained component changes. For example, any changes to the package cache lockfile requires a corresponding commit in the source files.
modifies ~ (e.g. a -> b means component a modifies component b). Modification occurs when a component is the agent of change for a downstream component, but changes in the upstream component do not necessarily correspond to a change in the downstream component. For example the GitHub Actions component is the only one on this graph that can modify other components (including itself), but if it were to undergo a change to update frequency, that would not necessarily affect the changes that the downstream components see.
Each time a commit is pushed to the main branch, the command
sandpaper::ci_deploy()
is run in GitHub actions and it will render any
changed source files from main
to md-source
and then apply the HTML
template, {varnish} to the files in
md-source
to create the website.
This process assumes that pushes to the main branch will not break the rendering process as they will either be cosmetic changes or changes that come from pull requests (which have previously been vetted). No extra maintainer interaction required.
{alt='diagram showing five branches, two of which are highlighted: "md-outputs", and "md-outputs-PR-1" where the latter is an orphan branch starting from the last commit of the former and applying the changes from the "my-edit" branch on top, showing that the output has changed'}
Each time a pull request is added, the maintainer should be informed of what changes in the lesson (if anything), so the process goes like this:
md-sources
called
md-sources-{pr#}
.md-sources-{pr#}
md-sources-{pr#}
and md-sources
is added to the PR
comments as a link. md-sources-{pr#}
branch (effectively, a squash merge){alt='diagram showing four branches, where "my-edit" is merged into "main" and that drives the two-step process of rendering "md-outputs" and then "gh-pages". The changes from the "my-edit" branch are reflected in the "md-outputs" branch output.'}
As I mentioned above, the HTML templates will not live within the lesson
repository. They will live in the
{varnish} package and applied to the
rendered markdown files. This step will start at the md-source
branch and
re-apply a fresh version of the template to bring in any stylistic changes that
have occurred in the {varnish} repository (e.g. new CSS rules) in the previous
week.
This step will test that our HTML template is working correctly. If an error occurs, we know that we need to fix the HTML template and not the lesson. No maintainer interaction required.
Because the weekly CRON jobs ensure that the HTML template works, we run the monthly jobs to make sure the software stack continues to work as expected. This is important for making sure that the generated output matches the output that the learners would see and that the narrative additionally matches the output.
This will explicitly test changes in the lesson due to changes in the software stack and not the HTML template. Because many of the changes will produce technically valid output, Manual inspection of the diffs is required. There are six steps in this process, one of which requires a maintainer's input:
main
with the format YYYY-MM
.YYYY-MM
is reset and a Pull Request is created. Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.