# Advanced Topics

In this appendix, we talk about a few advanced topics that may be of interest to developers and advanced users.

## More global options

There are a few more advanced global options\index{Global Options} in addition to those introduced in Section \@ref(global-options), and they are listed in Table \@ref(tab:global-options2).

knitr::kable(matrix(c(
  'blogdown.hugo.dir', '', 'The directory of the Hugo executable',
  'blogdown.method', 'html', 'The building method for R Markdown',
  'blogdown.publishDir', '', 'The publish dir for local preview',
  'blogdown.new_bundle', FALSE, 'Create a new post in a bundle?',
  'blogdown.widgetsID', TRUE, "Incremental IDs for HTML widgets?",
  NULL
), ncol = 3, byrow = TRUE, dimnames = list(NULL, c('Option name', 'Default', 'Meaning'))), booktabs = TRUE, caption = 'A few more advanced global options.')

If you want to install Hugo to a custom path, you can set the global option blogdown.hugo.dir to a directory to store the Hugo executable before you call install_hugo(), e.g., options(blogdown.hugo.dir = '~/Downloads/hugo_0.20.1/'). This may be useful for you to use a specific version of Hugo for a specific website,^[You can set this option per project. See Section \@ref(global-options) for details.] or store a copy of Hugo on a USB Flash drive along with your website.

The option blogdown.method is explained in Section \@ref(methods).

When your website project is under version control in the RStudio IDE, continuously previewing the site can be slow if the site contains hundreds of files or more. The default publish directory is public/ under the project root directory, and whenever you make a change in the source that triggers a rebuild, RStudio will be busy tracking file changes in the public/ directory. The delay before you see the website in the RStudio Viewer can be 10 seconds or even longer. That is why we provide the option blogdown.publishDir. You may set a temporary publish directory to generate the website, and this directory should not be under the same RStudio project, e.g., options(blogdown.publishDir = '../public_site'), which means the website will be generated to the directory public_site/ under the parent directory of the current project.

To understand the option blogdown.new_bundle, you have to know the concept of page bundles in Hugo. When this option is set to TRUE, the function blogdown::new_post() (or the RStudio addin "New Post") will create a new post as the index file of a page bundle, e.g., post/2015-07-23-bye-world/index.md instead of post/2015-07-23-bye-world.md. One benefit of using a page bundle instead of a normal page file is that you can put resource files associated with the post (such as images) under the same directory as the post. This means you no longer have to put them under the static/ directory, which is often confusing to Hugo beginners.

The option blogdown.widgetsID is only relevant if your website source is under version control and you have HTML widgets on the website. If this option is TRUE (default), the random IDs of HTML widgets will be changed to incremental IDs in the HTML output, so these IDs are unlikely to change every time you recompile your website; otherwise, every time you will get different random IDs.
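As a rough sketch of how these options could be combined, the snippet below sets several of them at once, e.g., in a project-level .Rprofile. The values are only illustrative (the paths are the ones used as examples above) and are not recommendations.

```r
# Illustrative values only; adjust the paths to your own setup
options(
  blogdown.hugo.dir = '~/Downloads/hugo_0.20.1/',  # where install_hugo() stores Hugo
  blogdown.publishDir = '../public_site',          # preview outside the RStudio project
  blogdown.new_bundle = TRUE,                      # create new posts as page bundles
  blogdown.widgetsID = TRUE                        # incremental widget IDs (the default)
)

# Any option can be read back with getOption()
getOption('blogdown.publishDir')
```

Because these are ordinary R global options, they take effect only for R sessions in which the options() call has been run.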

## LiveReload

As we briefly mentioned\index{LiveReload} in Section \@ref(a-quick-example), you can use blogdown::serve_site() to preview a website, and the web page will be automatically rebuilt and reloaded in your web browser when the source file is modified and saved. This is called "LiveReload."

We have provided two approaches to LiveReload. The default approach is through servr::httw(), which will continuously watch the website directory for file changes, and rebuild the site when changes are detected. This approach is relatively slow because the website is fully regenerated every time. This may not be a real problem for Hugo, because Hugo is often fast enough: it takes about a millisecond to generate one page, so a website with a thousand pages may only take about one second to be fully regenerated.

If you are not concerned about the above issue, we recommend that you use the default approach. Otherwise, you can set the global option options(blogdown.generator.server = TRUE) to use an alternative approach to LiveReload, which is based on the native support for LiveReload from the static site generator. At the moment, this has only been tested against Hugo-based websites. It does not work with Jekyll and we were not successful with Hexo, either.

This alternative approach requires an additional R package to be installed: processx [@R-processx]. You may use this approach when you primarily work on plain Markdown posts instead of R Markdown posts, because it can be much faster to preview Markdown posts using the web server of Hugo. The web server can be stopped by blogdown::stop_server(), and it will always be stopped when the R session is ended, so you can restart your R session if stop_server() fails to stop the server for some reason.
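A minimal sketch of switching to this approach is shown below. It turns the option on only when processx is installed; the variable name has_processx is just for illustration.

```r
# Use the generator's own server only if processx is available
has_processx = requireNamespace('processx', quietly = TRUE)
options(blogdown.generator.server = has_processx)

if (!has_processx) {
  message('processx is not installed; keeping the default servr-based LiveReload')
}
```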

The web server is established via the command hugo server (see its documentation for details). You can pass command-line arguments via the global option blogdown.hugo.server. The default value for this option is c('-D', '-F'), which means to render draft and future posts in the preview. We want to highlight a special argument --navigateToChanged in a recent version of Hugo, which asks Hugo to automatically navigate to the changed page. For example, you can set the options:

options(blogdown.hugo.server = c('-D', '-F', '--navigateToChanged'))

Then, when you edit a source file under content/, Hugo will automatically show you the corresponding output page in the web browser.

Note that Hugo renders and serves the website from memory by default, so no files will be generated to public/. If you need to publish the public/ folder manually, you will have to manually build the website via blogdown::hugo_build() or blogdown::build_site().

## Building a website for local preview {#local-preview}

The function\index{blogdown::build_site()} blogdown::build_site() has an argument local that defaults to FALSE, which means building the website for publishing instead of local previewing. The mode local = TRUE is primarily for blogdown::serve_site() to serve the website locally. There are three major differences between local = FALSE and TRUE. When local = TRUE:

You do not need to worry about these details if your website is automatically generated from source via a service like Netlify, which will make use of baseurl and not use -D -F by default. If you manually publish the public/ folder, you need to be more careful:

## Functions in the blogdown package {#functions}

There are about 20 exported functions\index{Functions} in the blogdown package, and many more non-exported functions. Exported functions are documented and you can use them after library(blogdown) (or via blogdown::). Non-exported functions are not documented, but you can access them via blogdown::: (the triple-colon syntax). This package is not very complicated, and consists of only about 2000 lines of R code (the number is given by the word-counting command wc):

```{bash, comment=''}
wc -l ../R/*.R ../inst/scripts/*.R
```

You may take a look at the source code (https://github.com/rstudio/blogdown) if you want to know more about a non-exported function. In this section, we selectively list some exported and non-exported functions in the package for your reference.

### Exported functions

Installation: You can install Hugo with `install_hugo()`, and install a Hugo theme with `install_theme()`.

Wrappers of Hugo commands: `hugo_cmd()` is a general wrapper of `system2('hugo', ...)`, and all later functions execute specific Hugo commands based on this general wrapper function; `hugo_version()` executes the command `hugo version` (i.e., `system2('hugo', 'version')` in R); `hugo_build()` executes `hugo` with optional parameters; `new_site()` executes `hugo new site`; `new_content()` executes `hugo new` to create a new content file, and `new_post()` is a wrapper based on `new_content()` to create a new blog post with appropriate YAML metadata and filename; `hugo_convert()` executes `hugo convert`; `hugo_server()` executes `hugo server`.

Output format: `html_page()` is the only R Markdown output format function in the package. It inherits from `bookdown::html_document2()`, which in turn inherits from `rmarkdown::html_document()`. You need to read the documentation of these two functions to know the possible arguments. Section \@ref(output-format) has more detailed information about it.

Helper functions: `shortcode()` is a helper function to write a Hugo shortcode `{{% %}}` in an Rmd post; `shortcode_html()` writes out `{{< >}}`.

Serving a site: `serve_site()` starts a local web server to build and preview a site continuously; you can stop the server via `stop_server()`, or restart your R session.

Dealing with YAML metadata: `find_yaml()` can be used to find content files that contain a specified YAML field with specified values; `find_tags()` and `find_categories()` are wrapper functions based on `find_yaml()` to match specific tags and categories in content files, respectively; `count_yaml()` can be used to calculate the frequencies of specified fields.

### Non-exported functions

Some functions are not exported in this package because average users are unlikely to use them directly, and we list a subset of them below:

- You can find the path to the Hugo executable via `blogdown:::find_hugo()`. If the executable can be found via the `PATH` environment variable, it just returns `'hugo'`.

- The helper function `modify_yaml()` can be used to modify the YAML metadata of a file. It has a `...` argument that takes arbitrary YAML fields, e.g., `blogdown:::modify_yaml('foo.md', author = 'Frida Gomam', date = '2015-07-23')` will change the `author` field in the file `foo.md` to `Frida Gomam`, and `date` to `2015-07-23`. We have shown the advanced usage of this function in Section \@ref(from-jekyll).

- We have also mentioned a series of functions to clean up Markdown posts in Section \@ref(from-jekyll). They include `remove_extra_empty_lines()`, `process_bare_urls()`, `normalize_chars()`, `remove_highlight_tags()`, and `fix_img_tags()`.

- In Section \@ref(local-preview), we mentioned a caching mechanism based on the file modification time. It is implemented in `blogdown:::require_rebuild()`, which takes two arguments of filenames. The first file is the output file, and the second is the source file. When the source file is newer than the output file, or the output file does not exist or is empty, this function returns `TRUE`.
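To make the caching rule concrete, here is a small base R sketch of the same idea: rebuild when the output is missing, empty, or older than its source. The function `needs_rebuild()` is a hypothetical re-implementation for illustration and is not the actual source of `blogdown:::require_rebuild()`.

```r
# Hypothetical re-implementation of the caching rule (not blogdown's own code)
needs_rebuild = function(output, source) {
  !file.exists(output) ||                      # output has never been generated
    file.size(output) == 0 ||                  # output exists but is empty
    file.mtime(source) > file.mtime(output)    # source modified after the output
}

# A source file without any output must be built
src = tempfile(fileext = '.Rmd')
out = tempfile(fileext = '.html')
writeLines('Hello', src)
needs_rebuild(out, src)
```

The checks are ordered so that the modification times are only compared when the output file actually exists.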

## Paths of figures and other dependencies {#dep-path}

One of the most challenging tasks in developing the **blogdown** package is to properly handle dependency files\index{Dependency Files} of web pages. If all pages of a website were plain text without dependencies like images or JavaScript libraries, it would be much easier for me to develop the **blogdown** package.

After **blogdown** compiles each Rmd document to HTML, it will try to detect the dependencies (if there are any) from the HTML source and copy them to the `static/` folder, so that Hugo will copy them to `public/` later. The detection depends on the paths of dependencies. By default, all dependencies, like R plots and libraries for HTML widgets, are generated to the `foo_files/` directory if the Rmd is named `foo.Rmd`. Specifically, R plots are generated to `foo_files/figure-html/` and the rest of files under `foo_files/` are typically from HTML widgets.

R plots under `content/*/foo_files/figure-html/` are copied to `static/*/foo_files/figure-html/`, and the paths in HTML tags like `<img src="foo_files/figure-html/bar.png" />` are substituted with `/*/foo_files/figure-html/bar.png`. Note the leading slash indicates the root directory of the published website, and the substitution works because Hugo will copy `*/foo_files/figure-html/` from `static/` to `public/`.

Any other files under `foo_files/` are treated as dependency files of HTML widgets, and will be copied to `static/rmarkdown-libs/`. The original paths in HTML will also be substituted accordingly, e.g., from `<script src="foo_files/jquery/jquery.min.js">` to `<script src="/rmarkdown-libs/jquery/jquery.min.js">`. It does not matter whether these files are generated by HTML widgets or not. The links on the published website will be correct and typically hidden from the readers of the pages.^[For example, a reader will not see the `<script>` tag on a page, so it does not really matter what its `src` attribute looks like as long as it is a path that actually exists.]
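The two substitutions described above can be illustrated with a short base R sketch. This is only an approximation under simplifying assumptions (paths appear in `src` or `href` attributes, and the file name contains no regular expression metacharacters); the helper `rewrite_paths()` is hypothetical and does not reflect how **blogdown** implements the substitution internally.

```r
# Hypothetical helper: rewrite dependency paths for foo.Rmd under content/<dir>/
rewrite_paths = function(html, dir, name) {
  files = paste0(name, '_files/')
  # anything under foo_files/ other than figure-html/ goes to /rmarkdown-libs/
  html = gsub(
    paste0('(src|href)="', files, '(?!figure-html/)'),
    '\\1="/rmarkdown-libs/', html, perl = TRUE
  )
  # R plots keep their relative location, prefixed with the content directory
  html = gsub(
    paste0('(src|href)="', files, 'figure-html/'),
    paste0('\\1="/', dir, '/', files, 'figure-html/'), html, perl = TRUE
  )
  html
}

img = rewrite_paths('<img src="foo_files/figure-html/bar.png" />', 'post', 'foo')
js = rewrite_paths('<script src="foo_files/jquery/jquery.min.js">', 'post', 'foo')
```

Here `img` ends up pointing to `/post/foo_files/figure-html/bar.png` and `js` to `/rmarkdown-libs/jquery/jquery.min.js`, matching the two cases in the text.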

You should not modify the **knitr** chunk option `fig.path` or `cache.path` unless the above process is completely clear to you, and you want to handle dependencies by yourself.

In the rare cases when **blogdown** fails to detect and copy some of your dependencies (e.g., you used a fairly sophisticated HTML widget package that writes out files to custom paths), you have two possible choices:

- Do not ignore `_files$` in the option `ignoreFiles` in `config.toml`, do not customize the `permalinks` option, and set the option `uglyURLs` to `true`. This way, **blogdown** will not substitute paths that it cannot recognize, and Hugo will copy these files to `public/`. The relative file locations of the `*.html` file and its dependencies will remain the same when they are copied to `public/`, so all links will continue to work.

- If you choose to ignore `_files$` or have customized the `permalinks` option, you need to make sure **blogdown** can recognize the dependencies. One approach is to use the path returned by the helper function `blogdown::dep_path()` to write out additional dependency files. Basically this function returns the current `fig.path` option in **knitr**, which defaults to `*_files/figure-html/`. For example, you can generate a plot manually under `dep_path()`, and **blogdown** will process it automatically (copy the file and substitute the image path accordingly).

If you do not understand all these technical details, we recommend that you use the first choice, and you will have to sacrifice custom permanent links and clean URLs (e.g., `/about.html` instead of `/about/`). With this choice, you can also customize the `fig.path` option for code chunks if you want.

## HTML widgets

We do not recommend that you use different HTML widgets\index{HTML Widgets} from many R packages on the same page, because it is likely to cause conflicts in JavaScript. For example, if your theme uses the jQuery library, it may conflict with the jQuery library used by a certain HTML widget. In this case, you can conditionally load the theme's jQuery\index{jQuery} library by setting a parameter in the YAML metadata of your post and revising the Hugo template that loads jQuery. Below is the example code to load jQuery conditionally in a Hugo template:

```html
{{ if not .Params.exclude_jquery }}
<script src="path/to/jquery.js"></script>
{{ end }}
```

Then if you set exclude_jquery: true in the YAML metadata of a post, the theme's jQuery will not be loaded, so there will not be conflicts when your HTML widgets also depend on jQuery.

Another solution is the widgetframe package [@R-widgetframe]. It solves this problem by embedding HTML widgets in <iframe></iframe>. Since an iframe is isolated from the main web page on which it is embedded, there will not be any JavaScript conflicts.

A widget is typically not of full width on the page. To set its width to 100%, you can use the chunk option out.width = "100%".

## Version control

If your website source files are under version control\index{Version Control}, we recommend that you add at least these two folder names to your .gitignore file^[In the patterns used here, the leading / means to match only relative to the directory of the .gitignore file, instead of ignoring recursively. See more about .gitignore patterns at https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository#_ignoring]:

/blogdown/
/public/

The blogdown/ directory is used to store cache files, and they are most likely to be useless to the published website. Only knitr may use them, and the published website will not depend on these files.

The public/ directory should be ignored if your website is going to be automatically (re)built on a remote server such as Netlify.

As we mentioned in Section \@ref(dep-path), R plots will be copied to static/, so you may see new files in GIT after you render an Rmd file that has graphics output. You need to add and commit these new files in GIT, because the website will use them.

Although it is not relevant to blogdown, macOS users should remember to ignore .DS_Store and Windows users should ignore Thumbs.db.
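If you prefer to maintain these entries from R, the following sketch appends any missing patterns to a .gitignore file. The helper `add_gitignore()` is hypothetical (it is not part of blogdown), and the example writes to a temporary file so that it does not touch a real project.

```r
# Hypothetical helper: append patterns to .gitignore unless already present
add_gitignore = function(entries, file = '.gitignore') {
  old = if (file.exists(file)) readLines(file, warn = FALSE) else character()
  new = setdiff(entries, old)
  if (length(new) > 0) writeLines(c(old, new), file)
  invisible(new)  # the patterns that were actually added
}

# Demonstrate on a temporary file instead of a real project
ignore_file = file.path(tempdir(), 'example.gitignore')
unlink(ignore_file)
add_gitignore(c('/blogdown/', '/public/', '.DS_Store', 'Thumbs.db'), ignore_file)
readLines(ignore_file)
```

Calling the function a second time with the same patterns leaves the file unchanged, so it is safe to run repeatedly.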

If you are relatively familiar with GIT\index{GIT Submodules}, there is a special technique that may be useful for you to manage Hugo themes, which is called "GIT submodules." A submodule in GIT allows you to manage a particular folder of the main repository using a different remote repository. For example, if you used the default theme hugo-lithium from my GitHub repository, you might want to sync it with my repository occasionally, because I may update it from time to time. You can add the GIT submodule via the command line:

git submodule add \
  https://github.com/yihui/hugo-lithium.git \
  themes/hugo-lithium

If the folder themes/hugo-lithium exists, you need to delete it before adding the submodule. Then you can see a SHA string associated with the "folder" themes/hugo-lithium in the GIT status of your main repository indicating the version of the submodule. Note that you will only see the SHA string instead of the full content of the folder. Next time when you want to sync with my repository, you may run the command:

git submodule update --recursive --remote

In general, if you are happy with how your website looks, you do not need to manage the theme using GIT submodules. Future updates in the upstream repository may not really be what you want. In that case, a physical and fixed copy of the theme is more appropriate for you.

## The default HTML template {#default-template}

As we mentioned in Section \@ref(output-format), the default output format for an Rmd document in blogdown is blogdown::html_page. This format passes a minimal HTML template to Pandoc\index{Pandoc} by default:


You can find this template file via blogdown:::pkg_file('resources', 'template-minimal.html') in R, and this file path is the default value of the template argument of html_page(). You may change this default template, but you should understand what this template is supposed to do first.

If you are familiar with Pandoc templates, you should realize that this is not a complete HTML template, e.g., it does not have the tags <html>, <head>, or <body>. That is because we do not need or want Pandoc to return a full HTML document to us. The main thing we want Pandoc to do is to convert our Markdown document to HTML, and give us the body of the HTML document, which is in the template variable $body$. Once we have the body, we can further pass it to Hugo, and Hugo will use its own template to embed the body and generate the full HTML document. Let's explain this by a minimal example. Suppose we have an R Markdown document foo.Rmd:

---
title: "Hello World"
author: "Yihui Xie"
---

I found a package named **blogdown**.

It is first converted to an HTML file foo.html through html_page(), and note that YAML metadata are ignored for now:

I found a package named <strong>blogdown</strong>.

Then blogdown will read the YAML metadata of the Rmd source file, and insert the metadata into the HTML file so it becomes:

---
title: "Hello World"
author: "Yihui Xie"
---

I found a package named <strong>blogdown</strong>.

This is the file to be picked up by Hugo and eventually converted to an HTML page of a website. Since the Markdown body has been processed to HTML by Pandoc, Hugo will basically use the HTML. That is how we bypass Hugo's Markdown engine BlackFriday.
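The metadata step in this example can be mimicked with a few lines of base R. The sketch below assumes the YAML block starts on the first line and is delimited by two `---` lines; the function `prepend_yaml()` is hypothetical and only illustrates the idea, not the code that blogdown runs.

```r
# Hypothetical helper: copy the YAML block of the Rmd source in front of the HTML body
prepend_yaml = function(rmd_lines, html_lines) {
  delim = which(rmd_lines == '---')
  # leave the HTML untouched if there is no leading YAML block
  if (length(delim) < 2 || delim[1] != 1) return(html_lines)
  c(rmd_lines[1:delim[2]], '', html_lines)
}

rmd = c(
  '---', 'title: "Hello World"', 'author: "Yihui Xie"', '---',
  '', 'I found a package named **blogdown**.'
)
html = 'I found a package named <strong>blogdown</strong>.'
res = prepend_yaml(rmd, html)
writeLines(res)
```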

Besides $body$, you may have noticed other Pandoc template variables like $header-includes$, $css$, $include-before$, $toc$, and $include-after$. These variables make it possible to customize the html_page format. For example, if you want to generate a table of contents, and apply an additional CSS stylesheet to a certain page, you may set toc to true and pass the stylesheet path to the css argument of html_page(), e.g.,

---
title: "Hello World"
author: "Yihui Xie"
output:
  blogdown::html_page:
    toc: true
    css: "/css/my-style.css"
---

## Different building methods {#methods}

If your website does not contain any Rmd files, it is very straightforward to render it --- just a system call to the hugo command. When your website contains Rmd files, blogdown has provided two rendering methods to compile these Rmd files. A website can be built using the function blogdown::build_site():


As mentioned in Section \@ref(global-options), the default value of the method argument is determined by the global option blogdown.method, and you can set this option in .Rprofile.

For method = 'html', build_site() renders *.Rmd to *.html, and *.Rmarkdown to *.markdown, and keeps the *.html/*.markdown output files under the same directory as *.Rmd/*.Rmarkdown files.

An Rmd file may generate two directories for figures (*_files/) and cache (*_cache/), respectively, if you have plot output or HTML widgets [@R-htmlwidgets] in your R code chunks, or enabled the chunk option cache = TRUE for caching. In the figure directory, there will be a subdirectory figure-html/ that contains your plot output files, and possibly other subdirectories containing HTML dependencies from HTML widgets (e.g., jquery/). The figure directory is moved to /static/, and the cache directory is moved to /blogdown/.

After you run build_site(), your website is ready to be compiled by Hugo. This gives you the freedom to use deploying services like Netlify (Chapter \@ref(deployment)), where neither R nor blogdown is available, but Hugo is.

For method = 'custom', build_site() will not process any R Markdown files, nor will it call Hugo to build the site. No matter which method you choose to use, build_site() will always look for an R script /R/build.R and execute it if it exists. This gives you the complete freedom to do anything you want for the website. For example, you can call knitr::knit() to compile Rmd to Markdown (*.md) in this R script instead of using rmarkdown::render(). This feature is designed for advanced users who are really familiar with the knitr package^[Honestly, it was originally designed for Yihui himself to build his own website, but he realized this feature could actually free users from Hugo. For example, it is possible to use Jekyll (another popular static site generator) with blogdown, too.] and Hugo or other static website generators (see Chapter \@ref(other-generators)).

When R/build.R exists and method = 'html', the R Markdown files are knitted first, then the script R/build.R is executed, and lastly Hugo is called to build the website.
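The order of operations for the two methods can be summarized as a toy model in R. The function `build_steps()` below is hypothetical; it only lists the steps implied by the description in this section and does not call blogdown, knitr, or Hugo.

```r
# Toy model of the build order described in this section (not blogdown's code)
build_steps = function(method = c('html', 'custom'), has_build_script = FALSE) {
  method = match.arg(method)
  steps = character()
  if (method == 'html') steps = c(steps, 'knit R Markdown files')
  # R/build.R is executed regardless of the method, if it exists
  if (has_build_script) steps = c(steps, 'run R/build.R')
  if (method == 'html') steps = c(steps, 'call Hugo')
  steps
}

build_steps('html', has_build_script = TRUE)
build_steps('custom', has_build_script = TRUE)
```

With `method = 'custom'`, the only step left is the script itself, which is why that script has to take care of both compiling the Rmd files and generating the site.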



rstudio/blogdown documentation built on Feb. 5, 2024, 10:09 p.m.