In rstudio/blogdown: Create Blogs and Websites with R Markdown

Website Basics

If you want to tweak the appearance of your website, or even design your own theme, you must have some basic knowledge of web development. In this appendix, we briefly introduce HTML, CSS, and JavaScript, which are the most common components of a web page, although CSS and JavaScript are optional.

We only aim at getting you started with HTML, CSS, and JavaScript. HTML is relatively simple to learn, but CSS and JavaScript can be much more complicated, depending on how much you want to learn and what you want to do with them. After reading this appendix, you will have to use other resources to teach yourself. When you search for these technologies online, the most likely websites that you may hit are MDN (Mozilla Developer Network), w3schools.com, and StackOverflow. Among these websites, w3schools often provides simple examples and tutorials that may be friendlier to beginners, but we often hear negative comments about it, so please use it with caution. I often read all three websites when looking for solutions.

If we were only allowed to give one single most useful tip about web development, it would be: use the Developer Tools of your web browser. Most modern web browsers provide these tools. For example, you can find these tools from the menu of Google Chrome View -> Developer, Firefox's menu Tools -> Web Developer, or Safari's menu Develop -> Show Web Inspector. Figure \@ref(fig:chrome-devtools) is a screenshot of using the Developer Tools in Chrome.

knitr::include_graphics('images/chrome-devtools.png')

Typically you can also open the Developer Tools by right-clicking on a certain element on the web page and selecting the menu item Inspect (or Inspect Element). In Figure \@ref(fig:chrome-devtools), I right-clicked on the profile image of my website https://yihui.org and inspected it, and Chrome highlighted its HTML source code <img src="..." ... /> in the left pane. You can also see the CSS styles associated with this img element in the right pane. What is more, you can interactively change the styles there if you know CSS, and see the (temporary) effects in real time on the page! After you are satisfied with the new styles, you can write the CSS code in files.

There are a lot of amazing features of Developer Tools, which make them not only extremely useful for debugging and experimentation, but also helpful for learning web development. These tools are indispensable to me when I develop anything related to web pages. I learned a great deal about CSS and JavaScript by playing with these tools.

HTML

HTML\index{HTML} stands for Hyper Text Markup Language, and it is the language behind most web pages you see. You can use the menu View -> View Source or the context menu View Page Source to see the full HTML source of a web page in your browser. All elements on a page are represented by HTML tags. For example, the tag <p> represents paragraphs, and <img> represents images.

The good thing about HTML is that the language has only a limited number of tags, and the number is not very big (especially the number of commonly used tags). This means there is hope that you can master this language fully and quickly.

Most HTML tags appear in pairs, with an opening tag and a closing tag, e.g., <p></p>. You write the content between the opening and closing tags, e.g., <p>This is a paragraph.</p>. There are a few exceptions, such as the <img> tag, which can be closed by a slash / in the opening tag, e.g., <img src="foo.png" />. You can specify attributes of an element in the opening tag using the syntax name=value (a few attributes do not require a value).

HTML documents often have the filename extension .html (or .htm). Below is an overall structure of an HTML document:

<html>

  <head>
  </head>

  <body>
  </body>

</html>

Basically, an HTML document consists of a head section and a body section. You can specify the metadata and include assets like CSS files in the head section. Normally the head section is not visible on a web page. It is the body section that holds the content to be displayed to a reader. Below is a slightly richer example document:

<!DOCTYPE html>
<html>

  <head>
    <meta charset="utf-8" />

    <title>Your Page Title</title>

    <link rel="stylesheet" href="/css/style.css" />
  </head>

  <body>
    <h1>A First-level Heading</h1>

    <p>A paragraph.</p>

    <img src="/images/foo.png" alt="A nice image" />

    <ul>
      <li>An item.</li>
      <li>Another item.</li>
      <li>Yet another item.</li>
    </ul>

    <script src="/js/bar.js"></script>
  </body>

</html>

In the head, we declare the character encoding of this page to be UTF-8 via a <meta> tag, specify the title via the <title> tag, and include a stylesheet via a <link> tag.

The body contains a first-level section heading <h1>,^[There are six possible levels from h1, h2, ..., to h6.] a paragraph <p>, an image <img>, an unordered list <ul> with three list items <li>, and includes a JavaScript file in the end via <script>.

There are much better tutorials on HTML than this section, such as those offered by MDN and w3schools.com, so we are not going to make this section a full tutorial. Instead, we just want to provide a few tips on HTML:

You may validate your HTML code via this service: https://validator.w3.org. This validator will point out potential problems of your HTML code. It actually works for XML and SVG documents, too.
Among all HTML attributes, file paths (the src attribute of some tags like <img>) and links (the href attribute of the <a> tag) may be the most confusing to beginners. Paths and links can be relative or absolute, and may come with or without the protocol and domain. You have to understand what a link or path exactly points to. A full link is of the form http://www.example.com/foo/bar.ext, where http specifies the protocol (it can be other protocols like https or ftp), www.example.com is the domain, and foo/bar.ext is the file under the root directory of the website.
- If you refer to resources on the same website (the same protocol and domain), we recommend that you omit the protocol and domain names, so that the links will continue to work even if you change the protocol or domain. For example, a link <a href="/hi/there.html"> on a page http://example.com/foo/ refers to http://example.com/hi/there.html. It does not matter if you change http to https, or example.com to another-domain.com.
- Within the same website, a link or path can be relative or absolute. The meaning of an absolute path does not change no matter where the current HTML file is placed, but the meaning of a relative path depends on the location of the current HTML file. Suppose you are currently viewing the page example.com/hi/there.html:
  - An absolute path /foo/bar.ext always means example.com/foo/bar.ext. The leading slash means the root directory of the website.
  - A relative path ../images/foo.png means example.com/images/foo.png (.. means to go one level up). However, if the HTML file there.html is moved to example.com/hey/again/there.html, this path in there.html will refer to example.com/hey/images/foo.png.
  - When deciding whether to use relative or absolute paths, here is the rule of thumb: if you will not move the resources referred or linked to from one subpath to another (e.g., from example.com/foo/ to example.com/bar/), but only move the HTML pages that use these resources, use absolute paths; if you want to change the subpath of the URL of your website, but the relative locations of HTML files and the resources they use do not change, you may use relative links (e.g., you can move the whole website from example.com/ to example.com/foo/).
  - If the above concepts sound too complicated, a better way is to either think ahead carefully about the structure of your website and avoid moving files, or use rules of redirects if supported (such as 301 or 302 redirects).
- If you link to a different website or web page, you have to include the domain in the link, but it may not be necessary to include the protocol, e.g., //test.example.com/foo.css is a valid path. The actual protocol of this path matches the protocol of the current page, e.g., if the current page is https://example.com/, this link means https://test.example.com/foo.css. It may be beneficial to omit the protocol because HTTP resources cannot be embedded on pages served through HTTPS (for security reasons), e.g., an image at http://example.com/foo.png cannot be embedded on a page https://example.com/hi.html via <img src="http://example.com/foo.png" />, but <img src="//example.com/foo.png" /> will work if the image can be accessed via HTTPS, i.e., https://example.com/foo.png. The main drawback of not including the protocol is that such links and paths do not work if you open the HTML file locally without using a web server, e.g., only double-click the HTML file in your file browser and show it in the browser.^[That is because without a web server, an HTML file is viewed via the protocol file. For example, you may see a URL of the form file://path/to/the/file.html in the address bar of your browser. The path //example.com/foo.png will be interpreted as file://example.com/foo.png, which is unlikely to exist as a local file on your computer.]
- A very common mistake that people make is a link without the leading double slashes before the domain. You may think www.example.com is a valid link. It is not! At least it does not link to the website that you intend to link to. It works when you type it in the address bar of your browser because your browser will normally autocomplete it to http://www.example.com. However, if you write a link <a href="www.example.com">See this link</a>, you will be in trouble. The browser will interpret this as a relative link, and it is relative to the URL of the current web page, e.g., if you are currently viewing http://yihui.org/cn/, the link www.example.com actually means http://yihui.org/cn/www.example.com! Now you should know the Markdown text [Link](www.example.com) is typically a mistake, unless you really mean to link to a subdirectory of the current page or a file with literally the name www.example.com.
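
The rules above can be checked with the URL class that is built into modern browsers and Node.js, which resolves a link against the address of the current page. The snippet below is only a sketch: resolve() is a hypothetical helper name, and the http protocol is assumed for the example.com pages that were written without one above.

```javascript
// Resolve a link the way a browser would, relative to the current page.
// `resolve` is a hypothetical helper name, not part of any library.
function resolve(link, currentPage) {
  return new URL(link, currentPage).href;
}

// absolute path: the leading slash means the root of the website
console.log(resolve('/hi/there.html', 'http://example.com/foo/'));
// -> http://example.com/hi/there.html

// relative path: .. goes one level up from /hi/
console.log(resolve('../images/foo.png', 'http://example.com/hi/there.html'));
// -> http://example.com/images/foo.png

// the same relative path after the page has been moved
console.log(resolve('../images/foo.png', 'http://example.com/hey/again/there.html'));
// -> http://example.com/hey/images/foo.png

// protocol-relative link: inherits https from the current page
console.log(resolve('//test.example.com/foo.css', 'https://example.com/'));
// -> https://test.example.com/foo.css

// no protocol and no leading slashes: treated as a relative path
console.log(resolve('www.example.com', 'http://yihui.org/cn/'));
// -> http://yihui.org/cn/www.example.com
```

You can paste the same calls into the JavaScript console of your browser to test any link before you put it in a page.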

CSS

The Cascading Stylesheets\index{CSS} (CSS) language is used to describe the look and formatting of documents written in HTML. CSS is responsible for the visual style of your site. CSS is a lot of fun to play with, but it can also easily steal your time.

On Hugo websites and in Hugo themes, CSS is one of the major "non-content" files that shape the look and feel of your site (the others are images, JavaScript, and Hugo templates). What does the "look and feel" of a site mean? "Look" generally refers to static style components including, but not limited to

color palettes,
images,
layouts/margins, and
fonts.

whereas "feel" relates to dynamic components that the user interacts with like

dropdown menus,
buttons, and
forms.

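There are three ways to define styles. The first is inline, directly on an HTML element via its style attribute. For example, this code

<p>Marco! <span style="font-style: italic;">Polo!</span></p>

would produce text that looks like this: Marco! Polo! (with "Polo!" in italics).

However, this method is generally not preferred, because the styles have to be repeated on every element that needs them, which makes them hard to maintain.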

A second way is to internally define the CSS by placing a style section in your HTML:

<html>
<style> 
#favorite {
    font-style: italic;
}
</style>
<ul id="tea-list">
  <li>Earl Grey</li>
  <li>Darjeeling</li>
  <li>Oolong</li>
  <li>Chamomile</li>
  <li id="favorite">Chai</li>
</ul>
</html>

In this example, only the last tea listed, Chai, is italicized.

The third method is the most popular because it is more flexible and the least repetitive. In this method, you define the CSS in an external file that is then referenced as a link in your HTML:

<html>
    <link rel="stylesheet" href="/css/style.css" />
</html>

What goes inside the linked CSS document is essentially a list of rules (the same list could appear internally between the style tags, if you are using the second method). Each rule must include both a selector or group of selectors, and a declarations block within curly braces that contains one or more property: value; pairs. Here is the general structure for a rule:

/* CSS pseudo-code */
selectorlist {
    property: value;
    /* more property: value; pairs*/
}

Selectors can be based on HTML element types or attributes, such as id or class (and combinations of these attributes):

/* by element type */
li { 
    color: yellow; /* all <li> elements are yellow */
}

/* by ID with the # symbol */
#my-id { 
    color: yellow; /* elements with id = "my-id" are yellow */
}

/* by class with the . symbol */
.my-class { 
    color: yellow; /* elements with class = "my-class" are yellow  */
}

Because each HTML element may match several different selectors, the CSS standard determines which set of rules has precedence for any given element, and which properties to inherit. This is where the cascade algorithm comes into play. For example, take a simple unordered list:

<ul id="tea-list">
  <li>Earl Grey</li>
  <li>Darjeeling</li>
  <li>Oolong</li>
  <li>Chamomile</li>
  <li>Chai</li>
</ul>

Now, let's say we want to highlight our favorite teas again, so we'll use a class attribute.

<ul id="tea-list">
  <li>Earl Grey</li>
  <li class="favorite">Darjeeling</li>
  <li>Oolong</li>
  <li>Chamomile</li>
  <li class="favorite">Chai</li>
</ul>

We can use this class attribute as a selector in our CSS. Let's say we want our favorite teas to be in bold and have a background color of yellow, so our CSS would look like this:

.favorite {
  font-weight: bold;
  background-color: yellow;
}

Now, if you want every list item to be italicized with a white background, you can set up another rule:

li { 
  font-style: italic;
  background-color: white;
}

If you play with this code (which you can do easily using sites like http://jsbin.com, https://jsfiddle.net, or https://codepen.io/pen/), you'll see that the two favorite teas are still highlighted in yellow. This is because the CSS rule about .favorite as a class is more specific than the rule about li type elements. To override the .favorite rule, you need to be as specific as you can be when choosing your group of selectors:

ul#tea-list li.favorite {
  background-color: white;
}

This example just scratches the surface of cascade and inheritance.
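
To see why the last rule wins in the tea example, it may help to count what each selector contains. Browsers compare selectors by the number of IDs first, then classes, then element types. The sketch below is a deliberately naive calculator: the function names specificity() and compareSpecificity() are made up for illustration, and it only understands element types, #id, and .class separated by spaces (no attribute selectors, pseudo-classes, or other combinators).

```javascript
// Naive specificity counter; NOT a full CSS parser.
// Returns [number of ids, number of classes, number of element types].
function specificity(selector) {
  var ids = (selector.match(/#[\w-]+/g) || []).length;
  var classes = (selector.match(/\.[\w-]+/g) || []).length;
  // remove ids and classes, then count the remaining element names
  var rest = selector.replace(/[#.][\w-]+/g, ' ');
  var types = (rest.match(/[a-zA-Z][\w-]*/g) || []).length;
  return [ids, classes, types];
}

// Positive if selector a is more specific than b, negative if less, 0 if equal.
function compareSpecificity(a, b) {
  var sa = specificity(a), sb = specificity(b);
  for (var i = 0; i < 3; i++) {
    if (sa[i] !== sb[i]) return sa[i] - sb[i];
  }
  return 0;
}

console.log(specificity('li'));                       // [0, 0, 1]
console.log(specificity('.favorite'));                // [0, 1, 0]
console.log(specificity('ul#tea-list li.favorite'));  // [1, 1, 2]
```

A single class outranks any number of element types, and a single ID outranks any number of classes, which is why .favorite beats li, and ul#tea-list li.favorite beats both.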

For any Hugo theme that you install, you can find the CSS file in the themes/ folder. For example, the default lithium theme is located in: themes/hugo-lithium/static/css/main.css. Once you are familiar with CSS, you can understand how each set of rules works to shape the visual style of your website, and how to alter the rules. For some themes (e.g., the hugo-anatole theme), you have the option of linking to a custom CSS, which you can use to further customize the visual style of your site.

A few one-line examples illustrate how simple CSS rules can be used to make dramatic changes:

To make circular or rounded images, you may assign a class img-circle to images (e.g., <img class="img-circle" src="foo.png" />) and define the CSS:

.img-circle { border-radius: 50%; }

To make striped tables, you can add background colors to odd or even rows of the table, e.g.,

tr:nth-child(even) { background: #eee; }

You can append or prepend content to elements via pseudo-elements ::after and ::before. Here is an example of adding a period after section numbers: https://github.com/rstudio/blogdown/issues/80.

JavaScript

It is way more challenging to briefly introduce JavaScript\index{JavaScript} than HTML and CSS, since it is a programming language. There are many books and tutorials about this language. Anyway, we will try to scratch the surface for R users in this section.

In a nutshell, JavaScript is a language that is typically used to manipulate elements on a web page. An effective way to learn it is through the JavaScript console in the Developer Tools of your web browser (see Figure \@ref(fig:chrome-devtools)), because you can interactively type code in the console and execute it, which feels similar to executing R code in the R console (e.g., in RStudio). You may open any web page in your web browser (e.g., https://yihui.org), then open the JavaScript console, and try the code below on any web page:

document.body.style.background = 'orange';

It should change the background color of the page to orange, unless the page has already defined background colors for certain elements.

To effectively use JavaScript, you have to learn both the basic syntax of JavaScript and how to select elements on a page before you can manipulate them. You may partially learn the former from the short JavaScript snippet below:

var x = 1;  // assignments
1 + 2 - 3 * 4 / 5;  // arithmetic

if (x < 2) console.log(x);  // "print" x

var y = [9, 1, 0, 2, 1, 4];  // array

// function
var sum = function(x) {
  var s = 0;
  // a naive way to compute the sum
  for (var i=0; i < x.length; i++) {
    s += x[i];
  }
  return s;
};

sum(y);

var y = "Hello World";
y = y.replace(" ", ", ");  // string manipulation

You may feel the syntax is similar to R to some degree. JavaScript is an object-oriented language, and usually there are several methods that you can apply to an object. The string manipulation above is a typical example of the Object.method() syntax. To know the possible methods on an object, you can type the object name in your JavaScript console followed by a dot, and you should see all candidates.

R users have to be extremely cautious because JavaScript objects are often mutable, meaning that an object could be modified anywhere. Below is a quick example:

var x = {"a": 1, "b": 2};  // like a list in R
var f = function(z) {
  z.a = 100;
};
f(x);
x;  // modified! x.a is 100 now
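
If you want behavior closer to what R users expect, one common approach is to copy the object inside the function before changing it. The sketch below uses Object.assign(), which is a standard JavaScript function; the name setA is made up for illustration. Note that the copy is shallow, so objects nested inside the original would still be shared.

```javascript
var original = {"a": 1, "b": 2};

// Work on a copy so that the caller's object is left alone.
var setA = function(z) {
  var copy = Object.assign({}, z);  // shallow copy of z
  copy.a = 100;
  return copy;
};

var changed = setA(original);
console.log(original.a);  // 1 (not modified)
console.log(changed.a);   // 100
console.log(changed.b);   // 2 (copied from the original)
```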

There are many mature JavaScript libraries that can help you select and manipulate elements on a page, and the most popular one may be\index{jQuery} jQuery. However, you should know that sometimes you can probably do well enough without these third-party libraries. There are some basic methods to select elements, such as document.getElementById() and document.getElementsByClassName(). For example, you can select all paragraphs using document.querySelectorAll('p').
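
As a small illustration of selecting elements without a third-party library, the sketch below sets the background color of every element that matches a CSS selector. The function name highlight() is hypothetical, and it takes the document object as an argument only so that the function does not depend on a global variable; in a browser you would pass the real document.

```javascript
// Set the background color of every element that matches a CSS selector.
// `doc` is expected to behave like the browser's `document` object.
function highlight(doc, selector, color) {
  var i, elements = doc.querySelectorAll(selector);
  for (i = 0; i < elements.length; i++) {
    elements[i].style.background = color;
  }
  return elements.length;  // the number of elements that were changed
}

// In the JavaScript console of a browser, this highlights all paragraphs.
// The check makes the snippet harmless when there is no web page at all.
if (typeof document !== 'undefined') {
  highlight(document, 'p', 'yellow');
}
```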

Next we show a slightly advanced application, in which you will see anonymous functions, selection of elements by HTML tag names, regular expressions, and manipulation of HTML elements.

In Section \@ref(how-to), we mentioned how to enable MathJax\index{MathJax} on a Hugo website. The easy part is to include the script MathJax.js via a <script> tag, and there are two hard parts:

How to protect the math content from the Markdown engine (Blackfriday), e.g., we need to make sure underscores in math expressions are not interpreted as <em></em>. This problem only exists in plain Markdown posts, and has been mentioned in Section \@ref(output-format) without explaining the solution.
By default, MathJax does not recognize a pair of single dollar signs as the syntax for inline math expressions, but most users are perhaps more comfortable with the syntax $x$ than \(x\).

The easiest solution to the first problem may be adding backticks around math expressions, e.g., `$x_i$`, but the consequence is that the math expression will be rendered in <code></code>, and MathJax ignores <code> tags when looking for math expressions on the page. We can force MathJax to search for math expressions in <code>, but this will still be problematic. For example, someone may want to display inline R code `list$x$y`, and $x$ may be recognized as a math expression. MathJax ignores <code> for good reasons. Even if you do not have such expressions in <code>, you may have some special CSS styles attached to <code>, and these styles will be applied to your math expressions, which can be undesired (e.g., a light gray background).

To solve these problems, I have provided a solution in the JavaScript code at https://yihui.org/js/math-code.js:

(function() {
  var i, text, code, codes = document.getElementsByTagName('code');
  for (i = 0; i < codes.length;) {
    code = codes[i];
    if (code.parentNode.tagName !== 'PRE' &&
        code.childElementCount === 0) {
      text = code.textContent;
      if (/^\$[^$]/.test(text) && /[^$]\$$/.test(text)) {
        text = text.replace(/^\$/, '\\(').replace(/\$$/, '\\)');
        code.textContent = text;
      }
      if (/^\\\((.|\s)+\\\)$/.test(text) ||
          /^\\\[(.|\s)+\\\]$/.test(text) ||
          /^\$(.|\s)+\$$/.test(text) ||
          /^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
        code.outerHTML = code.innerHTML;  // remove <code></code>
        continue;
      }
    }
    i++;
  }
})();

It is not a perfect solution, but it should be very rare that you run into problems. This solution identifies possible math expressions in <code>, and strips the <code> tag, e.g., replaces <code>$x$</code> with \(x\). After this script is executed, we load the MathJax script. This way, we do not need to force MathJax to search for math expressions in <code> tags, and your math expressions will not inherit any styles from <code>. The JavaScript code above is not too long, and should be self-explanatory. The trickiest part is i++. I will leave it to readers to figure out why the for loop is not the usual form for (i = 0; i < codes.length; i++). It took me quite a few minutes to realize my mistake when I wrote the loop in the usual form.
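
To experiment with the regular expressions without a web page, you can pull the text-matching part of the script into a plain function. The sketch below reuses the regular expressions from the script above unchanged; the function name mathText() is made up for illustration, and it returns null when the text does not look like a math expression.

```javascript
// Given the text inside <code>, return the text that MathJax should see,
// or null if the text does not look like a math expression.
function mathText(text) {
  // turn $...$ into \(...\), but leave $$...$$ alone
  if (/^\$[^$]/.test(text) && /[^$]\$$/.test(text)) {
    text = text.replace(/^\$/, '\\(').replace(/\$$/, '\\)');
  }
  if (/^\\\((.|\s)+\\\)$/.test(text) ||
      /^\\\[(.|\s)+\\\]$/.test(text) ||
      /^\$(.|\s)+\$$/.test(text) ||
      /^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
    return text;
  }
  return null;
}

console.log(mathText('$x_i$'));     // \(x_i\)
console.log(mathText('$$x^2$$'));   // $$x^2$$
console.log(mathText('list$x$y'));  // null: inline R code is left alone
```

Because the text must start and end with a math delimiter, inline code such as list$x$y is not mistaken for math.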

Useful resources

File optimization

Although\index{Optimization} static websites are fast in general, you can certainly further optimize them. You may search for "CSS and JavaScript minifier," and these tools can compress your CSS and JavaScript files, so that they can be loaded faster. Since there are a lot of tools, I will not recommend any here.

You can also optimize images on your website. I frequently use a command-line tool named optipng to optimize PNG images. It is a lossless optimizer, meaning that it reduces the file size of a PNG image without loss of quality. From my experience, it works very well on PNG images generated from R, and can reduce the file size by at least 30% (sometimes even more than 50%). Personally I also use online tools like https://imagecompressor.com to optimize PNG and JPEG images. For GIF images, I often use https://ezgif.com/optimize to reduce the file sizes if they are too big.

Note that Netlify has provided the optimization features on the server side for free at the moment, so you may just want to enable them there instead of doing all the hard work by yourself.

If you prefer optimizing images using R packages instead of online tools, you may consider using the functions xfun::tinify() and xfun::shrink_images(). The former compresses PNG/JPEG/WebP images using TinyPNG (https://tinypng.com), and the latter shrinks images to a maximum width (800px by default) using the magick package. If you want to use the TinyPNG service, you will need an API key, and I also recommend that you set a global option in your .Rprofile to avoid unnecessary API calls to TinyPNG:

# create a tinify history file if it does not exist
dir.create('blogdown', showWarnings = FALSE)
file.create('blogdown/tinify.txt')
# set the global option
options(
  xfun.tinify.history = normalizePath('blogdown/tinify.txt')
)

See the help page ?xfun::tinify for more information. If you want to both compress images and shrink them, you may call xfun::shrink_images(tinify = TRUE).

Helping people find your site

Once your site is up and running, you probably want people to find it. SEO --- Search Engine Optimization --- is the art of making a website easy for search engines like Google to understand. And, hopefully, if the search engine knows what you are writing about, it will present links to your site high in results when someone searches for topics you cover.

Entire books have been written about SEO, not to mention the many companies that are in the business of offering (paid) technical and strategic advice to help get sites atop search-engine rankings. If you are interested in finding out more, a good place to start is the Google Search Engine Optimization Starter Guide (https://developers.google.com/search/docs/beginner/seo-starter-guide). Below are a few key points:

The title that you select for each page and post is a very important signal to Google and other search engines telling them what that page is about.
Description tags are also critical to explain what a page is about. In HTML documents, description tags are one way to provide metadata about the page. Using blogdown, the description may end up as text under the page title in a search-engine result. If your page's YAML does not include a description, you can add one like description: "A brief description of this page."; the HTML source of the rendered page would have a <meta> tag in <head> like <meta name="description" content="A brief description of this page.">. Not all themes support adding the description to your HTML page (although they should!).
URL structure also matters. You want your post's slug to have informative keywords, which gives another signal of what the page is about. Have a post with interesting things to do in San Francisco? san-francisco-events-calendar might be a better slug than my-guide-to-fun-things-to-do.
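
If you are unsure what a keyword-based slug might look like for a given title, the sketch below turns a title into lowercase words joined by hyphens. The function slugify() is a hypothetical helper written for this illustration, not the rule that Hugo or blogdown applies, and it simply drops characters other than the ASCII letters a-z and digits.

```javascript
// Turn a title into a URL-friendly slug (illustration only).
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '');     // drop leading and trailing hyphens
}

console.log(slugify('San Francisco Events Calendar'));
// -> san-francisco-events-calendar
```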