Output Formats

The bookdown package primarily supports three types of output formats: HTML, LaTeX/PDF, and e-books. In this chapter, we introduce the possible options for these formats. Output formats can be specified either in the YAML metadata of the first Rmd file of the book, or in a separate YAML file named _output.yml under the root directory of the book. Here is a brief example of the former (output formats are specified in the output field of the YAML metadata):

---
title: "An Impressive Book"
author: "Li Lei and Han Meimei"
output:
  bookdown::gitbook:
    lib_dir: assets
    split_by: section
    config:
      toolbar:
        position: static
  bookdown::pdf_book:
    keep_tex: yes
  bookdown::html_book:
    css: toc.css
documentclass: book
---

Here is an example of _output.yml\index{_output.yml}:

bookdown::gitbook:
  lib_dir: assets
  split_by: section
  config:
    toolbar:
      position: static
bookdown::pdf_book:
  keep_tex: yes
bookdown::html_book:
  css: toc.css

In this case, all formats should be at the top level, instead of under an output field. You do not need the three dashes --- in _output.yml.

HTML

The main difference between rendering a book (using bookdown) with rendering a single R Markdown document (using rmarkdown) to HTML\index{HTML} is that a book will generate multiple HTML pages by default --- normally one HTML file per chapter. This makes it easier to bookmark a certain chapter or share its URL with others as you read the book, and faster to load a book into the web browser. Currently we have provided a number of different styles for HTML output: the GitBook style, the Bootstrap style, and the Tufte style.

GitBook style {#gitbook-style}

The GitBook style was borrowed from GitBook\index{GitBook}, a project launched by Friendcode, Inc. (https://www.gitbook.com) and dedicated to helping authors write books with Markdown. It provides a beautiful style, with a layout consisting of a sidebar showing the table of contents on the left, and the main body of a book on the right. The design is responsive to the window size, e.g., the navigation buttons are displayed on the left/right of the book body when the window is wide enough, and collapsed into the bottom when the window is narrow to give readers more horizontal space to read the book body.

We have made several improvements over the original GitBook project. The most significant one is that we replaced the Markdown engine with R Markdown v2 based on Pandoc, so that there are a lot more features for you to use when writing a book:

We have also added some useful features in the user interface that we will introduce in detail soon. The output format function for the GitBook style in bookdown is gitbook(). Here are its arguments:


Most arguments are passed to rmarkdown::html_document(), including fig_caption, lib_dir, and .... You can check out the help page of rmarkdown::html_document() for the full list of possible options. We strongly recommend you to use fig_caption = TRUE for two reasons: 1) it is important to explain your figures with captions; 2) enabling figure captions means figures will be placed in floating environments when the output is LaTeX, otherwise you may end up with a lot of white space on certain pages. The format of figure/table numbers depends on if sections are numbered or not: if number_sections = TRUE, these numbers will be of the format X.i, where X is the chapter number, and i in an incremental number; if sections are not numbered, all figures/tables will be numbered sequentially through the book from 1, 2, ..., N. Note that in either case, figures and tables will be numbered independently.

Among all possible arguments in ..., you are most likely to use the css argument to provide one or more custom CSS files to tweak the default CSS style. There are a few arguments of html_document() that have been hard-coded in gitbook() and you cannot change them: toc = TRUE (there must be a table of contents), theme = NULL (not using any Bootstrap themes), and template (there exists an internal GitBook template).

Please note that if you change self_contained = TRUE to make self-contained HTML pages, the total size of all HTML files can be significantly increased since there are many JS and CSS files that have to be embedded in every single HTML file.

Besides these html_document() options, gitbook() has three other arguments: split_by, split_bib, and config. The split_by argument specifies how you want to split the HTML output into multiple pages, and its possible values are:

For chapter and section, the HTML filenames will be determined by the header identifiers, e.g., the filename for the first chapter with a chapter title # Introduction will be introduction.html by default. For chapter+number and section+number, the chapter/section numbers will be prepended to the HTML filenames, e.g., 1-introduction.html and 2-1-literature.html. The header identifier is automatically generated from the header text by default,^[To see more details on how an identifier is automatically generated, see the auto_identifiers extension in Pandoc's documentation http://pandoc.org/MANUAL.html#header-identifiers] and you can manually specify an identifier using the syntax {#your-custom-id} after the header text, e.g.,

# An Introduction {#introduction}

The default identifier is `an-introduction` but we changed
it to `introduction`.

By default, the bibliography is split and relevant citation items are put at the bottom of each page, so that readers do not have to navigate to a different bibliography page to see the details of citations. This feature can be disabled using split_bib = FALSE, in which case all citations are put on a separate page.

There are several sub-options in the config option for you to tweak some details in the user interface. Recall that all output format options (not only for bookdown::gitbook) can be either passed to the format function if you use the command-line interface bookdown::render_book(), or written in the YAML metadata. We display the default sub-options of config in the gitbook format as YAML metadata below (note that they are indented under the config option):

bookdown::gitbook:
  config:
    toc:
      collapse: subsection
      scroll_highlight: yes
      before: null
      after: null
    toolbar:
      position: fixed
    edit : null
    download: null
    search: yes
    fontsettings:
      theme: white
      family: sans
      size: 2
    sharing:
      facebook: yes
      twitter: yes
      google: no
      weibo: no
      instapper: no
      vk: no
      all: ['facebook', 'google', 'twitter', 'weibo', 'instapaper']

The toc option controls the behavior of the table of contents (TOC). You can collapse some items initially when a page is loaded via the collapse option. Its possible values are subsection, section, none (or null). This option can be helpful if your TOC is very long and has more than three levels of headings: subsection means collapsing all TOC items for subsections (X.X.X), section means those items for sections (X.X) so only the top-level headings are displayed initially, and none means not collapsing any items in the TOC. For those collapsed TOC items, you can toggle their visibility by clicking their parent TOC items. For example, you can click a chapter title in the TOC to show/hide its sections.

The scroll_highlight option in toc indicates whether to enable highlighting of TOC items as you scroll the book body (by default this feature is enabled). Whenever a new header comes into the current viewport as you scroll down/up, the corresponding item in TOC on the left will be highlighted.

Since the sidebar has a fixed width, when an item in the TOC is truncated because the heading text is too wide, you can hover the cursor over it to see a tooltip showing the full text.

You may add more items before and after the TOC using the HTML tag <li>. These items will be separated from the TOC using a horizontal divider. You can use the pipe character | so that you do not need to escape any characters in these items following the YAML syntax, e.g.,

    toc:
      before: |
        <li><a href="...">My Awesome Book</a></li>
        <li><a href="...">John Smith</a></li>
      after: |
        <li><a href="https://github.com/rstudio/bookdown">
        Proudly published with bookdown</a></li>

As you navigate through different HTML pages, we will try to preserve the scroll position of the TOC. Normally you will see the scrollbar in the TOC at a fixed position even if you navigate to the next page. However, if the TOC item for the current chapter/section is not visible when the page is loaded, we will automatically scroll the TOC to make it visible to you.

knitr::include_graphics('images/gitbook.png', dpi = NA)

The GitBook style has a toolbar (Figure \@ref(fig:gitbook-toolbar)) at the top of each page that allows you to dynamically change the book settings. The toolbar option has a sub-option position, which can take values fixed or static. The default is that the toolbar will be fixed at the top of the page, so even if you scroll down the page, the toolbar is still visible there. If it is static, the toolbar will not scroll with the page, i.e., once you scroll away, you will no longer see it.

The first button on the toolbar can toggle the visibility of the sidebar. You can also hit the S key on your keyboard to do the same thing. The GitBook style can remember the visibility status of the sidebar, e.g., if you closed the sidebar, it will remain closed the next time you open the book. In fact, the GitBook style remembers many other settings as well, such as the search keyword and the font settings.

The second button on the toolbar is the search button. Its keyboard shortcut is F (Find). When the button is clicked, you will see a search box at the top of the sidebar. As you type in the box, the TOC will be filtered to display the sections that match the search keyword. Now you can use the arrow keys Up/Down to highlight the next keyword on the current page. When you click the search button again (or hit F outside the search box), the search keyword will be emptied and the search box will be hidden. To disable searching, set the option search: no in config.

The third button is for font/theme settings. You can change the font size (bigger or smaller), the font family (serif or sans serif), and the theme (White, Sepia, or Night). These settings can be changed via the fontsettings option.

The edit option is the same as the option mentioned in Section \@ref(configuration). If it is not empty, an edit button will be added to the toolbar. This was designed for potential contributors to the book to contribute by editing the book on GitHub after clicking the button and sending pull requests.

If your book has other output formats for readers to download, you may provide the download option so that a download button can be added to the toolbar. This option takes either a character vector, or a list of character vectors with the length of each vector being 2. When it is a character vector, it should be either a vector of filenames, or filename extensions, e.g., both of the following settings are okay:

    download: ["book.pdf", "book.epub"]
    download: ["pdf", "epub", "mobi"]

When you only provide the filename extensions, the filename is derived from the book filename of the configuration file _bookdown.yml (Section \@ref(configuration)). When download is null, gitbook() will look for PDF, EPUB, and MOBI files in the book output directory, and automatically add them to the download option. If you just want to suppress the download button, use download: no. All files for readers to download will be displayed in a drop-down menu, and the filename extensions are used as the menu text. When the only available format for readers to download is PDF, the download button will be a single PDF button instead of a drop-down menu.

An alternative form for the value of the download option is a list of length-2 vectors, e.g.,

    download: [["book.pdf", "PDF"], ["book.epub", "EPUB"]]

You can also write it as:

    download:
      - ["book.pdf", "PDF"]
      - ["book.epub", "EPUB"]

Each vector in the list consists of the filename and the text to be displayed in the menu. Compared to the first form, this form allows you to customize the menu text, e.g., you may have two different copies of the PDF for readers to download and you will need to make the menu items different.

On the right of the toolbar, there are some buttons to share the link on social network websites such as Twitter, Facebook, and Google+. You can use the sharing option to decide which buttons to enable. If you want to get rid of these buttons entirely, use sharing: null (or no).

Finally, there are a few more top-level options in the YAML metadata that can be passed to the GitBook HTML template via Pandoc. They may not have clear visible effects on the HTML output, but they may be useful when you deploy the HTML output as a website. These options include:

Below we show some sample YAML metadata (again, please note that these are top-level options):

---
title: "An Awesome Book"
author: "John Smith"
description: "This book introduces the ABC theory, and ..."
url: "https\://bookdown.org/john/awesome/"
github-repo: "john/awesome"
cover-image: "images/cover.png"
apple-touch-icon: "touch-icon.png"
apple-touch-icon-size: 120
favicon: "favicon.ico"
---

A nice effect of setting description and cover-image is that when you share the link of your book on some social network websites such as Twitter, the link can be automatically expanded to a card with the cover image and description of the book.

Bootstrap style

If you have used R Markdown before, you should be familiar with the Bootstrap\index{Bootstrap style} style (http://getbootstrap.com), which is the default style of the HTML output of R Markdown. The output format function in rmarkdown is html_document(), and we have a corresponding format html_book() in bookdown using html_document() as the base format. In fact, there is a more general format html_chapters() in bookdown and html_book() is just its special case:


Note that it has a base_format argument that takes a base output format function, and html_book() is basically html_chapters(base_format = rmarkdown::html_document). All arguments of html_book() are passed to html_chapters():


That means that you can use most arguments of rmarkdown::html_document, such as toc (whether to show the table of contents), number_sections (whether to number section headings), and so on. Again, check the help page of rmarkdown::html_document to see the full list of possible options. Note that the argument self_contained is hard-coded to FALSE internally, so you cannot change the value of this argument. We have explained the argument split_by in the previous section.

The arguments template and page_builder are for advanced users, and you do not need to understand them unless you have strong need to customize the HTML output, and those many options provided by rmarkdown::html_document() still do not give you what you want.

If you want to pass a different HTML template to the template argument, the template must contain three pairs of HTML comments, and each comment must be on a separate line:

You may open the default HTML template to see where these comments were inserted:

bookdown:::bookdown_file('templates/default.html')
# you may use file.edit() to open this file

Once you know how bookdown works internally to generate multiple-page HTML output, it will be easier to understand the argument page_builder, which is a function to compose each individual HTML page using the HTML fragments extracted from the above comment tokens. The default value of page_builder is a function build_chapter in bookdown, and its source code is relatively simple (ignore those internal functions like button_link()):

extract_fun = function(name, script) {
  x = readLines(script)
  def = paste(name, '= ')
  i = which(substr(x, 1, nchar(def)) == def)
  if (length(i) == 0) stop('Cannot find ', def, ' from ', script)
  i = i[1]
  j = which(x == '}')
  j = min(j[j > i])
  x[i:j]
}

Basically, this function takes a number of components like the HTML head, the table of contents, the chapter body, and so on, and it is expected to return a character string which is the HTML source of a complete HTML page. You may manipulate all components in this function using text-processing functions like gsub() and paste().

What the default page builder does is to put TOC in the first row, the body in the second row, navigation buttons at the bottom of the body, and concatenate them with the HTML head and foot. Here is a sketch of the HTML source code that may help you understand the output of build_chapter():

<html>
  <head>
    <title>A Nice Book</title>
  </head>
  <body>

    <div class="row">TOC</div>

    <div class="row">
      CHAPTER BODY
      <p>
        <button>PREVIOUS</button>
        <button>NEXT</button>
      </p>
    </div>

  </body>
</html>

For all HTML pages, the main difference is the chapter body, and most of the rest of the elements are the same. The default output from html_book() will include the Bootstrap CSS and JavaScript files in the <head> tag.

The TOC is often used for navigation purposes. In the GitBook style, the TOC is displayed in the sidebar. For the Bootstrap style, we did not apply a special style to it, so it is shown as a plain unordered list (in the HTML tag <ul>). It is easy to turn this list into a navigation bar with some CSS techniques. We have provided a CSS file toc.css in this package that you can use, and you can find it here: https://github.com/rstudio/bookdown/blob/master/inst/examples/css/toc.css

You may copy this file to the root directory of your book, and apply it to the HTML output via the css option, e.g.,

---
output:
  bookdown::html_book:
    toc: yes
    css: toc.css
---

There are many possible ways to turn <ul> lists into navigation menus if you do a little bit searching on the web, and you can choose a menu style that you like. The toc.css we just mentioned is a style with white menu texts on a black background, and supports sub-menus (e.g., section titles are displayed as drop-down menus under chapter titles).

As a matter of fact, you can get rid of the Bootstrap style in html_document() if you set the theme option to null, and you are free to apply arbitrary styles to the HTML output using the css option (and possibly the includes option if you want to include arbitrary content in the HTML head/foot).

Tufte style

Like the Bootstrap style, the Tufte\index{Tufte style} style is provided by an output format tufte_html_book(), which is also a special case of html_chapters() using tufte::tufte_html() as the base format. Please see the tufte package [@R-tufte] if you are not familiar with the Tufte style. Basically, it is a layout with a main column on the left and a margin column on the right. The main body is in the main column, and the margin column is used to place footnotes, margin notes, references, and margin figures, and so on.

All arguments of tufte_html_book() have exactly the same meanings as html_book(), e.g., you can also customize the CSS via the css option. There are a few elements that are specific to the Tufte style, though, such as margin notes, margin figures, and full-width figures. These elements require special syntax to generate; please see the documentation of the tufte package. Note that you do not need to do anything special to footnotes and references (just use the normal Markdown syntax ^[footnote] and [@citation]), since they will be automatically put in the margin. A brief YAML example of the tufte_html_book format:

---
output:
  bookdown::tufte_html_book:
    toc: yes
    css: toc.css
---

LaTeX/PDF

We strongly recommend that you use an HTML output format instead of LaTeX\index{LaTeX} when you develop a book, since you will not be too distracted by the typesetting details, which can bother you a lot if you constantly look at the PDF output of a book. Leave the job of careful typesetting to the very end (ideally after you have really finished the content of the book).

The LaTeX/PDF output format is provided by pdf_book() in bookdown. There is not a significant difference between pdf_book() and the pdf_document() format in rmarkdown. The main purpose of pdf_book() is to resolve the labels and cross-references written using the syntax described in Sections \@ref(figures), \@ref(tables), and \@ref(cross-references). If the only output format that you want for a book is LaTeX/PDF, you may use the syntax specific to LaTeX, such as \label{} to label figures/tables/sections, and \ref{} to cross-reference them via their labels, because Pandoc supports LaTeX commands in Markdown. However, the LaTeX syntax is not portable to other output formats, such as HTML and e-books. That is why we introduced the syntax (\#label) for labels and \@ref(label) for cross-references.

There are some top-level YAML options that will be applied to the LaTeX output. For a book, you may change the default document class to book (the default is article), and specify a bibliography style required by your publisher. A brief YAML example:

---
documentclass: book
bibliography: [book.bib, packages.bib]
biblio-style: apalike
---

There are a large number of other YAML options that you can specify for LaTeX output, such as the paper size, font size, page margin, line spacing, font families, and so on. See http://pandoc.org/MANUAL.html#variables-for-latex for a full list of options.

The pdf_book() format is a general format like html_book(), and it also has a base_format argument:


You can change the base_format function to other output format functions, and bookdown has provided a simple wrapper function tufte_book2(), which is basically pdf_book(base_format = tufte::tufte_book), to produce a PDF book using the Tufte PDF style (again, see the tufte package).

E-Books

Currently bookdown provides two e-book\index{e-book} formats, EPUB\index{EPUB} and MOBI\index{MOBI}. Books in these formats can be read on devices like smartphones, tablets, or special e-readers such as Kindle.

EPUB

To create an EPUB book, you can use the epub_book() format. It has some options in common with rmarkdown::html_document():


The option toc is turned off because the e-book reader can often figure out a TOC automatically from the book, so it is not necessary to add a few pages for the TOC. There are a few options specific to EPUB:

An EPUB book is essentially a collection of HTML pages, e.g., you can apply CSS rules to its elements, embed images, insert math expressions (because MathML is partially supported), and so on. Figure/table captions, cross-references, custom blocks, and citations mentioned in Chapter \@ref(components) also work for EPUB. You may compare the EPUB output of this book to the HTML output, and you will see that the only major difference is the visual appearance.

There are several EPUB readers available, including Calibre (https://www.calibre-ebook.com), Apple's iBooks, and Google Play Books.

MOBI

MOBI e-books can be read on Amazon's Kindle devices. Pandoc does not support MOBI output natively, but Amazon has provided a tool named KindleGen (https://www.amazon.com/gp/feature.html?docId=1000765211) to create MOBI books from other formats, including EPUB and HTML. We have provided a simple wrapper function kindlegen() in bookdown to call KindleGen to convert an EPUB book to MOBI. This requires you to download KindleGen first, and make sure the KindleGen executable can be found via the system environment variable PATH.

Another tool to convert EPUB to MOBI is provided by Calibre\index{Calibre}. Unlike KindleGen, Calibre is open-source and free, and supports conversion among many more formats. For example, you can convert HTML to EPUB, Word documents to MOBI, and so on. The function calibre() in bookdown is a wrapper function of the command-line utility ebook-convert in Calibre. Similarly, you need to make sure that the executable ebook-convert can be found via the environment variable PATH. If you use OS X, you can install both KindleGen and Calibre via Homebrew-Cask (https://caskroom.github.io), so you do not need to worry about the PATH issue.

A single document

Sometimes you may not want to write a book, but a single long-form article or report instead. Usually what you do is call rmarkdown::render()\index{rmarkdown::render()} with a certain output format. The main features missing there are the automatic numbering of figures/tables/equations, and cross-referencing figures/tables/equations/sections. We have factored out these features from bookdown, so that you can use them without having to prepare a book of multiple Rmd files.

The functions html_document2(), tufte_html2(), pdf_document2(), word_document2(), tufte_handout2(), and tufte_book2() are designed for this purpose. If you render an R Markdown document with the output format, say, bookdown::html_document2, you will get figure/table numbers and be able to cross-reference them in the single HTML page using the syntax described in Chapter \@ref(components).

The above HTML and PDF output format functions are basically wrappers of output formats bookdown::html_book and bookdown::pdf_book, in the sense that they changed the base_format argument. For example, you can take a look at the source code of pdf_document2:

bookdown::pdf_document2

After you know this fact, you can apply the same idea to other output formats by using the appropriate base_format. For example, you can port the bookdown features to the jss_article format in the rticles package [@R-rticles] by using the YAML metadata:

output:
  bookdown::pdf_book:
    base_format: rticles::jss_article

Then you will be able to use all features we introduced in Chapter \@ref(components).

Although the gitbook() format was designed primarily for books, you can actually also apply it to a single R Markdown document. The only difference is that there will be no search button on the single page output, because you can simply use the searching tool of your web browser to find text (e.g., press Ctrl + F or Command + F). You may also want to set the option split_by to none to only generate a single output page, in which case there will not be any navigation buttons, since there are no other pages to navigate to. You can still generate multiple-page HTML files if you like. Another option you may want to use is self_contained = TRUE when it is only a single output page.



sawyerda/bookdown documentation built on May 20, 2019, 3:32 p.m.