In r-lib/lintr: A 'Linter' for R Code

This vignette describes the steps necessary to create a new linter.

See the last section for some details specific to writing new linters for {lintr}.

A good example of a simple linter is the pipe_call_linter.

#' Pipe call linter
#'
#' Force explicit calls in magrittr pipes, e.g.,
#' `1:3 %>% sum()` instead of `1:3 %>% sum`.
#'
#' @evalRd rd_tags("pipe_call_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export
pipe_call_linter <- function() {
  xpath <- "//expr[preceding-sibling::SPECIAL[text() = '%>%'] and *[1][self::SYMBOL]]"

  Linter(linter_level = "expression", function(source_expression) {
    if (!is_lint_level(source_expression, "expression")) {
      return(list())
    }

    xml <- source_expression$xml_parsed_content
    if (is.null(xml)) return(list())

    bad_expr <- xml2::xml_find_all(xml, xpath)

    xml_nodes_to_lints(
      bad_expr,
      source_expression = source_expression,
      lint_message = "Use explicit calls in magrittr pipes, i.e., `a %>% foo` should be `a %>% foo()`.",
      type = "warning"
    )
  })
}

Let's walk through the parts of the linter individually.

Writing the linter

#' Pipe call linter
#'
#' Force explicit calls in magrittr pipes, e.g.,
#' `1:3 %>% sum()` instead of `1:3 %>% sum`.

Describe the linter, giving it a title and briefly covering the usages that are discouraged when the linter is active.

#' @evalRd rd_tags("pipe_call_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export

These lines (1) generate a Tags section in the documentation for the linter^[NB: this is a helper function for generating custom Rd styling. See R/linter_tags.R.]; (2) link to the full table of available linters; and (3) mark the function for export. The most unfamiliar here is probably (1), which can be skipped outside of lintr itself.

pipe_call_linter <- function() {

Next, we define the name of the new linter. The convention is to suffix all linter names with _linter. All _linter functions are function factories that return a closure that will do the actual linting function. We could define additional parameters that are useful for the linter in this function declaration (see, e.g. assignment_linter), but pipe_call_linter requires no additional arguments.

xpath <- "//expr[preceding-sibling::SPECIAL[text() = '%>%'] and *[1][self::SYMBOL]]"

Here is the core linter logic. xpath is an XPath expression for expressions matching the discouraged usage. xpath is saved inside the factory code (as opposed to inside the linter itself) for efficiency. Often, the xpath will be somewhat complicated / involve some assembly code using paste() or glue::glue()[^See infix_spaces_linter() for an example of this], in which case it is preferable to execute this code only once when creating the linter; the cached XPath is then re-used on each expression in each file where the linter is run.

Let's examine the XPath a bit more closely:

//expr                  # global search (//) for 'expr' nodes (R expressions), at any nesting level
[                       # node[...] looks for any 'node' satisfying conditions in ...
  preceding-sibling::   # "siblings" are at the same nesting level in XML
    SPECIAL[            # 'SPECIAL' is the parse token for infix operators like %% or %+%
      text() = '%>%'    # text() returns the string associated with this node
    ]                   #
  and                   # combine conditions with 'and'
  *                     # match any node
  [1]                   # match the first such node
  [self::SYMBOL]        # match if the current node is a 'SYMBOL' (i.e., a 'name' in R)
]                       #

Taken together, that means we want to match expr nodes preceded by the %>% infix operator whose first child node is a name. That maps pretty closely to the description of what the pipe_call_linter is looking for, but there is subtlety in mapping between the R code you're used to and how they show up in the XML representation. expr nodes in particular take some practice to get accustomed to -- use the plentiful XPath-based linters in lintr as a guide to get extra practice^[The W3schools tutorials are also quite helpful; see https://www.w3schools.com/xml/xpath_intro.asp]. Note: xml2 implements XPath 1.0, which lacks some helpful features available in XPath 2.0.

Linter(function(source_expression) {

This is the closure. It will be called on the source_expression variable that contains the top level expressions in the file to be linted. The call to Linter() gives this closure the class 'linter' (it also guesses the name of the linter; see ?Linter for more details).

The raw text of the expression is available from source_file$content. However, it is not generally possible to implement linters from the raw text -- consider equals_na_linter. If we just look for == NA in the text of the file, we'll generate many false positives, e.g. in comments (such as # note: is.na() is the proper way to check == NA) or inside character literals (such as warning("don't use == NA to check missingness")). We're also likely to generate false negatives, for example when == and NA appear on different lines! Working around these issues using only the un-parsed text in every situation amounts to re-implementing the parser.

Therefore it is recommended to work with the tokens from source_file$parsed_content or source_file$xml_parsed_content, as they are tokenized from the R parser. These tokens are obtained from parse() and utils::getParseData() calls done prior to calling the new linter. getParseData() returns a data.frame with information from the source parse tree of the file being linted. A list of tokens is available from r-source/src/main/gram.y.

source_file$xml_parsed_content uses xmlparsedata::xml_parse_data() to convert the getParseData() output into an XML tree, which enables writing linter logic in XPath, a powerful language for expressing paths within the nested XML data structure. Most linters in lintr are built using XPath because it is a powerful language for computation on the abstract syntax tree / parse tree.

if (!is_lint_level(source_expression, "expression")) {
  return(list())
}

Here, we return early if source_expression is not the expression-level object. get_source_expression() returns an object that parses the input file in two ways -- once is done expression-by-expression, the other contains all of the expressions in the file. This is done to facilitate caching. Suppose your package has a long source file (e.g., 100s of expressions) -- rather than run linters on every expression every time the file is updated, when caching is activated lintr will only run the linter again on expressions that have changed.

Note that this check is unnecessary because we provided linter_level = "expression" which guarantees that source_expression will be at the expression level and not at the file level.

Therefore, it is preferable to write expression-level linters whenever possible. Two types of exceptions observed in lintr are (1) when several or all expressions are required to ensure the linter logic applies (e.g., conjunct_test_linter looks for consecutive calls to stopifnot(), which will typically appear on consecutive expressions) or (2) when the linter logic is very simple & fast to compute, so that the overhead of re-running the linter is low (e.g., single_quotes_linter). In those cases, use is_lint_level(source_expression, "file").

xml <- source_expression$xml_parsed_content

bad_expr <- xml2::xml_find_all(xml, xpath)

source_expression$xml_parsed_content is copied to a local variable (this is not strictly necessary but facilitates debugging). Then xml2::xml_find_all() is used to execute the XPath on this particular expression. Keep in mind that it is typically possible for a single expression to generate more than one lint -- for example, x %>% na.omit %>% sum will trigger the pipe_call_linter() twice^[This is particularly important if you want the message field in the resulting Lint() to vary depending on the exact violation that's found. For pipe_call_linter(), the message is always the same. See assignment_linter() for an example where the message can vary.].

xml_nodes_to_lints(
  bad_expr,
  source_expression = source_expression,
  lint_message = "Use explicit calls in magrittr pipes, i.e., `a %>% foo` should be `a %>% foo()`.",
  type = "warning"
)

Finally, we pass the matching XML node(s) to xml_nodes_to_lints(), which returns Lint objects corresponding to any "bad" usages found in source_expression. See ?Lint and ?xml_nodes_to_lints for details about the arguments. Note that here, the message for the lint is always the same, but for many linters, the message is customized to more closely match the observed usage. In such cases, xml_nodes_to_lint() can conveniently accept a function in lint_message which takes a node as input and converts it to a customized message. See, for example, seq_linter.

linter_level = "expression"

This is a more efficient way to implement the condition if (!is_lint_level(source_expression, "expression")).

Writing linter tests

(NB: this section uses the assignment_linter() which has simpler examples than pipe_continuation_linter().)

{lintr} works best inside the {testthat} unit testing framework, in particular, {lintr} exports lintr::expect_lint() which is designed as a companion to other testthat expectations.

You can define tests inside separate test_that calls. Most of the linters use the same default form.

test_that("returns the correct linting", {

You then test a series of expectations for the linter using expect_lint. Please see ?expect_lint for a full description of the parameters.

The main three aspects to test are:

Linter returns no lints when there is nothing to lint, e.g.

expect_no_lint("blah", assignment_linter())

Linter returns a lint when there is something to lint, e.g.

expect_lint("blah=1",
  rex("Use <-, not =, for assignment."),
  assignment_linter()
)

As many edge cases as you can think of that might break it, e.g.

expect_lint("fun((blah = fun(1)))",
  rex("Use <-, not =, for assignment."),
  assignment_linter()
)

You may want to test that additional lint attributes are correct, such as the type, line number, column number, e.g.

expect_lint("blah=1",
  list(message = "assignment", line_number = 1, column_number = 5, type = "style"),
  assignment_linter()
)

Finally, it is a good idea to test that your linter reports multiple lints if needed, e.g.

expect_lint("blah=1; blah=2",
  list(
    list(line_number = 1, column_number = 5),
    list(line_number = 1, column_number = 13),
  ),
  assignment_linter()
)

It is always better to write too many tests rather than too few.

Other utilities for writing custom linters

Besides is_lint_level(), {lintr} also exports some other helpers generally useful for writing custom linters; these are used a lot in the internals of our own helpers, and so they've been tested and demonstrated their utility already.

get_r_string(): Whenever your linter needs to examine the value of a character literal (e.g., whether an argument value is set to some string), use this to extract the string exactly as R will see it. This is especially important to make your logic robust to R-4-style raw strings like R"-(hello)-", which is otherwise difficult to express, for example as an XPath.
xml_find_function_calls(): Whenever your linter needs to query R function calls, e.g. via the XPath //SYMBOL_FUNCTION_CALL[text() = 'myfun'], use this member of source_expression to obtain the function call nodes more efficiently. Instead of r xml <- source_expression$xml_parsed_content xpath <- "//SYMBOL_FUNCTION_CALL[text() = 'myfun']/parent::expr/some/cond" xml_find_all(xml, xpath) use r xml_calls <- source_expression$xml_find_function_calls("myfun") call_xpath <- "parent::expr/some/cond" xml_find_all(xml_calls, call_xpath)
make_linter_from_xpath() and make_linter_from_function_xpath(): Whenever your linter can be expressed by a static XPath and a static message, use make_linter_from_xpath() or, if the XPath starts with //SYMBOL_FUNCTION_CALL, use make_linter_from_function_xpath(). Instead of make_linter_from_xpath(xpath = "//SYMBOL_FUNCTION_CALL[text() = 'foo' or text() = 'bar']/cond"), use make_linter_from_function_xpath(function_names = c("foo", "bar"), xpath = "cond").

Contributing to `{lintr}`

More details about writing tests for new `{lintr}` linters

The {lintr} package uses testthat for testing. You can run all of the currently available tests using devtools::test(). If you want to run only the tests in a given file use the filter argument to devtools::test().

Linter tests should be put in the tests/testthat/ folder. The test filename should be the linter name prefixed by test-, e.g. test-pipe_continuation_linter.R.

Adding your linter to the default_linters

If your linter implements part of the tidyverse style guide you can add it to default_linters. This object is created in the file zzz.R (this name ensures that it will always run after all the linters are defined). Add your linter name to the default_linters list before the NULL at the end, and add a corresponding test case to the test script ./tests/testthat/default_linter_testcode.R.