query-matches-and-captures: Query matches and captures

query-matches-and-captures    R Documentation

Query matches and captures

Description

These two functions execute a query on a given node, and return the captures of the query for further use. Both functions return the same information, just structured differently depending on your use case.

  • query_matches() returns the captures first grouped by pattern, and further grouped by match within each pattern. This is useful if you include multiple patterns in your query.

  • query_captures() returns a flat list of captures ordered by their node location in the original text. This is normally the easiest structure to use if you have a single pattern without any alternations that would benefit from having individual captures split by match.

Both also return the capture name, i.e. the ⁠@name⁠ you specified in your query.

Usage

query_matches(x, node, ..., range = NULL)

query_captures(x, node, ..., range = NULL)

Arguments

x

⁠[tree_sitter_query]⁠

A query.

node

⁠[tree_sitter_node]⁠

A node to run the query over.

...

These dots are for future extensions and must be empty.

range

⁠[tree_sitter_range / NULL]⁠

An optional range to restrict the query to.
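As a minimal sketch of how `range` might be used, the code below restricts a query to the first top-level expression of a small script. It assumes that `node_child()` and `node_range()` from treesitter return the first child of the root node and the range that child spans. The variable names `root` and `first` are illustrative only.

```r
library(treesitter)

language <- treesitter.r::language()
parser <- parser(language)
tree <- parser_parse(parser, "a <- 1\nb <- 2\n")
root <- tree_root_node(tree)

query <- query(language, "(identifier) @id")

# Assumption: `node_child()` is 1-indexed, so this is the `a <- 1` expression
first <- node_child(root, 1)

# Run the query over the whole tree, but only report captures that fall
# within the range spanned by `a <- 1`
captures <- query_captures(query, root, range = node_range(first))
texts <- vapply(captures$node, node_text, character(1))
texts
```

Without `range`, the same query would also capture the identifier `b` on the second line.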

Predicates

There are 3 core types of predicates supported:

  • ⁠#eq? @capture "string"⁠

  • ⁠#eq? @capture1 @capture2⁠

  • ⁠#match? @capture "regex"⁠

Here are a few examples:

# Match an identifier named `"name-of-interest"`
(
  (identifier) @id
  (#eq? @id "name-of-interest")
)

# Match a binary operator where the left and right sides are the same name
(
  (binary_operator
    lhs: (identifier) @id1
    rhs: (identifier) @id2
  )
  (#eq? @id1 @id2)
)

# Match a name with a `_` in it
(
  (identifier) @id
  (#match? @id "_")
)

Each of these predicates can be inverted with a ⁠not-⁠ prefix.

(
  (identifier) @id
  (#not-eq? @id "name-of-interest")
)

Each of these predicates can be converted from an all-style predicate to an any-style predicate with an any- prefix. This is only useful with quantified captures, i.e. (comment)+, where the + specifies "one or more comments".

# Finds a block of comments where ALL comments are empty comments
(
  (comment)+ @comment
  (#eq? @comment "#")
)

# Finds a block of comments where ANY comments are empty comments
(
  (comment)+ @comment
  (#any-eq? @comment "#")
)

This is the full list of possible predicate permutations:

  • ⁠#eq?⁠

  • ⁠#not-eq?⁠

  • ⁠#any-eq?⁠

  • ⁠#any-not-eq?⁠

  • ⁠#match?⁠

  • ⁠#not-match?⁠

  • ⁠#any-match?⁠

  • ⁠#any-not-match?⁠
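As a rough mental model, the `any-` and `not-` modifiers behave like base R's `any()`, `all()`, and `!=` applied to the text of each node in a quantified capture. The sketch below is plain base R, not the treesitter implementation. `block` is a hypothetical vector standing in for the text of three comments captured by `(comment)+ @comment`.

```r
# Hypothetical text of the nodes captured by `(comment)+ @comment`
block <- c("# 1", "#", "# 2")

# Assumed base R analogues of the four `eq?` permutations
eq         <- all(block == "#")  # #eq?          every comment is empty
not_eq     <- all(block != "#")  # #not-eq?      no comment is empty
any_eq     <- any(block == "#")  # #any-eq?      at least one comment is empty
any_not_eq <- any(block != "#")  # #any-not-eq?  at least one comment is not empty

c(eq = eq, not_eq = not_eq, any_eq = any_eq, any_not_eq = any_not_eq)
```

For this block only the two `any-` forms are satisfied, because it mixes empty and non-empty comments. The `match?` family can be read the same way, with `grepl()` in place of `==`.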

String double quotes

The underlying tree-sitter predicate parser requires strings supplied in a query to use double quotes, i.e. "string" not 'string'. If you try to use single quotes, you will get a query error.

⁠#match?⁠ regex

The regex support provided by ⁠#match?⁠ is powered by grepl().

Escapes are a little tricky to get right within these match regex strings. To use something like \s in the regex, the literal text \\s must appear in the query string: the tree-sitter predicate parser treats the first backslash as an escape, leaving just \s in the regex that is handed to grepl(). This requires putting two literal backslash characters in the R string itself, which can be accomplished with either "\\\\s" or a raw string like r'["\\s"]', which is typically a little easier. You can also write your queries in a separate file (typically called queries.scm) and read them into R, which is more straightforward still because you can just write something like (#match? @id "^\\s$") and it will be read in correctly.
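The two ways of writing the escape can be compared directly in base R. This is a small sketch that needs no parser: it only shows that a regular string with four backslashes and a raw string with two backslashes produce the same query text. The variable names are illustrative, and raw strings require R 4.0.0 or later.

```r
# Regular string: four backslashes in the source give two literal backslashes
regular <- "(#match? @id \"^\\\\s$\")"

# Raw string: the text between `r'[` and `]'` is taken literally
raw <- r'[(#match? @id "^\\s$")]'

# Both spellings produce exactly the same query text
same <- identical(regular, raw)
same

# The query text as the predicate parser will see it
writeLines(raw)

# Once `\\` has been unescaped to `\`, the regex given to grepl() is `^\s$`
regex <- "^\\s$"
hits <- grepl(regex, c(" ", "a"))
hits
```

Printing the query with `writeLines()` rather than `print()` is a convenient check, since `print()` re-escapes backslashes and quotes.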

Examples


# ---------------------------------------------------------------------------
# Simple query

text <- "
foo + b + a + ab
and(a)
"

source <- "
(identifier) @id
"

language <- treesitter.r::language()

query <- query(language, source)
parser <- parser(language)
tree <- parser_parse(parser, text)
node <- tree_root_node(tree)

# A flat ordered list of captures, that's most useful here since
# we only have 1 pattern!
captures <- query_captures(query, node)
captures$node

# ---------------------------------------------------------------------------
# Quantified query

text <- "
# this
# that
NULL

# and
# here
1 + 1

# there
2
"

# Find blocks of one or more comments
# The `+` is a regex `+` meaning "one or more" comments in a row
source <- "
(comment)+ @comment
"

language <- treesitter.r::language()

query <- query(language, source)
parser <- parser(language)
tree <- parser_parse(parser, text)
node <- tree_root_node(tree)

# The extra structure provided by `query_matches()` is useful here so
# we can see the 3 distinct blocks of comments
matches <- query_matches(query, node)

# We provided one query pattern, so let's extract that
matches <- matches[[1]]

# 3 blocks of comments
matches[[1]]
matches[[2]]
matches[[3]]

# ---------------------------------------------------------------------------
# Multiple query patterns

# If you know you need to run multiple queries, you can run them all at once
# in one pass over the tree by providing multiple query patterns.

text <- "
a <- 1
b <- function() {}
c <- b
"

# Use an extra set of `()` to separate multiple query patterns
source <- "
(
  (identifier) @id
)
(
  (binary_operator) @binary
)
"

language <- treesitter.r::language()

query <- query(language, source)
parser <- parser(language)
tree <- parser_parse(parser, text)
node <- tree_root_node(tree)

# The extra structure provided by `query_matches()` is useful here so
# we can separate the two queries
matches <- query_matches(query, node)

# First query - all identifiers
matches[[1]]

# Second query - all binary operators
matches[[2]]

# ---------------------------------------------------------------------------
# The `#eq?` and `#match?` predicates

text <- '
fn(a, b)

test_that("this", {
  test
})

fn_name(args)

test_that("that", {
  test
})

fn2_(args)
'

language <- treesitter.r::language()
parser <- parser(language)
tree <- parser_parse(parser, text)
node <- tree_root_node(tree)

# Use an extra set of outer `()` when you are applying a predicate to ensure
# the query pattern is grouped with the query predicate.
# This one finds all function calls where the function name is `test_that`.
source <- '
(
  (call
    function: (identifier) @name
  ) @call
  (#eq? @name "test_that")
)
'

query <- query(language, source)

# It's fine to have a flat list of captures here, but we probably want to
# remove the `@name` captures and just retain the full `@call` captures.
captures <- query_captures(query, node)
captures$node[captures$name == "call"]

# This one finds all function calls where the function name contains a `_`.
# It uses the R level `grepl()` for the regex processing.
source <- '
(
  (call
    function: (identifier) @name
  ) @call
  (#match? @name "_")
)
'

query <- query(language, source)

captures <- query_captures(query, node)
captures$node[captures$name == "call"]

# ---------------------------------------------------------------------------
# The `any-` and `not-` predicate modifiers

text <- '
# 1
#
# 2
NULL

# 3
# 4
NULL

#
#
NULL

#
# 5
#
# 6
#
NULL
'

language <- treesitter.r::language()
parser <- parser(language)
tree <- parser_parse(parser, text)
node <- tree_root_node(tree)

# Two queries:
# - Find comment blocks where there is at least one empty comment
# - Find comment blocks where there is at least one non-empty comment
source <- '
(
  (comment)+ @comment
  (#any-eq? @comment "#")
)
(
  (comment)+ @comment
  (#any-not-eq? @comment "#")
)
'

query <- query(language, source)

matches <- query_matches(query, node)

# Query 1 has 3 comment blocks that match
query1 <- matches[[1]]
query1[[1]]
query1[[2]]
query1[[3]]

# Query 2 has 3 comment blocks that match (a different set than query 1!)
query2 <- matches[[2]]
query2[[1]]
query2[[2]]
query2[[3]]


treesitter documentation built on April 11, 2025, 5:51 p.m.