knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Three ways to write the same query

There are three different ways that you can create a query to pass to search_pv(). For example, let's say you want to find all patents published in the last 10 years that have the word "dog" in their titles or abstracts, and whose assignees are located in either the US or Canada. Here are the three options for how you could write such a query:

  1. Use a raw character/JSON vector:
query_v_1 <-
  '{"_and":[
          {"_gte":{"patent_date":"2007-03-01"}},
          {"_or":[
            {"_text_all":{"patent_title":"dog"}},
            {"_text_all":{"patent_abstract":"dog"}}
          ]},
          {"_or":[
            {"_eq":{"assignee_country":"US"}},
            {"_eq":{"assignee_country":"CA"}}
          ]}
  ]}'
  2. Use a list (which will be converted to JSON by search_pv()):
query_v_2 <- 
  list("_and" = 
       list(
          list("_gte" = list(patent_date = "2007-03-01")),
          list("_or" = 
                 list(
                   list("_text_all" = list(patent_title = "dog")),
                   list("_text_all" = list(patent_abstract = "dog"))
                   )
               ),
          list("_or" = 
                 list(
                   list("_eq" = list(assignee_country = "US")),
                   list("_eq" = list(assignee_country = "CA"))
                   )
               )
      )
  )
  3. Use the domain-specific language (DSL) provided in patentsview to create an object of class pv_query (which is also a list):
library(patentsview)

query_v_3 <- 
  with_qfuns(
    and(
      gte(patent_date = "2007-03-01"),
      or(
        text_all(patent_title = "dog"),
        text_all(patent_abstract = "dog")
      ),
      eq(assignee_country = c("US", "CA"))
    )
  )

Why use the DSL?

We can see that all three versions of the query shown above are equivalent:

jsonlite::minify(query_v_1)
jsonlite::toJSON(query_v_2, auto_unbox = TRUE)
jsonlite::toJSON(query_v_3, auto_unbox = TRUE)
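If you'd rather check the equivalence programmatically than by eye, one option is to compare the minified strings. The sketch below does this for just the country clause, written once as method 1 and once as method 2; it assumes only that jsonlite is installed. Note that this is a plain string comparison, so it would report a mismatch if the two versions listed clauses or keys in a different order, even if the API would treat them the same.

```r
library(jsonlite)

# Method 1: the country clause as raw JSON
country_json <- '{"_or":[
  {"_eq":{"assignee_country":"US"}},
  {"_eq":{"assignee_country":"CA"}}
]}'

# Method 2: the same clause as a nested list
country_list <- list("_or" = list(
  list("_eq" = list(assignee_country = "US")),
  list("_eq" = list(assignee_country = "CA"))
))

# Strip whitespace from the raw JSON, serialize the list without boxing
# length-one values, then compare the two plain strings
from_json <- as.character(minify(country_json))
from_list <- as.character(toJSON(country_list, auto_unbox = TRUE))
identical(from_json, from_list)
#> [1] TRUE
```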

So why would you ever want to use method 3 over methods 1 and 2? There are two main reasons:

1. Query validation

search_pv() will check your query for errors if you use method 2 or 3. This is not the case for method 1, where you would have to rely on the API's error messages for guidance if your query is invalid. search_pv() checks queries for the following:

2. Concise, easy-to-use syntax for complex queries

Methods 1 and 3 shown above are both shorter than method 2, which makes them quicker to write. It's also a lot easier to get the JSON syntax right with method 3 than with method 1, because the DSL doesn't require you to write any JSON at all. This matters because the API is fairly picky about query syntax, so getting it right by hand is not trivial. For example, the API will throw an error if you use a box (i.e., a JSON array) where one isn't strictly necessary, even though the query is still valid JSON (e.g., query = {"_gte":{"patent_date":["2007-03-01"]}} will throw an error).
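Because every R value is a vector, a straightforward list-to-JSON conversion boxes even single values, and that is exactly what the `auto_unbox = TRUE` argument in the comparison code above controls. A small illustration, assuming jsonlite is installed:

```r
library(jsonlite)

gte_clause <- list("_gte" = list(patent_date = "2007-03-01"))

# Default: every R vector, even one of length one, becomes a JSON array
toJSON(gte_clause)
#> {"_gte":{"patent_date":["2007-03-01"]}}

# auto_unbox = TRUE writes length-one vectors as scalars instead
toJSON(gte_clause, auto_unbox = TRUE)
#> {"_gte":{"patent_date":"2007-03-01"}}
```

The first form is the boxed version the API rejects; the second is what the API expects.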

Compared to method 1, method 3 will also correctly "or" together values if you put them in a vector. For example, in the query shown above, a vector of two values was given for assignee_country (c("US", "CA")). The DSL safely converted this single "equals" statement in the third element of the query (eq(assignee_country = c("US", "CA"))) into two separate "equals" statements that are or'd together.[^1]
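To make that expansion concrete, here is a minimal base-R sketch of the idea. `eq_sketch()` is a hypothetical helper written only for illustration; it is not part of patentsview, and it makes no claim about how the package implements this internally. Given a vector, it builds one single-value "_eq" statement per element and wraps them in an "_or".

```r
# Hypothetical helper (not part of patentsview): build an "_eq" clause,
# or-ing together one "_eq" statement per value when given a vector
eq_sketch <- function(...) {
  args <- list(...)
  field <- names(args)[1]
  values <- args[[1]]
  # One single-value clause per element of the vector, e.g.
  # list("_eq" = list(assignee_country = "US"))
  clauses <- lapply(values, function(v) {
    list("_eq" = structure(list(v), names = field))
  })
  # A single value needs no "_or" wrapper
  if (length(clauses) == 1) clauses[[1]] else list("_or" = clauses)
}

# Produces the same nested list as the country clause in method 2
country_clause <- eq_sketch(assignee_country = c("US", "CA"))
str(country_clause)
```

Never emitting a value array keeps the generated JSON in the one form the API handles consistently.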

Basics of the language

All of the functions that make up the DSL are found in the qry_funs list (e.g., qry_funs$eq()). You can evaluate code in the context of this list using the function with_qfuns(). See ?with_qfuns for an example that demonstrates how using this function saves you typing. There are three types of functions in qry_funs:

  1. Comparison operator functions (eq, neq, gt, gte, lt, lte, begins, contains, text_all, text_any, text_phrase). These functions are used to compare a field to a value. For example, using the "less than or equal to" function (lte), we can filter out patents published after some date (e.g., query = qry_funs$lte(patent_date = "2001-01-05")). See the "comparison operators" section of the API's query language page for a description of the 11 comparison operators; the patentsview function for each operator is simply the operator's name without the leading "_". One important thing to keep in mind is that certain comparison operators only work with certain data types. For example, you can't use the begins function on patent_abstract, because patent_abstract is of data type "full text" and begins only works with fields of data type "string."
  2. Array functions (and and or). You can use these functions to logically combine the calls to the various comparison operators. For example, we can require that the patent date is less than or equal to 2001-01-05 and the inventor's last name is "Ihaka" (query = with_qfuns(and(lte(patent_date = "2001-01-05"), eq(inventor_last_name = "Ihaka")))).
  3. not function (not). This function negates a comparison. In other words, it matches records for which the comparison is false rather than true. For example, we could search for patents that don't have the word "hi" in their titles like this: qry_funs$not(qry_funs$text_phrase(patent_title = "hi")).
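To show the general idea behind qry_funs and with_qfuns(), here is a stripped-down sketch under stated assumptions: `mini_funs` and `with_mini()` are hypothetical stand-ins written for this example, not patentsview's implementation. Each function simply returns the nested list for one API operator, and `with_mini()` evaluates an expression so that names like `and()` and `lte()` are looked up in `mini_funs` first. The real functions do more than this, such as expanding vectors into or'd statements as described above.

```r
# Hypothetical mini version of qry_funs: one function per API operator,
# each returning the nested list for that operator
mini_funs <- list(
  eq  = function(...) list("_eq" = list(...)),
  lte = function(...) list("_lte" = list(...)),
  and = function(...) list("_and" = list(...)),
  not = function(x) list("_not" = x)
)

# Evaluate the unevaluated expression with mini_funs as the first place
# to look up names, falling back to the caller's environment
with_mini <- function(code) {
  eval(substitute(code), mini_funs, parent.frame())
}

# Comparison functions nested inside an array function and the not function
q <- with_mini(
  and(
    lte(patent_date = "2001-01-05"),
    not(eq(inventor_last_name = "Ihaka"))
  )
)
str(q)
```

Without the `with_mini()` wrapper, every call would need the `mini_funs$` prefix, which is the typing that with_qfuns() saves you.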

Query examples

The following queries are intended for the patents endpoint.

Patents linked to an assignee with 10 or fewer distinct (and disambiguated) inventors:

qry_funs$lte(assignee_total_num_inventors = 10)

Patents assigned to the "CPC subsection"[^2] of G12 (physics instruments):

qry_funs$eq(cpc_subsection_id = "G12")

Patents that list an inventor whose first name contains "joh", and whose abstract contains either the phrase "dog bark" or the phrase "cat meow" but not the phrase "dog chain":

with_qfuns(
  and(
    contains(rawinventor_first_name = "joh"),
    text_phrase(patent_abstract = c("dog bark", "cat meow")),
    not(
      text_phrase(patent_abstract = c("dog chain"))
    )
  )
)
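For comparison, here is roughly what the same query looks like when written by hand as raw JSON (method 1). This is a sketch that assumes the DSL expands the two-element vector given to text_phrase() into two or'd "_text_phrase" statements, as described for eq() above; it is not captured output from patentsview.

```r
# Method 1 version of the query above (assumes the vector given to
# text_phrase() becomes two or'd "_text_phrase" statements)
query_json <- '{"_and":[
  {"_contains":{"rawinventor_first_name":"joh"}},
  {"_or":[
    {"_text_phrase":{"patent_abstract":"dog bark"}},
    {"_text_phrase":{"patent_abstract":"cat meow"}}
  ]},
  {"_not":{"_text_phrase":{"patent_abstract":"dog chain"}}}
]}'

# Checks only that the string is well-formed JSON, not that the API accepts it
jsonlite::validate(query_json)
```

The extra nesting and bracket bookkeeping here is exactly what the DSL version spares you.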

Patents with an inventor whose disambiguated last name is “Smith” and with “cotton gin” in the patent title, or with an inventor whose disambiguated last name is “Hopper” and with “COBOL” in the patent title:

with_qfuns(
  or(
    and(
      eq(inventor_last_name = "smith"),
      text_phrase(patent_title = "cotton gin")
    ),
    and(
      eq(inventor_last_name = "hopper"),
      text_phrase(patent_title = "COBOL")
    )
  )
)

[^1]: One may note that using "value arrays" is supposedly supported natively by the API. For example, the API documentation gives the following query as an example of their use: '{"inventor_last_name":["Whitney","Hopper"]}'. The problem with this is that the API is not consistent in its handling of value arrays. For many of the comparison operators, one cannot "or" together values using arrays. Thus, the DSL in patentsview never relies on arrays when creating queries.

[^2]: PatentsView gets the names of the CPC hierarchy wrong. For example, a "CPC subsection" according to PatentsView is actually a CPC class.



crew102/patentsview documentation built on May 14, 2019, 11:33 a.m.