README.md

NOTICE: This package was built for the PatentsView Legacy API, which has been discontinued as of May 1 2025. As a result... none of it works anymore. I will hopefully get around to updating it sometime in 2025.

patentsview2

patentsview2 is a simple set of functions to query the Patents endpoint of the PatentsView API using CPC Subclass identifiers (e.g. G16H for medical informatics). It is complementary to, but not integrated with, the patentsview package available on CRAN.

introduction

Patents and patent applications are a valuable measure of innovation within countries, sub-national geographies (e.g. states or cities), industries, and firms. Numerous patent databases exist, including PATSTAT from the EPO, NBER, Google Patents, OECD Patent Microdata, the USPTO, and others. These databases can be unwieldy, so the USPTO created PatentsView as a user-friendly query portal for data about individual patents. They also implemented several API endpoints that can be queried directly. Despite its public availability and documentation, the PatentsView API isn’t that easy to use. So, to collect data about patents across different patent types, classified by Cooperative Patent Classification (CPC) subclasses, I wrote a few helper functions. Hope they’re useful.

the patents_view() function

patentsview2::patents_view() is the primary function. It calls two other functions, pv_post() and clean_patents(), and takes a single argument, cpc: a string containing any one of the 4-character CPC subclasses available in PatentsView. By default, the function returns all patent applications filed with the USPTO since Jan 1, 2000 by US-based assignees under the given classification.

patentsview2::patents_view(cpc="F03B")

This returns a data frame with one row per patent application and 27 fields of information about the patent, including its first named inventor and assignee.

pv_post() and clean_patents()

You should not have to call either of these functions directly. Both are called by patents_view() to help with constructing the POST call to the API and to clean up the data frame, as their names imply.

the CPC datasets

The package also includes two data sets for reference to CPC subclasses. cpc_subgroups lists all subclasses, groups, and subgroups (258,827 observations), while cpc_subclasses lists only the four-character subclass codes (615 observations) that can be used in the cpc argument within patents_view().

data("cpc_subclasses") # 615 obs by 2 vars
data("cpc_subgroups") # 258,827 obs by 5 vars
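
One way to search the subclass table by keyword is sketched below. Since the package data may not be installed, the example uses a small toy data frame with the same two columns (cpc_subclass and title); the titles in it are illustrative stand-ins, and find_cpc() is a hypothetical helper, not part of patentsview2.

```r
# Toy stand-in for the packaged cpc_subclasses table (same two columns).
# The titles are illustrative, not copied from the package data.
cpc_toy <- data.frame(
  cpc_subclass = c("B62K", "F03B", "G16H"),
  title = c("Unicycles", "Machines or engines for liquids", "Medical informatics"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive keyword match on the title column.
find_cpc <- function(tbl, keyword) {
  tbl[grepl(keyword, tbl$title, ignore.case = TRUE), , drop = FALSE]
}

hits <- find_cpc(cpc_toy, "informatics")
hits$cpc_subclass
```

With the real data loaded, the same call would presumably be find_cpc(cpc_subclasses, "informatics"), and the matching code could then be passed to the cpc argument.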

Use cpc_subclasses to find subclasses of interest, and use the 4-character code found in the cpc_subclass field in your query. You can also browse the CPC hierarchy from the USPTO. You may also want to loop through several CPC subclasses, as below. Note that the code below does not look for patents that carry all of the subclasses at once; it makes a separate API call for each of the 5 randomly sampled CPC subclasses. The 5 resulting data frames are stored in the random_cpcs list.

# random sample of CPCs
cpc_samp <- sample(cpc_subclasses$cpc_subclass,5)
random_cpcs <- list()
for (i in seq_along(cpc_samp)) {
  random_cpcs[[i]] <- patents_view(cpc=cpc_samp[i])
}
names(random_cpcs) <- cpc_samp
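
If a single table is easier to work with than a list, the per-subclass data frames can be stacked with a column recording which query each row came from. The sketch below is base R only and assumes every data frame has the same columns; because the API is offline, fake_patents_view() is a hypothetical stand-in that returns two made-up rows per subclass, and queried_cpc is an invented column name.

```r
# Hypothetical stand-in for patents_view(): returns a tiny data frame
# with made-up values so the example runs without the API.
fake_patents_view <- function(cpc) {
  data.frame(
    patent_number = paste0(cpc, "-", 1:2),
    patent_year = c(2001L, 2002L),
    stringsAsFactors = FALSE
  )
}

cpc_samp <- c("B62K", "F03B", "G16H")
random_cpcs <- lapply(cpc_samp, fake_patents_view)
names(random_cpcs) <- cpc_samp

# Tag each data frame with the subclass that was queried, then stack them.
tagged <- Map(function(df, cpc) {
  df$queried_cpc <- cpc
  df
}, random_cpcs, names(random_cpcs))

all_cpcs <- do.call(rbind, tagged)
rownames(all_cpcs) <- NULL
dim(all_cpcs)
```

Keeping the queried_cpc column makes it possible to split the combined table back out by subclass later.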

an example

library(dplyr) # %>%, filter(), group_by(), tally(), etc.
library(knitr) # kable()

# CPC Subclass B62K: Unicycles
cpc_subclasses %>% filter(cpc_subclass=="B62K")
#>   cpc_subclass     title
#> 1         B62K Unicycles
b62k <- patentsview2::patents_view(cpc="B62K")
dim(b62k) 
#> [1] 1844   27
# 1,844 patent applications since Jan 1 2000

# get number of unique patent numbers
# remember, each observation is an APPLICATION not a patent
unique(b62k$patent_number) %>% length() 
#> [1] 1811
# 1,811 unique patent numbers

# how many unique assignees?
unique(b62k$assignee_organization) %>% length() 
#> [1] 454
# 454 unique assignees

# what other fields do we have?
colnames(b62k)
#>  [1] "patent_id"                           
#>  [2] "patent_number"                       
#>  [3] "patent_title"                        
#>  [4] "patent_abstract"                     
#>  [5] "patent_date"                         
#>  [6] "patent_year"                         
#>  [7] "patent_firstnamed_inventor_city"     
#>  [8] "patent_firstnamed_inventor_state"    
#>  [9] "patent_firstnamed_inventor_latitude" 
#> [10] "patent_firstnamed_inventor_longitude"
#> [11] "patent_num_cited_by_us_patents"      
#> [12] "patent_num_combined_citations"       
#> [13] "patent_processing_time"              
#> [14] "patent_type"                         
#> [15] "patent_firstnamed_assignee_id"       
#> [16] "patent_firstnamed_assignee_city"     
#> [17] "patent_firstnamed_assignee_state"    
#> [18] "patent_firstnamed_assignee_latitude" 
#> [19] "patent_firstnamed_assignee_longitude"
#> [20] "assignee_organization"               
#> [21] "assignee_type"                       
#> [22] "assignee_total_num_patents"          
#> [23] "assignee_key_id"                     
#> [24] "app_date"                            
#> [25] "app_id"                              
#> [26] "cpcs"                                
#> [27] "inv_city_state"

# where are unicycle patent applications concentrated?
b62k %>% 
  group_by(inv_city_state) %>% 
  tally() %>% 
  arrange(desc(n)) %>% 
  ungroup() %>%
  mutate(pct=n/sum(n)*100,
         cumulative_pct=cumsum(pct)) %>%
  top_n(10,n) %>% kable()

| inv_city_state   |  n |      pct | cumulative_pct |
| :--------------- | -: | -------: | --------------: |
| Bedford, NH      | 70 | 3.796095 |        3.796095 |
| Los Gatos, CA    | 54 | 2.928416 |        6.724512 |
| Santa Cruz, CA   | 46 | 2.494577 |        9.219089 |
| Morgan Hill, CA  | 45 | 2.440347 |       11.659436 |
| Chicago, IL      | 43 | 2.331887 |       13.991323 |
| Roseau, MN       | 33 | 1.789588 |       15.780911 |
| Aptos, CA        | 30 | 1.626898 |       17.407809 |
| Capitola, CA     | 30 | 1.626898 |       19.034707 |
| Portland, OR     | 27 | 1.464208 |       20.498915 |
| Madison, WI      | 26 | 1.409978 |       21.908894 |



nateapathy/patentsview documentation built on July 17, 2025, 4:23 a.m.