knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE )
The following is a quick analysis of the top organizations patenting in the field of databases.
library(patentsview) library(dplyr) library(highcharter) library(DT) library(knitr) # We first need to write a query. Our query will look for "database" in either # the patent title or abstract...Note, this isn't a terribly good way to ID our # patents, but it will work for the purpose of demonstration. Users who are # interested in writing higher-quality queries could consult the large body of # research that has been done in patent document retrieval. query <- with_qfuns( or( text_phrase(patent_abstract = "database"), text_phrase(patent_title = "database") ) ) query # Create a list of the fields we'll need for the analysis: fields <- c( "patent_number", "assignee_organization", "patent_num_cited_by_us_patents", "app_date", "patent_date", "assignee_total_num_patents" ) # Send an HTTP request to the PatentsView API to get data: pv_out <- search_pv(query = query, fields = fields, all_pages = TRUE)
db_pats
). We'll also calculate how many total patents these assignees have (ttl_pats
) and what fraction of their total patents relate to databases (frac_db_pats
).# Unnest the data frames that are stored in the assignee list column: dl <- unnest_pv_data(data = pv_out$data, pk = "patent_number") dl # Create a data frame with the top 75 assignees: top_asgns <- dl$assignees %>% filter(!is.na(assignee_organization)) %>% # some patents are not assigned to an org (only to an inventor) mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>% group_by(assignee_organization, ttl_pats) %>% # group by ttl_pats so we can retain ttl_pats summarise(db_pats = n()) %>% mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>% ungroup() %>% select(c(1, 3, 2, 4)) %>% arrange(desc(db_pats)) %>% slice(1:75) # Create datatable: datatable( data = top_asgns, rownames = FALSE, colnames = c("Assignee", "DB patents","Total patents", "DB patents / total patents"), caption = htmltools::tags$caption( style = 'caption-side: top; text-align: left; font-style: italic;', "Table 1: Top assignees in 'databases'" ), options = list(pageLength = 10) )
IBM is far and away the biggest player in the field. However, we can see that Oracle and Salesforce.com are relatively more interested in this area, as indicated by the fraction of their patents that relate to databases.
# Create a data frame with patent counts by application year for each assignee: data <- top_asgns %>% select(-contains("pats")) %>% slice(1:5) %>% inner_join(dl$assignees) %>% inner_join(dl$applications) %>% mutate(app_yr = as.numeric(substr(app_date, 1, 4))) %>% group_by(assignee_organization, app_yr) %>% count() # Plot the data using highchartr: hc_add_series_df( highchart(), data = data, "line", x = app_yr, y = n, group = assignee_organization ) %>% hc_plotOptions(series = list(marker = list(enabled = FALSE))) %>% hc_xAxis(title = list(text = "Application year")) %>% hc_yAxis(title = list(text = "DB patents")) %>% hc_title(text = "Top five assignees in 'databases'") %>% hc_subtitle(text = "Yearly patent applications over time")
It's hard to see any clear trends in this graph. What is clear is that the top assignees have all been patenting in the field for many years.
# Write a ranking function that will be used to rank patents by their citation counts: percent_rank2 <- function(x) (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1) # Create a data frame with normalized citation rates and stats from Step 2: asng_p_dat <- dl$patents %>% mutate(patent_yr = substr(patent_date, 1, 4)) %>% group_by(patent_yr) %>% mutate(perc_cite = percent_rank2(patent_num_cited_by_us_patents)) %>% inner_join(dl$assignees) %>% group_by(assignee_organization) %>% summarise(mean_perc = mean(perc_cite)) %>% inner_join(top_asgns) %>% arrange(desc(ttl_pats)) %>% filter(!is.na(assignee_organization)) %>% slice(1:20) %>% mutate(color = "#f1c40f") %>% as.data.frame() kable(head(asng_p_dat), row.names = FALSE)
Now let's visualize the data. Each assignee will be represented by a point/bubble. The x-value of the assignee will represent the total number of patents it has published in the field of databases (on a log scale), while the y-value will represent its average normalized citation rate. The size of the bubble we be proportional to the percent of the assignee's patents that relate to databases.
# Adapted from http://jkunst.com/highcharter/showcase.html hchart( asng_p_dat, "scatter", hcaes(x = db_pats, y = mean_perc, size = frac_db_pats, group = assignee_organization, color = color) ) %>% hc_xAxis(title = list(text = "DB patents"), type = "logarithmic", allowDecimals = FALSE, endOnTick = TRUE) %>% hc_yAxis(title = list(text = "Mean cite perc.")) %>% hc_title(text = "Top assignees in 'databases'") %>% hc_add_theme(hc_theme_flatdark()) %>% hc_tooltip(useHTML = TRUE, pointFormat = tooltip_table( x = c("DB patents", "Mean cite percentile", "Fraction DB patents"), y = c("{point.db_pats:.0f}","{point.mean_perc:.2f}", "{point.frac_db_pats:.3f}") )) %>% hc_legend(enabled = FALSE)
It looks like Microsoft has relatively high values across the three metrics (average citation percentile, number of database patents, and percent of total patents that are related to databases). IBM has more patents than Microsoft, but a slightly lower average citation percentile.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.