derwent_format: Derwent Innovation Text file name matcher

View source: R/derwent_format.R

Description

Text files downloaded from Derwent Innovation have long-form file names that combine the publication number, the date, and the txt extension (e.g. EP1224299A220020724.txt). There is no consistent separator between the publication number and the date, and the file names are not a consistent length. This prevents joining with table-format downloads. This function reformats the file names to recover the publication number so that the two can be joined.

Usage

derwent_format(x, col)

Arguments

x

A data frame

col

A column containing long form publication numbers

Details

Derwent Innovation downloads of patent texts in txt format use patent numbers in long form. These numbers are not a uniform length. However, the YYYYMMDD.txt chunk at the end of the string is a uniform 12 characters (8 for the date and 4 for .txt). The function calculates the length of each string and drops the last 12 characters to arrive at the publication number for matching with table data. For example, EP1224299A220020724.txt becomes EP1224299A2.
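
The length arithmetic described above can be sketched in base R as follows. This is a minimal illustration, not the package's actual source: the helper name strip_date_suffix and the output column publication_number are assumptions made for the example, and the second file name is made up for illustration.

```r
# Hypothetical helper (not the package source): drop the trailing
# "YYYYMMDD.txt", which is 8 date digits + 4 characters for ".txt" = 12.
strip_date_suffix <- function(x, col) {
  long_form <- as.character(x[[col]])
  # Keep everything up to, but not including, the final 12 characters.
  x$publication_number <- substr(long_form, 1, nchar(long_form) - 12)
  x
}

# Illustrative input: file names of different lengths.
files <- data.frame(
  doc_id = c("EP1224299A220020724.txt", "US6000000A19991214.txt"),
  stringsAsFactors = FALSE
)

out <- strip_date_suffix(files, "doc_id")
out$publication_number
# "EP1224299A2" "US6000000A"
```

Because the suffix is counted from the end of the string, the approach works whatever the length of the publication number, provided every value really does end in an 8-digit date followed by .txt.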

Value

data.frame

Examples

## Not run:
df <- derwent_format(derwent_text_format, "doc_id")
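
Once the publication number has been recovered, the text file names can be matched to a table-format download. The sketch below is an assumption-laden illustration in base R: the data, the column names, and the titles are invented, and it uses a regular expression in place of the character-count approach the function is documented to use.

```r
# Illustrative data only; all column names and values are assumptions.
texts <- data.frame(
  doc_id = c("EP1224299A220020724.txt", "US6000000A19991214.txt"),
  stringsAsFactors = FALSE
)
table_data <- data.frame(
  publication_number = c("EP1224299A2", "US6000000A"),
  title = c("Title one", "Title two"),
  stringsAsFactors = FALSE
)

# Regex alternative to counting characters: remove eight digits followed
# by ".txt", anchored at the end of the string.
texts$publication_number <- sub("[0-9]{8}\\.txt$", "", texts$doc_id)

# Join the text file names to the table data on the shared key.
joined <- merge(texts, table_data, by = "publication_number")
```

The regular expression leaves any value that does not end in an 8-digit date plus .txt unchanged, whereas cutting a fixed 12 characters would silently truncate it.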

poldham/oldhammisc documentation built on May 25, 2019, 11:23 a.m.