View source: R/cleanPatentData.R
Determine the type of document from the patent publication data.
Data exports from publicly available sources often do not provide the type of patent document or, if they do, the values still require standardization. By using the kind code, the country code, and the pre-developed dictionaries for document length and country code, you can get a good approximation of the document types.
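To make the lookup idea concrete, here is a minimal sketch of a two-stage dictionary lookup in R. It is not the patentr implementation: the function `lookupDocType`, the two small dictionaries, their entries, and the order in which they are consulted are all hypothetical and chosen only for illustration.

```r
# Hypothetical dictionaries (illustrative entries, NOT the ones shipped with patentr).
# Names are lookup keys, values are document type labels.
toyKindDict   <- c(USA1 = "app", USB2 = "grant", EPA1 = "app", EPB1 = "grant")
toyLengthDict <- c(US11 = "app", US7 = "grant")

# Look up the country-and-kind-code key first, then fall back to the
# office-and-document-length key wherever the first lookup found nothing.
lookupDocType <- function(officeDocLength, countryAndKindCode,
                          kindDict = toyKindDict, lengthDict = toyLengthDict) {
  byKind   <- unname(kindDict[countryAndKindCode])   # NA for unknown keys
  byLength <- unname(lengthDict[officeDocLength])    # NA for unknown keys
  ifelse(is.na(byKind), byLength, byKind)
}

docTypes <- lookupDocType(
  officeDocLength    = c("US11", "US7", "XX5"),
  countryAndKindCode = c("USA1", "USZ9", "XXA")
)
```

In this sketch, a key that is missing from both dictionaries simply stays NA, which mirrors the "no match" behavior described for the return value.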
Note that you can use View(lens[lens$docType=="NA",]) to view the document types that were not found. Often, these come from small countries. You can add entries to cakcDict to fix them. They can also be ignored if you only want to focus on the larger countries, which are all covered.
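As a hedged sketch of that inspection step, the snippet below uses a small made-up data frame (`toy`, which is hypothetical and not the `lens` data set). It checks for both a real missing value and the literal string "NA", since which one appears can depend on how the column was built.

```r
# Hypothetical data standing in for a cleaned patent data frame.
toy <- data.frame(
  countryAndKindCode = c("USA1", "XXA", "EPB1", "YYU"),
  docType            = c("app", NA, "grant", "NA"),
  stringsAsFactors   = FALSE
)

# Flag rows where docType is a true NA or the literal string "NA".
notFound <- is.na(toy$docType) | toy$docType == "NA"

# Tally the unmatched keys; these are candidates for new dictionary entries.
missingKeys <- sort(table(toy$countryAndKindCode[notFound]), decreasing = TRUE)
```

How the missing keys are then added to cakcDict depends on that object's structure, which this sketch does not assume.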
generateDocType(officeDocLength, countryAndKindCode,
  cakcDict = patentr::cakcDict,
  docLengthTypesDict = patentr::docLengthTypesDict)
officeDocLength | The concat value of country code and number of numerical digits. Extracted using the
countryAndKindCode | The concat value of the country code and kind code. Extracted using the
cakcDict | A country and kind code dictionary. Default is patentr::cakcDict.
docLengthTypesDict | A document length and type dictionary. Default is patentr::docLengthTypesDict.
A character vector labeling the document type, with NA where no match was found.
acars <- acars
acars$pubNum <- extractPubNumber(acars$docNum) # pubnum, ex ####
acars$countryCode <- extractCountryCode(acars$docNum) # country code, ex USAPP, USD
acars$officeDocLength <- extractDocLength(countryCode = acars$countryCode,
                                          pubNum = acars$pubNum) # cc + pub num length concat
acars$kindCode <- extractKindCode(acars$docNum)
acars$countryAndKindCode <- with(acars, paste0(countryCode, kindCode))
acars$docType <- generateDocType(officeDocLength = acars$officeDocLength,
                                 countryAndKindCode = acars$countryAndKindCode,
                                 cakcDict = cakcDict,
                                 docLengthTypesDict = docLengthTypesDict)
table(acars$docType)
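One caveat when tallying results: base R's table() drops real NA values by default, so unmatched documents stored as missing values would not appear in the counts. The short sketch below uses a made-up vector (`docType` here is hypothetical, not the acars column) to show the difference when `useNA = "ifany"` is passed.

```r
# Hypothetical document type labels, including two unmatched (NA) entries.
docType <- c("app", "grant", NA, "app", NA)

# Default behavior: NA values are excluded from the tally.
countsDefault <- table(docType)

# With useNA = "ifany", NA gets its own cell whenever any NA is present.
countsWithNA <- table(docType, useNA = "ifany")
```

Comparing sum(countsDefault) with sum(countsWithNA) gives a quick count of how many documents went unmatched.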