View source: R/cleanPatentData.R
Remove duplicate values in the patent data. Typically you will want to check whether you have repeated document numbers. A document number should be unique in your data set, so duplicate document numbers should be avoided. You can optionally specify which document type to keep.
Often, your data sets contain duplicate patent entries. This function is a wrapper around the duplicated function, applied to a data frame or vector.
For example, if you have the vector [US123, US123, US456], you will get the value TRUE FALSE TRUE. The FALSE marks the repeated US123, which is removed when you subset your data with this result.
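The vector example can be reproduced with base R alone. This is a minimal sketch that calls only `duplicated`, not `removeDups` itself; the variable names `docNums`, `isRepeat`, and `toKeep` are made up for illustration.

```r
# Base R only; no patentr functions are called here.
docNums <- c("US123", "US123", "US456")

# duplicated() flags the second and later occurrences of each value.
isRepeat <- duplicated(docNums)

# Negating it gives a keep-vector: TRUE for the first occurrence of each value.
toKeep <- !isRepeat

print(toKeep)           # TRUE FALSE TRUE
print(docNums[toKeep])  # "US123" "US456"
```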
You can go deeper with the optional arguments. For many analyses, we want to exclude the second document, typically the application. This function lets you choose which document type to keep; the others are discarded.
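The idea of keeping one document type among duplicates can be sketched in base R. The helper `keepPreferredType` below is hypothetical and is not the patentr implementation; it assumes no missing identifiers, and it assumes that when no duplicate has the preferred type, the first occurrence is kept. The actual behavior of `removeDups` may differ.

```r
# Hypothetical helper, written for illustration only (not part of patentr).
keepPreferredType <- function(ids, docType, keepType = "grant") {
  # Start with every document marked as kept.
  keep <- rep(TRUE, length(ids))
  # Visit each identifier that occurs more than once.
  for (id in unique(ids[duplicated(ids)])) {
    idx <- which(ids == id)
    preferred <- idx[which(docType[idx] == keepType)]
    # Keep the first document of the preferred type if one exists;
    # otherwise fall back to the first occurrence (an assumption).
    winner <- if (length(preferred) > 0) preferred[1] else idx[1]
    keep[idx] <- FALSE
    keep[winner] <- TRUE
  }
  keep
}

# Made-up sample data.
appNum  <- c("A1", "A1", "B2", "C3", "C3")
docType <- c("app", "grant", "grant", "app", "app")

keepPreferredType(appNum, docType)  # FALSE TRUE TRUE TRUE FALSE
```

Here the grant wins for A1, B2 is untouched because it has no duplicate, and C3 falls back to its first occurrence because neither copy is a grant.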
removeDups(input, hasDup = NA, docType = NA, keepType = "grant")
input | A vector or a data frame from which you wish to remove duplicate values. With a data frame, you can be more selective. For example, you may want to remove a patent document only if it has the same docNum and country code.
hasDup | A logical vector noting if a duplicate exists. If NA, ignore.
docType | A character vector of the type of patent document (app, grant, etc.). If NA, ignore.
keepType | A character variable denoting which document type to keep. Default is "grant". If NA, ignore.
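The data frame case described for `input` can be illustrated with base R's `duplicated`, which this function is said to wrap. This is a sketch with made-up data; the column names `docNum` and `countryCode` are assumptions, and `removeDups` is not called here.

```r
# Made-up sample data: the same number appears twice for US and once for EP.
patents <- data.frame(
  docNum      = c("123", "123", "123", "456"),
  countryCode = c("US", "US", "EP", "US"),
  stringsAsFactors = FALSE
)

# A row counts as a duplicate only if BOTH columns repeat an earlier row.
toKeep <- !duplicated(patents[, c("docNum", "countryCode")])

print(toKeep)  # TRUE FALSE TRUE TRUE
deduped <- patents[toKeep, ]
```

Only the second US 123 row is dropped; the EP 123 row survives because its country code differs.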
A logical vector used to remove duplicate documents that do not match the chosen type. TRUE marks a document to keep.
# simple removal: see how many rows were removed
dim(acars) - dim(acars[removeDups(acars$appNum),])
# specific removal: keep the grant docs
hasDup <- showDups(acars$appNum)
pubNum <- extractPubNumber(acars$docNum)
countryCode <- extractCountryCode(acars$docNum)
officeDocLength <- extractDocLength(countryCode = countryCode, pubNum = pubNum)
kindCode <- extractKindCode(acars$docNum)
countryAndKindCode <- paste0(countryCode, kindCode)
docType <- generateDocType(officeDocLength = officeDocLength,
                           countryAndKindCode = countryAndKindCode,
                           cakcDict = patentr::cakcDict,
                           docLengthTypesDict = patentr::docLengthTypesDict)
keepType <- "grant"
toKeep <- removeDups(acars$appNum, hasDup = hasDup, docType = docType, keepType = keepType)
table(toKeep)
acarsDedup <- acars[toKeep, ]