Generate a clean data set from the imported raw data set. The data available dictates the number of columns of attributes that can be generated.
Sumobrain, Lens.org, and Google Patents have varying levels of data available.
If you import your own data, be sure to adhere to the template format, or read the documentation carefully to create your own format.
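As a minimal sketch of what adhering to a template can mean in practice, the base R snippet below checks that a custom import has the expected width before renaming its columns. The data frame `myRaw` and the vector `myNames` are hypothetical and exist only for illustration; this is not how patentr validates input internally.

```r
# Hypothetical raw import with three columns (names are illustrative only)
myRaw <- data.frame(
  doc.number = c("US1234567B2", "US20150000001A1"),
  title      = c("Widget", "Gadget"),
  pub.date   = c("2015-01-06", "2015-01-01"),
  stringsAsFactors = FALSE
)

# Clean names you would pass as cleanNames; one name per column
myNames   <- c("docNum", "title", "pubDate")
myColumns <- length(myNames)

# Stop early if the import does not have the expected width
widthOk <- ncol(myRaw) == myColumns
if (!widthOk) {
  stop("Expected ", myColumns, " columns but found ", ncol(myRaw))
}

# Rename only after the width check passes
names(myRaw) <- myNames
```

Running a check like this before calling cleanPatentData makes a mismatch between columnsExpected, cleanNames, and the actual data visible immediately.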
cleanPatentData(patentData = NULL, columnsExpected, cleanNames,
                dateFields = NA, dateOrders, deduplicate = TRUE,
                cakcDict = patentr::cakcDict,
                docLengthTypesDict = patentr::docLengthTypesDict, keepType = "grant",
                firstAssigneeOnly = TRUE, assigneeSep = ";",
                stopWords = patentr::assigneeStopWords)
patentData | The data frame of initial raw patent data.
columnsExpected | The expected width of the data frame, numeric.
cleanNames | A character vector of length columnsExpected used to rename the data frame columns.
dateFields | A character vector of the date column names to convert to 'Date' format.
dateOrders | A character string giving the order required to convert string data into 'Date' data. Sumobrain is "ymd"; Lens and Google data are "mdy". The examples use the hardcoded values sumobrainDateOrder, googleDateOrder, and lensDateOrder.
deduplicate | A logical, default TRUE, indicating whether to deduplicate patent documents that have both an application and a grant.
cakcDict | A country and kind code dictionary. Default is patentr::cakcDict.
docLengthTypesDict | A document length and type dictionary. Default is patentr::docLengthTypesDict.
keepType | A character variable denoting which document type to keep. Default is "grant". If NA, ignore.
firstAssigneeOnly | A logical. For cleaning names, use the first assignee only. Default is TRUE.
assigneeSep | The separator string used when there is more than one assignee. Default is ";" (semicolon).
stopWords | The stopword list to remove from assignee names. Default is patentr::assigneeStopWords.
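The dateOrders and assigneeSep arguments can be illustrated with a short base R sketch. It assumes dash-separated "ymd" strings and slash-separated "mdy" strings, which may not match the exact layout of a given export, and the helper `firstAssignee` is hypothetical; it imitates the idea behind firstAssigneeOnly and is not the package's implementation.

```r
# "ymd"-ordered strings (the order documented for Sumobrain)
ymdDates <- as.Date(c("2015-01-06", "2016-12-31"), format = "%Y-%m-%d")

# "mdy"-ordered strings (the order documented for Lens and Google data)
mdyDates <- as.Date(c("01/06/2015", "12/31/2016"), format = "%m/%d/%Y")

# Hypothetical helper: keep only the first assignee when several are
# joined by a literal separator string
firstAssignee <- function(x, sep = ";") {
  parts <- strsplit(x, split = sep, fixed = TRUE)
  vapply(
    parts,
    function(p) if (length(p) == 0) NA_character_ else trimws(p[1]),
    character(1)
  )
}

assignees <- c("Acme Corp; Beta LLC", "Gamma Inc")
firstOnly <- firstAssignee(assignees, sep = ";")
```

Both date vectors parse to the same dates, which is why the order string has to match the source. The separator is matched literally, so a two-character separator such as ";;" works the same way.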
A data frame of tidy patent data.
For data formats: acars for Sumobrain, acarsGoogle for Google Patents data, and acarsLens for Lens.org data.
library(patentr)

# Clean the Sumobrain sample data bundled with the package
sumo <- cleanPatentData(patentData = patentr::acars, columnsExpected = sumobrainColumns,
                        cleanNames = sumobrainNames,
                        dateFields = sumobrainDateFields,
                        dateOrders = sumobrainDateOrder,
                        deduplicate = TRUE,
                        cakcDict = patentr::cakcDict,
                        docLengthTypesDict = patentr::docLengthTypesDict,
                        keepType = "grant",
                        firstAssigneeOnly = TRUE,
                        assigneeSep = ";",
                        stopWords = patentr::assigneeStopWords)
# Read a Google Patents CSV export, skipping the first skipGoogle lines,
# then convert every column to ASCII text before cleaning
rawGoogleData <- system.file("extdata", "google_autonomous_search.csv",
                             package = "patentr")
rawGoogleData <- read.csv(rawGoogleData,
                          skip = skipGoogle, stringsAsFactors = FALSE)
rawGoogleData <- data.frame(lapply(rawGoogleData,
                                   function(x){iconv(x, to = "ASCII")}),
                            stringsAsFactors = FALSE)
# Google data separates multiple assignees with a comma
google <- cleanPatentData(patentData = rawGoogleData, columnsExpected = googleColumns,
                          cleanNames = googleNames,
                          dateFields = googleDateFields,
                          dateOrders = googleDateOrder,
                          deduplicate = TRUE,
                          cakcDict = patentr::cakcDict,
                          docLengthTypesDict = patentr::docLengthTypesDict,
                          keepType = "grant",
                          firstAssigneeOnly = TRUE,
                          assigneeSep = ",",
                          stopWords = patentr::assigneeStopWords)
# Read a Lens.org CSV export the same way; Lens separates assignees with ";;"
lensRawData <- system.file("extdata", "lens_autonomous_search.csv",
                           package = "patentr")
lensRawData <- read.csv(lensRawData, stringsAsFactors = FALSE, skip = skipLens)
lensRawData <- data.frame(lapply(lensRawData,
                                 function(x){iconv(x, to = "ASCII")}),
                          stringsAsFactors = FALSE)
lens <- cleanPatentData(patentData = lensRawData, columnsExpected = lensColumns,
                        cleanNames = lensNames,
                        dateFields = lensDateFields,
                        dateOrders = lensDateOrder,
                        deduplicate = TRUE,
                        cakcDict = patentr::cakcDict,
                        docLengthTypesDict = patentr::docLengthTypesDict,
                        keepType = "grant",
                        firstAssigneeOnly = TRUE,
                        assigneeSep = ";;",
                        stopWords = patentr::assigneeStopWords)
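The examples convert every column with iconv(x, to = "ASCII") before cleaning. The sketch below shows how that base R call treats non-ASCII text. The sample strings are made up, and `from = "UTF-8"` is set explicitly here as an assumption so the result does not depend on the session's native encoding, whereas the examples rely on the default.

```r
# Made-up assignee names; the first contains a non-ASCII character
x <- c("Caf\u00e9 Robotics", "Plain Motors")

# With the default sub = NA, a string that cannot be converted becomes NA
strict <- iconv(x, from = "UTF-8", to = "ASCII")

# With sub = "", the unconvertible bytes are dropped instead
dropped <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

# Which entries were lost by the strict conversion
lostRows <- which(is.na(strict))
```

If assignee names or titles in your own export contain accented characters, checking for NA values after the conversion shows which rows were affected.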