summarizeColumns: Summarize columns of a data frame


View source: R/explorePatentData.R

Description

Summarize columns of a data frame.

Summarize a data frame df by the columns listed in names, a character vector of header names.

Usage


Arguments

df

A data frame of patent data.

names

A character vector of the header names that you want to summarize.

naOmit

Logical. Optionally remove NA values at the end of the summary. Useful when comparing fields that contain NA values, such as features.

Value

A data frame of summarized values.
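To make the shape of such a summary concrete, here is a minimal base R sketch. It is not the patentr implementation: countByColumns and the toy data frame are hypothetical, and the sketch assumes that the summary is a row count per unique combination of the named columns, stored in a column called total (the name used in the plotting comment in the Examples). Note that the grouping columns come back as character vectors in this sketch.

```r
# Hypothetical stand-in for illustration only; not the patentr source.
# Assumption: the summary is a row count per unique combination of `names`.
countByColumns <- function(df, names, naOmit = FALSE) {
  # Cross-tabulate the chosen columns, keeping NA as its own group.
  counts <- as.data.frame(table(df[names], useNA = "ifany"),
                          responseName = "total",
                          stringsAsFactors = FALSE)
  # table() also lists combinations that never occur; drop them.
  counts <- counts[counts$total > 0, , drop = FALSE]
  # Optionally drop groups whose key is NA, after counting.
  if (naOmit) counts <- na.omit(counts)
  rownames(counts) <- NULL
  counts
}

# Toy data (made up for this sketch).
toy <- data.frame(
  assigneeClean = c("acme", "acme", "globex", NA),
  score = c(1, 1, 2, 3),
  stringsAsFactors = FALSE
)

byScore <- countByColumns(toy, "score")
byBoth <- countByColumns(toy, c("assigneeClean", "score"))
byBothNoNA <- countByColumns(toy, c("assigneeClean", "score"), naOmit = TRUE)
```

In this sketch, naOmit = TRUE drops the group whose assigneeClean is NA only after counting, which matches the "at the end of the summary" wording in the Arguments section.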

Examples

sumo <- cleanPatentData(patentData = patentr::acars,
                        columnsExpected = sumobrainColumns,
                        cleanNames = sumobrainNames,
                        dateFields = sumobrainDateFields,
                        dateOrders = sumobrainDateOrder,
                        deduplicate = TRUE,
                        cakcDict = patentr::cakcDict,
                        docLengthTypesDict = patentr::docLengthTypesDict,
                        keepType = "grant",
                        firstAssigneeOnly = TRUE,
                        assigneeSep = ";",
                        stopWords = patentr::assigneeStopWords)

# Note: in practice, a patent analyst needs to score these patents carefully;
# the random scores here are for demonstration purposes only.
score <- round(rnorm(nrow(sumo), mean = 1.4, sd = 0.9))
# clamp the scores to the range 0 to 3
score[score > 3] <- 3
score[score < 0] <- 0
sumo$score <- score
scoreSum <- summarizeColumns(sumo, "score")
scoreSum
# run library(ggplot2) first for the plot below to work
# ggplot(scoreSum, aes(x = score, y = total, fill = factor(score))) + geom_bar(stat = "identity")
nameAndScore <- summarizeColumns(sumo, c("assigneeClean","score"))
# tail(nameAndScore)

kamilien1/patentr documentation built on May 20, 2019, 7:19 a.m.