In rOpenGov/bibliographica: Bibliographic Data Analysis

Publication places

r length(unique(str_trim(unlist(strsplit(as.character(df$publication_place), ";"))))) unique publication places; available for r sum(!is.na(df$publication_place)) documents (r round(100*mean(!is.na(df$publication_place)))%).
r length(readLines(paste0(this.folder, "/output.tables/publication_place_ambiguous.csv"))) - 1 ambiguous publication places; some of these can be possibly resolved by checking that the the synonyme list does not contain multiple versions of the final name (case sensitive).
r try(length(readLines(paste0(this.folder, "/output.tables/publication_place_todo.csv")))) unknown place names These terms do not map to any known place on the synonyme list; either because they require further cleaning or have not yet been encountered in the analyses. Terms that are clearly not place names can be added to stopwords; borderline cases that are not accepted as place names can be added as NA on the synonyme list.
r length(readLines(paste0(this.folder, "/output.tables/publication_place_discarded.csv"))) - 1 discarded place names These terms are potential place names but with a closer check have been explicitly rejected on the synonyme list
Conversions from the original to the accepted place names
Unit tests for place names are automatically checked during package build

Top-r ntop publication places are shown together with the number of documents.

p <- top_plot(df, "publication_place", ntop)
p <- p + ggtitle(paste("Top publication places"))
p <- p + scale_y_log10()
p <- p + ylab("Title count")
print(p)

p <- top_plot(df, "publication_country", ntop)
p <- p + ggtitle(paste("Top publication countries"))
p <- p + scale_y_log10()
p <- p + ylab("Title count")
print(p)

Publication countries

r length(unique(df$publication_country)) unique publication countries; available for r sum(!is.na(df$publication_country)) documents (r round(100*mean(!is.na(df$publication_country)))%).
r length(na.omit(unique(subset(df, is.na(publication_country))$publication_place))) places with unknown publication country (r round(100 * length(na.omit(unique(subset(df, is.na(publication_country))$publication_place))) / length(na.omit(unique(df$publication_place))), 1)% of the unique places; can be added to country mappings)
r length(readLines(paste0(this.folder, "/output.tables/publication_country_ambiguous.csv"))) - 1 potentially ambiguous region-country mappings (these may occur in the data in various synonymes and the country is not always clear when multiple countries have a similar place name; the default country is listed first). NOTE: possible improvements should not be done in this output summary but instead in the country mapping file.

tab <- top(df, "publication_country", output = "data.frame", round = 1)
colnames(tab) <- c("Country", "Documents (n)", "Fraction (%)")
library(knitr)
kable(head(tab))

Geocoordinates

r round(100*mean(!is.na(df$latitude) & !is.na(df$longitude)), 1)% of the documents were matched to geographic coordinates (based on Geonames).
r length(unique(df$publication_place[(is.na(df$latitude))])) unique places (r round(100 * length(unique(df$publication_place[(is.na(df$latitude))]))/length(unique(df$publication_place)), 1)% of all unique places and r round(100 * mean(is.na(df$latitude) | is.na(df$longitude)), 2)% of all documents) are missing geocoordinates. See list of places missing geocoordinate information.

Publication geography

r length(unique(df$publication_geography_country)) unique countries on geographical region considered in the publication; available for r sum(!is.na(df$publication_geography_country)) documents (r round(100*mean(!is.na(df$publication_geography_country)))%).
r length(unique(df$publication_geography_place)) unique places on geographical region considered in the publication; available for r sum(!is.na(df$publication_geography_place)) documents (r round(100*mean(!is.na(df$publication_geography_place)))%).