r length(unique(str_trim(unlist(strsplit(as.character(df$publication_place), ";")))))
unique publication places; available for r sum(!is.na(df$publication_place))
documents (r round(100*mean(!is.na(df$publication_place)))
%).r length(readLines(paste0(this.folder, "/output.tables/publication_place_ambiguous.csv"))) - 1
ambiguous publication places; some of these can be possibly resolved by checking that the the synonyme list does not contain multiple versions of the final name (case sensitive). r try(length(readLines(paste0(this.folder, "/output.tables/publication_place_todo.csv"))))
unknown place names These terms do not map to any known place on the synonyme list; either because they require further cleaning or have not yet been encountered in the analyses. Terms that are clearly not place names can be added to stopwords; borderline cases that are not accepted as place names can be added as NA on the synonyme list.r length(readLines(paste0(this.folder, "/output.tables/publication_place_discarded.csv"))) - 1
discarded place names These terms are potential place names but with a closer check have been explicitly rejected on the synonyme listTop-r ntop
publication places are shown together with the number of documents.
p <- top_plot(df, "publication_place", ntop) p <- p + ggtitle(paste("Top publication places")) p <- p + scale_y_log10() p <- p + ylab("Title count") print(p) p <- top_plot(df, "publication_country", ntop) p <- p + ggtitle(paste("Top publication countries")) p <- p + scale_y_log10() p <- p + ylab("Title count") print(p)
r length(unique(df$publication_country))
unique publication countries; available for r sum(!is.na(df$publication_country))
documents (r round(100*mean(!is.na(df$publication_country)))
%).r length(na.omit(unique(subset(df, is.na(publication_country))$publication_place)))
places with unknown publication country (r round(100 * length(na.omit(unique(subset(df, is.na(publication_country))$publication_place))) / length(na.omit(unique(df$publication_place))), 1)
% of the unique places; can be added to country mappings)r length(readLines(paste0(this.folder, "/output.tables/publication_country_ambiguous.csv"))) - 1
potentially ambiguous region-country mappings (these may occur in the data in various synonymes and the country is not always clear when multiple countries have a similar place name; the default country is listed first). NOTE: possible improvements should not be done in this output summary but instead in the country mapping file.tab <- top(df, "publication_country", output = "data.frame", round = 1) colnames(tab) <- c("Country", "Documents (n)", "Fraction (%)") library(knitr) kable(head(tab))
r round(100*mean(!is.na(df$latitude) & !is.na(df$longitude)), 1)
% of the documents were matched to geographic coordinates (based on Geonames).r length(unique(df$publication_place[(is.na(df$latitude))]))
unique places (r round(100 * length(unique(df$publication_place[(is.na(df$latitude))]))/length(unique(df$publication_place)), 1)
% of all unique places and r round(100 * mean(is.na(df$latitude) | is.na(df$longitude)), 2)
% of all documents) are missing geocoordinates. See list of places missing geocoordinate information.r length(unique(df$publication_geography_country))
unique countries on geographical region considered in the publication; available for r sum(!is.na(df$publication_geography_country))
documents (r round(100*mean(!is.na(df$publication_geography_country)))
%).r length(unique(df$publication_geography_place))
unique places on geographical region considered in the publication; available for r sum(!is.na(df$publication_geography_place))
documents (r round(100*mean(!is.na(df$publication_geography_place)))
%).Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.