---
title: "LIBER Analysis"
author: "Leo Lahti"
date: "05/03/2016"
output: markdown_document
---
This document provides a reproducible summary of the ESTC history data set (roughly 50,000 documents) used in Lahti, Ilomaki, Tolonen (2015); Liber Quarterly 25(2), pp. 87–116. For details on the data and analysis, see the manuscript. The figures on this page are not identical to those in the original article because the analysis pipeline was improved after publication; we have checked that the qualitative results remain the same. The exact original figures and code are available upon request.
For further analysis of the overall ESTC data collection (roughly 500,000 documents), see the links in the README file.
The analyses on this page rely on access to the ESTC data. This data set was obtained from the British Library and is not public. If you have access to the ESTC data, you can reproduce the analyses by first cloning this repository and parsing the raw data file (see the instructions). Full details for reproducing the figures are in the R Markdown source code of this document. After parsing the raw data file and installing the required R packages, run the following commands in R, using the inst/examples folder of this repository as the working directory, to generate all the figures:
```r
library(knitr)
knit("20151023-LIBER.Rmd")
```
Specific authors are highlighted:
The visualization also reveals ambiguities that arise when different authors share the same name but lived at different times (e.g. David Hume).
Figures 1 and 2 highlight William Prynne, Daniel Defoe and David Hume and give an overview of their publishing activity up until 1800.
The visualization reveals the nature of each author's publications, distinguishing pamphleteering (many titles, few pages) from book authorship (fewer titles, more pages).
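To make the pamphlet/book distinction concrete, the R sketch below computes, per author, the number of titles, the total page count and the mean pages per title. It uses a tiny made-up data frame with hypothetical columns `author` and `pages`; the column names in the parsed ESTC data may differ, and this is not necessarily how the figures were produced.

```r
# Toy records for two hypothetical authors (not ESTC data).
docs <- data.frame(
  author = c("A", "A", "A", "A", "B", "B"),
  pages  = c(8, 12, 16, 4, 400, 320),
  stringsAsFactors = FALSE
)

# Per-author title counts and total pages (tapply sorts authors alphabetically).
titles <- tapply(docs$pages, docs$author, length)
pages  <- tapply(docs$pages, docs$author, sum)

author_summary <- data.frame(
  author          = names(titles),
  titles          = as.vector(titles),
  pages           = as.vector(pages),
  pages_per_title = as.vector(pages) / as.vector(titles)
)
print(author_summary)
```

Here author A (four short items, 10 pages per title) fits the pamphleteering pattern, while author B (two long items, 360 pages per title) fits the book-authoring pattern.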
The UK map was generated by taking a screen capture of a video produced by running the analysis code:
```r
source("20151023-LIBER-video.R")
```
The circle diameter corresponds to the logarithm (log10) of the title count. You can also download and view the full video.
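As a minimal sketch of the size mapping described above, the R lines below take log10 of a few hypothetical title counts. On this scale each tenfold increase in titles adds one unit of diameter, so large publishing centres do not visually swamp smaller ones. A place with a single title maps to log10(1) = 0; whether the video code adds an offset for that case is not stated here.

```r
# Hypothetical title counts for three unnamed places (not ESTC values).
title_count <- c(place_a = 1000, place_b = 100, place_c = 10)

# Circle diameter on a log10 scale: a tenfold count difference = 1 unit.
diameter <- log10(title_count)
print(diameter)
```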
The current country of origin is indicated.
Estimated gatherings per country:
Estimated page counts per country:
Gatherings vs. page counts per country:
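The per-country summaries above can be sketched in base R as a cross-tabulation of document counts and a sum of page counts by country and gathering (book format such as 8vo or 4to). The data frame and its columns `country`, `gatherings` and `pagecount` are hypothetical toy values, and the estimation of missing gatherings or page counts used for the actual figures is not reproduced here.

```r
# Toy records; country labels X and Y are placeholders (not ESTC data).
docs <- data.frame(
  country    = c("X", "X", "X", "Y", "Y"),
  gatherings = c("8vo", "4to", "8vo", "12mo", "8vo"),
  pagecount  = c(200, 40, 180, 150, 220),
  stringsAsFactors = FALSE
)

# Number of documents per country (rows) and gathering (columns).
doc_counts <- table(docs$country, docs$gatherings)
print(doc_counts)

# Total pages per observed country/gathering combination.
page_totals <- aggregate(pagecount ~ country + gatherings, data = docs, FUN = sum)
print(page_totals)
```

`table()` also reports combinations with zero documents, whereas `aggregate()` only returns combinations that occur in the data.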
A comparison of title counts for history publications and for all documents in the ESTC catalogue, 1470-1800.
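One simple way to put the two title counts side by side is the share of history titles among all titles per decade. The R sketch below does this with made-up publication years and a hypothetical logical flag `is_history`; the original figure may instead plot both counts directly.

```r
# Toy publication years and a hypothetical "history" flag (not ESTC data).
year       <- c(1701, 1705, 1712, 1718, 1719, 1725)
is_history <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Bin years into decades, e.g. 1718 -> 1710.
decade <- floor(year / 10) * 10

# Titles per decade for all documents, and for history only
# (using the same decade levels so that empty decades count as zero).
all_per_decade     <- table(decade)
history_per_decade <- table(factor(decade[is_history], levels = names(all_per_decade)))

share <- as.vector(history_per_decade) / as.vector(all_per_decade)
names(share) <- names(all_per_decade)
print(share)
```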
Publishing of historical works in Edinburgh, shown on a timeline that highlights the English Civil War (1642-1651), the Restoration (1660), the Glorious Revolution (1688-1689), the Union Debates (1705-1706) and American Independence (1776).
This document was created with the following R and package versions:
```r
sessionInfo()
```

```
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.10
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gdata_2.17.0 ggthemes_3.2.0 ggmap_2.6.1
## [4] rmarkdown_1.2.9000 reshape_0.8.6 microbiome_0.99.88
## [7] phyloseq_1.16.2 knitr_1.15.1 gridExtra_2.2.1
## [10] reshape2_1.4.2 tidyr_0.6.0 ggplot2_2.2.0
## [13] magrittr_1.5 estc_0.1.45 stringr_1.1.0
## [16] bibliographica_0.2.31 sorvi_0.7.26 dplyr_0.5.0
## [19] devtools_1.12.0
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.3-0 rjson_0.2.15 babynames_0.2.1
## [4] dynamicTreeCut_1.63-1 rprojroot_1.1 htmlTable_1.7
## [7] XVector_0.12.1 AnnotationDbi_1.34.4 codetools_0.2-15
## [10] splines_3.3.1 doParallel_1.0.10 impute_1.46.0
## [13] robustbase_0.92-6 tgp_2.4-14 ade4_1.7-4
## [16] Formula_1.2-1 jsonlite_1.1 Cairo_1.5-9
## [19] WGCNA_1.51 cluster_2.0.5 GO.db_3.3.0
## [22] png_0.1-7 mapproj_1.2-4 backports_1.0.4
## [25] assertthat_0.1 Matrix_1.2-7.1 lazyeval_0.2.0
## [28] acepack_1.4.1 htmltools_0.3.5 tools_3.3.1
## [31] igraph_1.0.1 NLP_0.1-9 gtable_0.2.0
## [34] maps_3.1.1 Rcpp_0.12.8 slam_0.1-38
## [37] Biobase_2.32.0 Biostrings_2.40.2 multtest_2.28.0
## [40] ape_3.5 preprocessCore_1.34.0 nlme_3.1-128
## [43] iterators_1.0.8 tensorA_0.36 fastcluster_1.1.21
## [46] gender_0.5.1 proto_1.0.0 gtools_3.5.0
## [49] stringdist_0.9.4.2 DEoptimR_1.0-6 zlibbioc_1.18.0
## [52] MASS_7.3-45 scales_0.4.1 parallel_3.3.1
## [55] biomformat_1.0.2 genderdata_0.5.0 rhdf5_2.16.0
## [58] RColorBrewer_1.1-2 yaml_2.1.14 memoise_1.0.0
## [61] geosphere_1.5-5 rpart_4.1-10 latticeExtra_0.6-28
## [64] stringi_1.1.3 maptree_1.4-7 RSQLite_1.0.0
## [67] highr_0.6 S4Vectors_0.10.3 foreach_1.4.3
## [70] energy_1.7-0 permute_0.9-4 BiocGenerics_0.18.0
## [73] boot_1.3-18 chron_2.3-47 RgoogleMaps_1.4.1
## [76] compositions_1.40-1 moments_0.14 matrixStats_0.51.0
## [79] evaluate_0.10 lattice_0.20-34 labeling_0.3
## [82] plyr_1.8.4 R6_2.2.0 IRanges_2.6.1
## [85] Hmisc_4.0-0 DBI_0.5-1 foreign_0.8-67
## [88] withr_1.0.2 mgcv_1.8-16 survival_2.40-1
## [91] sp_1.2-3 nnet_7.3-12 tibble_1.2
## [94] bayesm_3.0-2 jpeg_0.1-8 grid_3.3.1
## [97] data.table_1.9.6 vegan_2.4-1 digest_0.6.10
## [100] tm_0.6-2 stats4_3.3.1 munsell_0.4.3
## [103] tcltk_3.3.1
```