inst/examples/20151023-LIBER.md

---
title: "LIBER Analysis"
author: "Leo Lahti"
date: "05/03/2016"
output: markdown_document
---

LIBER Analyses

This document provides a reproducible summary of the ESTC history data set (roughly 50,000 documents) used in Lahti, Ilomaki, Tolonen (2015); Liber Quarterly 25(2), pp. 87–116. For details on the data and analysis, see the manuscript. The figures on this page are not identical to those in the original article because the analysis pipeline was improved after the article was published; we have checked that the qualitative results remain the same. The exact original figures and code are available upon request.

For further analysis of the overall ESTC data collection (roughly 500,000 documents), see the links in the README file.

Reproducing the analyses

The analyses on this page rely on access to the ESTC data. This data set was obtained from the British Library and is not public. Assuming you have access to the ESTC data, you can reproduce the analyses by first cloning this repository and parsing the raw data file (instructions). Full details for reproducing the figures are in the R Markdown source code of this document. After parsing the raw data file and installing the required R packages, run the following commands in R, with the inst/examples folder of this repository as the working directory, to generate all the figures:

library(knitr)
knit("20151023-LIBER.Rmd")
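Before knitting, it can help to confirm that the required packages are installed and that the working directory contains the source file. The following is a minimal sketch rather than part of the original analysis code: the helper name `missing_packages` is hypothetical, and the package list is only a partial selection taken from the session info at the end of this document.

```r
# Hypothetical helper (not part of the estc package): return the names of
# the packages in `pkgs` that cannot be loaded on this system.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# A partial selection of the attached packages listed in the session info.
needed <- c("knitr", "ggplot2", "dplyr", "tidyr", "estc", "bibliographica")
not_installed <- missing_packages(needed)

if (length(not_installed) > 0) {
  message("Install these packages first: ",
          paste(not_installed, collapse = ", "))
} else if (!file.exists("20151023-LIBER.Rmd")) {
  message("Set the working directory to inst/examples before knitting.")
} else {
  knitr::knit("20151023-LIBER.Rmd")
}
```

The check only reports what is missing; it does not install anything or verify that the raw ESTC data has been parsed.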

Who wrote history?

Authors who published the most history titles according to the ESTC

Specific authors are highlighted:

plot of chunk 20151023LIBER-1
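A ranking of this kind can be obtained by counting catalogue rows per author. The sketch below uses a toy data frame with placeholder author names and a hypothetical column name `author`; it is not the code behind the figure and does not assume the structure of the actual ESTC data.

```r
# Toy catalogue: one row per title (author names are placeholders).
titles <- data.frame(
  author = c("Author A", "Author B", "Author A",
             "Author C", "Author A", "Author B"),
  stringsAsFactors = FALSE
)

# Count titles per author and sort in decreasing order.
title_count <- sort(table(titles$author), decreasing = TRUE)

# Keep the two authors with the most titles.
top_authors <- head(title_count, 2)
print(top_authors)
```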

The life spans of the top authors based on the title count

The visualization also reveals ambiguities arising from authors who share a name but lived at different times (e.g. David Hume).

plot of chunk 20151023LIBER-2
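One way to flag such ambiguities automatically is to look for names that occur with more than one life span. This is a minimal sketch under stated assumptions: the data frame, its column names (`name`, `birth`, `death`) and all values are made-up placeholders, not ESTC records.

```r
# Toy author records; names and years are illustrative placeholders.
authors <- data.frame(
  name  = c("Author A", "Author A", "Author B", "Author A"),
  birth = c(1558, 1711, 1600, 1711),
  death = c(1629, 1776, 1669, 1776),
  stringsAsFactors = FALSE
)

# One row per distinct (name, birth, death) combination.
spans <- unique(authors)

# Names that occur with more than one life span are candidates for
# manual disambiguation.
span_count <- table(spans$name)
ambiguous <- names(span_count)[as.vector(span_count) > 1]
print(ambiguous)
```

A name flagged here is only a candidate: differing years may also reflect data entry errors for a single person.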

The title counts per year for selected authors

The yearly title counts for William Prynne, Daniel Defoe and David Hume (highlighted in Figures 1 and 2) give an overview of their publishing activity up to 1800.

plot of chunk 20151023LIBER-3

Title count versus paper consumption among the highlighted authors

The visualization reveals the nature of each author’s publications, distinguishing pamphleteering (many titles, few pages) from the authoring of books (fewer titles, more pages).

plot of chunk 20151023LIBER-4
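The contrast between pamphleteering and book authorship can be illustrated by comparing the number of titles with the total number of pages per author. In the sketch below the total page count stands in for paper consumption, which is a simplifying assumption; the data and the column names are made up for illustration.

```r
# Toy catalogue: one row per title with its page count (made-up values).
catalogue <- data.frame(
  author = c("Pamphleteer", "Pamphleteer", "Pamphleteer", "Pamphleteer",
             "Book author", "Book author"),
  pages  = c(8, 12, 16, 4, 400, 600),
  stringsAsFactors = FALSE
)

# Title count and total pages per author (groups are sorted by name).
titles <- tapply(catalogue$pages, catalogue$author, length)
pages  <- tapply(catalogue$pages, catalogue$author, sum)

summary_table <- data.frame(
  author          = names(titles),
  titles          = as.vector(titles),
  pages           = as.vector(pages),
  pages_per_title = as.vector(pages / titles)
)
print(summary_table)
```

In this toy example the pamphleteer has more titles but far fewer pages per title than the book author.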

Where was history published?

Publication volumes at the top publication locations in Britain and Ireland, 1470-1800

The UK map was generated by taking a screen capture of a video produced by running the analysis code:

source("20151023-LIBER-video.R")

The circle diameter corresponds to the logarithm (log10) of the title count. You can also download and view the full video.

UK1700
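The logarithmic scaling of the circles can be illustrated with a few made-up title counts. This is a sketch of the scaling only, with placeholder place names; it does not reproduce the plotting code in 20151023-LIBER-video.R.

```r
# Made-up title counts for three places.
title_count <- c(place_a = 10, place_b = 1000, place_c = 100000)

# Circle diameter on the log10 scale, as described in the text.
diameter <- log10(title_count)
print(diameter)
```

On this scale a hundredfold increase in the title count adds 2 to the diameter, so the largest publication places do not swamp the smaller ones on the map.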

The top publication places ranked by the title count

plot of chunk 20151023LIBER-topplace

Title count and overall paper consumption in the top publication locations

The present-day country of each publication location is indicated.

plot of chunk 20151023LIBER-8

Title count and paper consumption in Ireland, Scotland and the USA

plot of chunk 20150611paris-places4

Estimated gatherings per country:

plot of chunk 20151023LIBER-docsizecomp1

Estimated page counts per country:

plot of chunk 20151023LIBER-docsizecomp2

Gatherings vs. page counts per country:

plot of chunk 20151023LIBER-docsizecomp3

How does publishing change?

Publishing activity among all ESTC documents (circles) and History documents (triangles)

A comparison between the title count for history publications and for all documents in the ESTC catalogue, 1470-1800.

plot of chunk 20151023LIBER-14

plot of chunk 20151023LIBER-10a

plot of chunk 20151023LIBER-10b
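A comparison of this kind can be sketched by counting titles per time window for all documents and for history documents separately. The example below bins by decade, which is an assumption made for illustration; the data frame, its column names (`year`, `history`) and its values are made up.

```r
# Toy catalogue: publication year and a flag for history titles (made up).
docs <- data.frame(
  year    = c(1641, 1645, 1648, 1652, 1655, 1659, 1659),
  history = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Assign each document to its decade, e.g. 1645 -> 1640.
docs$decade <- floor(docs$year / 10) * 10

# Title counts per decade for all documents and for history titles only.
all_count     <- tapply(docs$history, docs$decade, length)
history_count <- tapply(docs$history, docs$decade, sum)

# Share of history titles among all titles in each decade.
history_share <- history_count / all_count
print(history_share)
```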

Title count in the octavo versus the folio format among the top authors

plot of chunk 20151023LIBER-3b
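The format comparison amounts to a cross-tabulation of authors against document formats. The sketch below uses a toy data frame with placeholder author names and a hypothetical column name `doc_format`; it does not assume how formats are encoded in the actual ESTC data.

```r
# Toy catalogue: one row per title with its format (made-up values).
docs <- data.frame(
  author     = c("Author A", "Author A", "Author A", "Author B", "Author B"),
  doc_format = c("octavo", "octavo", "folio", "folio", "folio"),
  stringsAsFactors = FALSE
)

# Cross-tabulate authors against formats: one row per author,
# one column per format.
format_count <- table(docs$author, docs$doc_format)
print(format_count)
```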

Edinburgh publishing

The publishing of historical works in Edinburgh on a timeline highlighting the eras of the English Civil War (1642-1651), the Restoration (1660), the Glorious Revolution (1688-1689), the Union Debates (1705-1706) and American Independence (1776).

plot of chunk 20151023LIBER-Edinburgh

Session info

This document was created with the following versions:

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.10
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] gdata_2.17.0          ggthemes_3.2.0        ggmap_2.6.1          
##  [4] rmarkdown_1.2.9000    reshape_0.8.6         microbiome_0.99.88   
##  [7] phyloseq_1.16.2       knitr_1.15.1          gridExtra_2.2.1      
## [10] reshape2_1.4.2        tidyr_0.6.0           ggplot2_2.2.0        
## [13] magrittr_1.5          estc_0.1.45           stringr_1.1.0        
## [16] bibliographica_0.2.31 sorvi_0.7.26          dplyr_0.5.0          
## [19] devtools_1.12.0      
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_1.3-0      rjson_0.2.15          babynames_0.2.1      
##   [4] dynamicTreeCut_1.63-1 rprojroot_1.1         htmlTable_1.7        
##   [7] XVector_0.12.1        AnnotationDbi_1.34.4  codetools_0.2-15     
##  [10] splines_3.3.1         doParallel_1.0.10     impute_1.46.0        
##  [13] robustbase_0.92-6     tgp_2.4-14            ade4_1.7-4           
##  [16] Formula_1.2-1         jsonlite_1.1          Cairo_1.5-9          
##  [19] WGCNA_1.51            cluster_2.0.5         GO.db_3.3.0          
##  [22] png_0.1-7             mapproj_1.2-4         backports_1.0.4      
##  [25] assertthat_0.1        Matrix_1.2-7.1        lazyeval_0.2.0       
##  [28] acepack_1.4.1         htmltools_0.3.5       tools_3.3.1          
##  [31] igraph_1.0.1          NLP_0.1-9             gtable_0.2.0         
##  [34] maps_3.1.1            Rcpp_0.12.8           slam_0.1-38          
##  [37] Biobase_2.32.0        Biostrings_2.40.2     multtest_2.28.0      
##  [40] ape_3.5               preprocessCore_1.34.0 nlme_3.1-128         
##  [43] iterators_1.0.8       tensorA_0.36          fastcluster_1.1.21   
##  [46] gender_0.5.1          proto_1.0.0           gtools_3.5.0         
##  [49] stringdist_0.9.4.2    DEoptimR_1.0-6        zlibbioc_1.18.0      
##  [52] MASS_7.3-45           scales_0.4.1          parallel_3.3.1       
##  [55] biomformat_1.0.2      genderdata_0.5.0      rhdf5_2.16.0         
##  [58] RColorBrewer_1.1-2    yaml_2.1.14           memoise_1.0.0        
##  [61] geosphere_1.5-5       rpart_4.1-10          latticeExtra_0.6-28  
##  [64] stringi_1.1.3         maptree_1.4-7         RSQLite_1.0.0        
##  [67] highr_0.6             S4Vectors_0.10.3      foreach_1.4.3        
##  [70] energy_1.7-0          permute_0.9-4         BiocGenerics_0.18.0  
##  [73] boot_1.3-18           chron_2.3-47          RgoogleMaps_1.4.1    
##  [76] compositions_1.40-1   moments_0.14          matrixStats_0.51.0   
##  [79] evaluate_0.10         lattice_0.20-34       labeling_0.3         
##  [82] plyr_1.8.4            R6_2.2.0              IRanges_2.6.1        
##  [85] Hmisc_4.0-0           DBI_0.5-1             foreign_0.8-67       
##  [88] withr_1.0.2           mgcv_1.8-16           survival_2.40-1      
##  [91] sp_1.2-3              nnet_7.3-12           tibble_1.2           
##  [94] bayesm_3.0-2          jpeg_0.1-8            grid_3.3.1           
##  [97] data.table_1.9.6      vegan_2.4-1           digest_0.6.10        
## [100] tm_0.6-2              stats4_3.3.1          munsell_0.4.3        
## [103] tcltk_3.3.1

