README.md

The rMoore package is a wrapper to the euPMC package and finds publications mentioning a genome project funded by the Gordon and Betty Moore Moore Foundation. The main function citations requires a Moore project table with three columns containing valid Europe PMC search queries. An example table from GBMF521 is included in the package or you can load one using read.xls in the gdata package.

data(mg)
mg <- read.xls("Moore_521.xlsx", stringsAsFactors=FALSE)

The project table should contain a bioproject title, ID, accessions, PubMed ID of the genome publication, synonyms and three search columns.

t(mg[5,])

Project      "Algoriphagus machipongonensis PR1"                                     
ID           "PRJNA18947"                                                            
WGS          "AAXU"                                                                  
GenBank      "CM001023"                                                              
RefSeq       "NZ_CM001023"                                                           
SRA          ""                                                                      
PubMed       "21183675"                                                              
Synonyms     "Algoriphagus sp. PR1"                                                  
Investigator "Nicole King"                                                           
Notes        ""                                                                      
cites        "cites:21183675_MED"                                                    
keywords     "((Algoriphagus machipongonensis PR1) OR (Algoriphagus sp. PR1)) genome"
accs         "AAXU0* OR CM001023 OR NZ_CM001023"                                     

The cites, keywords and accs columns are usually generated automatically using the Excel functions below. If needed, these can be replaced by specific queries to narrow or broaden searches.

=IF(G6="", "", CONCATENATE("cites:", G6, "_MED"))
=IF(H6="", CONCATENATE(A6, " genome"), CONCATENATE("((", A6, ") OR (", H6, ")) genome"))
=IF(C6="",IF(D6="","",CONCATENATE(D6," OR ",E6)),IF(D6="",CONCATENATE(C6,"0*"), CONCATENATE(C6,"0* OR ",D6," OR ",E6)))

The citations functions uses the Moore table as input and searches for the genome paper, citations, keywords and accession numbers. For each project, the results are combined into a single table with an additional column containing the search criteria used to find the paper, where *=genome paper, G=cites Genome paper, K=matches Keywords, and A=mentions Accession. Mutliple projects are combined into a list of data.frames.

x <- citations(mg[5:6,])
ROW 1. Algoriphagus machipongonensis PR1
Searching EXT_ID:21183675
1 Result
Searching cites:21183675_MED
7 Results
Searching ((Algoriphagus machipongonensis PR1) OR (Algoriphagus sp. PR1)) genome
24 Results
Searching AAXU0* OR CM001023 OR NZ_CM001023
2 Results
ROW 2. alpha proteobacterium BAL199
Searching alpha proteobacterium BAL199 genome
11 Results
Searching ABHC0*
2 Results
sapply(x, nrow)
Algoriphagus machipongonensis PR1      alpha proteobacterium BAL199 
                               27                                12 
table(x[[1]]$search)
*KA   G  GK   K  KA 
  1   3   4  18   1 

This code returns all 177 project citations.

mg2 <- citations(mg)

Five additional functions work with the list of data.frames from the citations output. combine_pubs merges all tables into a single data.frame while unique_pubs groups the publications by project and includes columns listing the number and names of any cited project. An option can be used to limit the results to matches in the search column.

x1 <- combine_pubs(mg2)
#5606 rows
x2 <- unique_pubs(mg2)
#2370 rows
x4 <- unique_pubs(mg2, "A")
#174 rows

summary_pubs counts the number of citations by genome, keyword and accession. Since papers may be found by multiple searches, the total citations may be less than the sum of the last three columns.

summary_pubs(mg2, mg)[1:7,]

                            Project Citations Genome Keywords Accessions
1      Acaryochloris sp. CCMEE 5410        15      9        7          1
2      Aciduliprofundum boonei T469        27      9       18          0
3               Ahrensia sp. R2A130         5      0        5          0
4             Alcanivorax sp. DG881         6      0        6          0
5 Algoriphagus machipongonensis PR1        27      7       24          2
6      alpha proteobacterium BAL199        12      0       11          2
7     alpha proteobacterium HIMB114        43     32       12          0

This function includes a format option to output a DataTable or Excel file with links to files generated by write_pubs and citation, keyword and accession searches at Europe PMC. The genome publication may optionally be included in the tables..

summary_pubs(mg2, mg, format="DT")
summary_pubs(mg2, mg, format="Excel")

write_pubs loops through the citations output and writes DataTables to a file for each project. Use setwd to change the working directory to a location where you would like to write 177 HTML files.

setwd("path/to/DT")
write_pubs(mg2)

Finally, plot_pubs creates a xts object and plots an interactive dygraph (best viewed in RStudio). In this example, only the genome citations from 87 projects are plotted.

plot_pubs(mg2, "G")


cstubben/rMoore documentation built on May 14, 2019, 12:26 p.m.