The package should help in various situations. For instance, you might be interested in the impact of a particular study: not simply the quantitative impact ("times cited"), but what people are actually writing about that study. This could even be your own study, if you want to know for what reasons people cite you. To be continued...
Below are the steps you would normally follow when analyzing a study's citations. We provide functions that facilitate parts of these steps. These functions should be executed in the order below.

1. Put the `*.pdf` files in a folder, named `doc*.pdf`.
2. Use `extract_text()` to generate `doc*.txt` files from the pdf files.
3. Use `get_meta_data_doc_nv()` to identify the `doc*.txt` files (using the DOIs or titles in them) and then rename those txt files with the information identified. The idea is that you might have a bunch of raw text files without any additional information on them.
4. Use `delete_refs_n_heads()` and `clean_text()` to clean the text files, e.g. to delete running titles. This is necessary because citation cases may run across pages.
5. Use `identify_study_doc()` next; this step is not essential.
6. Use `extract_citation_cases()` to extract the citation cases. The resulting data is stored as a `.csv` file in a `docs` folder inside the directory where you store your data.
7. Use `clean_citation_cases()` if necessary.
8. Use `analyze_citations()` to analyze the `.csv` files that contain the citation cases.
9. Use `topic_analysis()` to do the topic analysis on those `.csv` files.

To be continued...
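The steps above might be sketched in R roughly as follows. This is a hypothetical sketch only: the function names come from the text above, but the assumption that each function takes the data folder as its first argument is ours; check each function's documentation before running anything.

```r
## Hypothetical end-to-end sketch of the workflow described above.
## ASSUMPTION: each function accepts the data folder as its first
## argument, mirroring get_metadata_doc_nv(folder, ...) shown later.

setwd("path/to/your/data")        # directory containing the doc*.pdf files
folder <- "docs"

extract_text(folder)              # 2. doc*.pdf -> doc*.txt
get_meta_data_doc_nv(folder)      # 3. identify and rename the txt files
delete_refs_n_heads(folder)       # 4a. drop reference lists and headings
clean_text(folder)                # 4b. delete running titles etc.
identify_study_doc(folder)        # 5. optional
extract_citation_cases(folder)    # 6. writes a .csv into the docs folder
clean_citation_cases(folder)      # 7. if necessary
analyze_citations(folder)         # 8. analyze the citation cases
topic_analysis(folder)            # 9. topic analysis on the .csv files
```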
By "quality of citations", we mean that citing a reference has more value than just being a numeric event; the context of a citation matters as well. We can have multiple perspectives on this context--e.g., by looking at the semantic context within the text, or a higher-level context, which is particular features of the document a reference was cited in. To get to grips with the latter, we provide the get_metadata_doc()
function, a tool that enables you to identify various features of parsed journal articles.
## get_metadata_doc()
Working with `get_metadata_doc()` is very simple. It takes a full-text journal article as input, which has to be transformed into `txt` format with the package's own `extract_text()` function or other tools (e.g., Jeroen Ooms' `pdftools` package).
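For the `pdftools` route, a minimal conversion might look like this (the file names are placeholders):

```r
library(pdftools)  # Jeroen Ooms' pdftools package

# pdf_text() returns a character vector with one element per page;
# writeLines() then stores it as a plain txt file.
pages <- pdf_text("doc1.pdf")
writeLines(pages, "doc1.txt")
```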
The function imports parts of the txt file. By default, the first 2,000 lines are imported, but you can change that using the `lines.import` parameter. This is mainly a question of computational speed; the information most important for extracting metadata tends to appear on the first few pages. After some cleaning work, the function looks for a DOI (digital object identifier) linked to the paper. If successful, it uses the identifier to query the Crossref API via the `rcrossref` package. If no DOI is found, it nevertheless runs a query, using the first 20 lines of the document as input for the query string. If successful, the API returns a lot of information on the article, which is stored in a data.frame object. Optionally, you can also download the article reference in BibTeX format using `bibtex = TRUE`.
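A call might then look like the sketch below. The `lines.import` and `bibtex` parameters are taken from the text above; that the txt file is passed as the first argument is an assumption on our part.

```r
## Sketch only: first-argument form is assumed, not documented here.
meta <- get_metadata_doc("doc1.txt",
                         lines.import = 500,  # import fewer lines for speed
                         bibtex = TRUE)       # also fetch a BibTeX reference
str(meta)  # a data.frame with article-level information
```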
Once the function has identified the corresponding journal, it also taps the SJR (SCImago Journal & Country Rank) journal rankings database, matches it with the journal, and appends various journal-level citation statistics.
Finally, you can decide which variables to store in the output data.frame. An overview of all available variables can be found in the documentation; see `?get_metadata_doc`.
setwd("C:\\Users\\Paul\\GDrive\\1-Research\\2017_Quality_of_citations\\data") ### Set folder ### folder <- "docs" setnumber <- 200 get_metadata_doc_nv(folder, number = setnumber)