```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(rebib)
```
The magic behind rebib is a parser that reads the bibliographic data and splits it into fields based on matching regular expressions.
The first stage is a minor step in which the embedded bibliography is read from the LaTeX document. Commented-out lines are filtered out at this point to avoid reading unintended entries.
Lastly, the data is broken down using the LaTeX macro \\bibitem as a marker for the start of a new entry, and the resulting chunks are stored in a variable.
```r
dir.create(your_article_folder <- file.path(tempdir(), "exampledir"))
example_files <- system.file("article", package = "rebib")
x <- file.copy(from = example_files, to = your_article_folder, recursive = TRUE)
your_article_path <- paste(your_article_folder, "article", sep = "/")
```
```r
file_name <- rebib:::get_texfile_name(your_article_path)
bib_items <- rebib:::extract_embeded_bib_items(your_article_path, file_name)
bib_items[[1]]
bib_items[[2]]
```
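To make the splitting idea concrete outside of rebib's internals, here is a minimal sketch of filtering comments and splitting on \\bibitem markers. The bibliography lines below are made up for the example and the code is an illustration, not rebib's implementation:

```r
# Hypothetical embedded bibliography, used only for illustration.
tex_lines <- c(
  "\\begin{thebibliography}{9}",
  "% \\bibitem{commented-out}   <- should be ignored",
  "\\bibitem{smith2020}",
  "J. Smith.",
  "\\newblock An example title.",
  "\\newblock \\emph{Some Journal}, 12(3):45--67, 2020.",
  "\\end{thebibliography}"
)

# Drop commented lines so they are not read as unintended entries.
tex_lines <- tex_lines[!grepl("^[[:space:]]*%", tex_lines)]

# Every \bibitem marks the start of a new entry; split the lines on it.
starts <- grep("\\\\bibitem", tex_lines)
ends   <- c(starts[-1] - 1, length(tex_lines) - 1)  # stop before \end{thebibliography}
chunks <- mapply(function(s, e) tex_lines[s:e], starts, ends, SIMPLIFY = FALSE)
chunks[[1]]
```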
Now, with the chunks of bibliographic entries, each one is passed to a parser that breaks it down using regular expressions. The logic is to use the LaTeX macro \\newblock as a placeholder and identify the positions of the text elements relative to it.
The first value to be parsed is the unique_id, also called the citation reference, which is used to cite the entry inside the article. Usually this sits in the first or second line of the whole entry. The position of the unique_id determines where the author names begin.
```r
bib_items[[1]][1]
```
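As a rough illustration (not the exact expression rebib uses), a citation key could be pulled out of such a line with a regular expression; the sample \\bibitem line here is hypothetical:

```r
# Hypothetical \bibitem line; the pattern captures the key inside the braces.
line <- "\\bibitem[Smith, 2020]{smith2020}"
unique_id <- sub(".*\\\\bibitem(\\[[^]]*\\])?\\{([^}]*)\\}.*", "\\2", line)
unique_id
#> [1] "smith2020"
```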
After reading the unique_id, the parser will attempt to read the author name(s), which are usually up to two lines long in most articles.
```r
bib_items[[1]][2]
```
Next, the title is extracted based on the positions of the \\newblock markers or the end of the bib chunk.
```r
bib_items[[1]][3]
```
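To show the general idea in a simplified form (rebib's own rules are more involved), the \\newblock positions in a chunk can be used to slice out the author and title, assuming the author text sits between the \\bibitem line and the first \\newblock. The chunk below is made up for the example:

```r
# Hypothetical chunk, mirroring the structure shown above.
chunk <- c(
  "\\bibitem{smith2020}",
  "J. Smith and A. Jones.",
  "\\newblock An example title.",
  "\\newblock \\emph{Some Journal}, 12(3):45--67, 2020."
)

blocks <- grep("\\\\newblock", chunk)

# Author lines: between the \bibitem line and the first \newblock.
author <- paste(chunk[2:(blocks[1] - 1)], collapse = " ")

# Title: the first \newblock line, with the macro stripped off.
title <- sub("\\\\newblock[[:space:]]*", "", chunk[blocks[1]])

author
#> [1] "J. Smith and A. Jones."
title
#> [1] "An example title."
```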
This way the crucial elements of the bibliographic entry (unique_id, author names, and title) are parsed out. The remaining data is stored internally as journal and written out as publisher when the new BibTeX file is generated.
```r
bib_items[[1]][4:6]
```
A series of filters for the ISBN, URL, pages, and year fields is applied to pick out relevant values from the remaining data. If a field is not found, it is set to NULL and skipped while writing the BibTeX file. A lot of common LaTeX markup is also filtered out, and the data that remains is stored in a structured format to be written to a file.
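As a rough, hedged example of what such a filter can look like (not rebib's actual implementation), a year and a page range could be picked out of the leftover text like this; the leftover string is invented for the example:

```r
# Hypothetical leftover text after author and title have been removed.
rest <- "\\emph{Some Journal}, 12(3):45--67, 2020."

# Year: the first standalone four-digit number.
year <- regmatches(rest, regexpr("[0-9]{4}", rest))
if (length(year) == 0) year <- NULL

# Pages: two numbers separated by -- (or a single dash).
pages <- regmatches(rest, regexpr("[0-9]+--?[0-9]+", rest))
if (length(pages) == 0) pages <- NULL

year
#> [1] "2020"
pages
#> [1] "45--67"
```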
```r
bib_entry <- rebib:::bib_handler(bib_items)
bib_entry
```
After reading the bibliographic entries and extracting the meaningful values from them, we can finally write a structured file in the BibTeX format.
The writer will read the bib chunks one at a time, based on the metadata extracted, and write the corresponding data fields. The default entry type is book, which should not cause any problems for web articles.
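For intuition, here is a minimal sketch of how a single parsed entry could be formatted as a BibTeX book record. The helper function and the entry values are hypothetical and only mirror the fields discussed above; this is not rebib's writer, which is used in the next chunk:

```r
# Hypothetical helper: format one parsed entry as a BibTeX @book record.
format_bib_entry <- function(entry) {
  # Skip NULL fields and the citation key, which goes in the record header.
  fields <- entry[!vapply(entry, is.null, logical(1)) & names(entry) != "unique_id"]
  body <- paste0("  ", names(fields), " = {", unlist(fields), "},", collapse = "\n")
  paste0("@book{", entry$unique_id, ",\n", body, "\n}")
}

entry <- list(
  unique_id = "smith2020",
  author    = "J. Smith and A. Jones",
  title     = "An example title",
  publisher = "Some Journal, 12(3):45--67",
  year      = "2020"
)
cat(format_bib_entry(entry))
#> @book{smith2020,
#>   author = {J. Smith and A. Jones},
#>   title = {An example title},
#>   publisher = {Some Journal, 12(3):45--67},
#>   year = {2020},
#> }
```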
```r
file_path <- paste0(your_article_path, "/", file_name)
bib_path <- paste0(your_article_path, "/example.bib")
x <- file.remove(bib_path)
rebib:::bibtex_writer(bib_entry, file_path)
cat(readLines(paste(your_article_path, "example.bib", sep = "/")), sep = "\n")
```
Authors who are converting a document are expected to take a look at the generated file and check whether there are any errors or missing values.
```r
# Clean up the temporary example folder.
unlink(your_article_folder, recursive = TRUE)
```