View source: R/apply_citation_matching.R
| applyCitationMatching | R Documentation |
A convenience wrapper that applies normalize_citations
to a bibliometrix data frame (typically loaded with convert2df). It
extracts citations from the CR field, normalizes and matches them, and
returns per-paper citation lists together with summary statistics.
applyCitationMatching(M, threshold = 0.9, method = "jw", min_chars = 20)
M |
A bibliometrix data frame, typically created by convert2df
|
threshold |
Numeric value between 0 and 1 indicating the similarity threshold
for matching citations. Default is 0.9, as shown in the usage above. See |
method |
String distance method to use for fuzzy matching. Options include:
|
min_chars |
Minimum characters for valid citations (default: 20) |
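To make the role of threshold concrete, the sketch below scores formatting variants of a reference and compares each score to a cutoff. It is only an illustration: it uses base R's utils::adist (Levenshtein edit distance), rescaled to a 0-1 similarity, as a stand-in rather than the "jw" method named in the usage. The helper edit_similarity and the citation strings are invented for the example and are not part of bibliometrix.

```r
# Hypothetical helper (not part of bibliometrix):
# Levenshtein-based similarity in [0, 1], used here as a stand-in for "jw".
edit_similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]      # edit distance between the two strings
  1 - d / max(nchar(a), nchar(b))    # 1 = identical strings
}

# Made-up citation strings, for illustration only
ref_a <- "ARIA M, 2017, J INFORMETR, V11, P959"
ref_b <- "ARIA M., 2017, J INFORMETR, V11, P959"
ref_c <- "GARFIELD E, 1972, SCIENCE, V178, P471"

threshold <- 0.9

sim_ab <- edit_similarity(ref_a, ref_b)
sim_ac <- edit_similarity(ref_a, ref_c)

sim_ab >= threshold   # near-identical variants: treated as a match
sim_ac >= threshold   # different references: not a match
```

Raising the threshold makes matching stricter (fewer variants merged); lowering it merges more aggressively and risks joining distinct references.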
The function automatically handles the new Scopus citation format (where the year appears at the end in parentheses) by converting it to the classic format before processing.
The function performs the following steps:
1. Splits the CR field by semicolons to extract individual citations
2. Detects and converts new Scopus format citations to classic format
3. Trims whitespace from each citation
4. Applies normalize_citations to identify duplicate citations
5. Links normalized citations back to source documents (SR)
6. Generates summary statistics and reconstructs normalized CR fields
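The splitting, trimming, and linking steps above can be sketched in base R on a toy data frame. This is a minimal illustration, not the package's implementation: the data frame toy_M and its contents are made up, and the Scopus-format conversion and the normalization itself are left out.

```r
# Toy stand-in for a bibliometrix data frame (values invented for illustration)
toy_M <- data.frame(
  SR = c("DOC A, 2020", "DOC B, 2021"),
  CR = c("ARIA M, 2017, J INFORMETR; GARFIELD E, 1972, SCIENCE",
         " ARIA M, 2017, J INFORMETR ;PRICE D, 1965, SCIENCE"),
  stringsAsFactors = FALSE
)

# Split each CR field on semicolons (one list element per document)
cr_list <- strsplit(toy_M$CR, ";", fixed = TRUE)

# Trim whitespace and keep the link to the citing document (SR)
long <- data.frame(
  SR = rep(toy_M$SR, lengths(cr_list)),
  CR = trimws(unlist(cr_list)),
  stringsAsFactors = FALSE
)

# Drop empty strings that a trailing separator would leave behind
long <- long[nzchar(long$CR), ]

long
```

The result has one row per (document, citation) pair, which is the shape a matching step needs in order to map normalized citations back to their source documents.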
The normalized CR field can be used to replace the original CR field in subsequent bibliometric analyses, ensuring that citation counts and network analyses are not inflated by duplicate citations with minor formatting differences.
A list with four elements:
full_data: A data frame with columns:
SR: Source document identifier
CR: Original citation string
CR_canonical: Canonical (normalized) citation
cluster_id: Unique cluster identifier
n_cluster: Size of the citation cluster
first_author, year, journal, volume: Extracted metadata
summary: A data frame summarizing citation frequencies with columns:
CR_canonical: The canonical citation for each cluster
n: Total number of times this work was cited
n_variants: Number of different formatting variants found
variants_example: Sample of variant formats (up to 3 examples)
Sorted by citation frequency (n) in descending order.
Complete output from normalize_citations,
useful for detailed analysis of the matching process.
CR_normalized: A data frame with columns:
SR: Source document identifier
CR: Reconstructed CR field with normalized citations (semicolon-separated)
n_references: Number of unique references after normalization
This can be merged back with M to replace the original CR field.
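As a rough illustration of how the frequency columns (n, n_variants) relate to the per-citation table, the sketch below derives them from a small hand-made table with SR, CR, and CR_canonical columns. The data are invented, and the aggregation reflects one reading of the column descriptions above rather than the package's own code.

```r
# Hand-made per-citation table (invented values)
toy_full <- data.frame(
  SR = c("DOC A", "DOC B", "DOC C", "DOC C"),
  CR = c("ARIA M, 2017, J INFORMETR, V11",
         "ARIA M., 2017, J INFORMETR, V11",
         "ARIA M, 2017, J INFORMETR, V11",
         "GARFIELD E, 1972, SCIENCE, V178"),
  CR_canonical = c(rep("ARIA M, 2017, J INFORMETR, V11", 3),
                   "GARFIELD E, 1972, SCIENCE, V178"),
  stringsAsFactors = FALSE
)

# n: how many times each canonical citation appears
n <- tapply(toy_full$CR, toy_full$CR_canonical, length)

# n_variants: how many distinct original strings map to each canonical citation
n_variants <- tapply(toy_full$CR, toy_full$CR_canonical,
                     function(x) length(unique(x)))

toy_summary <- data.frame(
  CR_canonical = names(n),
  n = as.integer(n),
  n_variants = as.integer(n_variants),
  stringsAsFactors = FALSE
)

# Sort by citation frequency, most cited first
toy_summary <- toy_summary[order(-toy_summary$n), ]
toy_summary
```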
Aria, M. & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959-975.
normalize_citations for the underlying normalization algorithm
citations for citation analysis
localCitations for local citation analysis
## Not run:
library(bibliometrix)
library(dplyr)  # provides %>%, rename() and left_join() used below

# Load bibliometric data
file <- "https://www.bibliometrix.org/datasets/savedrecs.txt"
M <- convert2df(file, dbsource = "wos", format = "plaintext")
# Apply citation normalization
results <- applyCitationMatching(M, threshold = 0.85)
# View top cited works (after normalization)
head(results$summary, 20)
# See how many variants were found for the top citation
top_citation <- results$summary$CR_canonical[1]
variants <- subset(results$full_data, CR_canonical == top_citation)
unique(variants$CR)
# Replace original CR with normalized CR in the data frame
M_normalized <- M %>%
  rename(CR_orig = CR) %>%
  left_join(results$CR_normalized, by = "SR")
# Compare citation counts before and after normalization
original_citations <- strsplit(M$CR, ";") %>%
  unlist() %>%
  trimws() %>%
  table() %>%
  length()
normalized_citations <- nrow(results$summary)
cat("Original unique citations:", original_citations, "\n")
cat("After normalization:", normalized_citations, "\n")
cat("Duplicates found:", original_citations - normalized_citations, "\n")
# Use normalized data for further analysis
CR_analysis <- citations(M_normalized, field = "article", sep = ";")
## End(Not run)