lbsFindDuplicateTitles: Find documents to be merged (**EXPERIMENTAL**)


View source: R/biblio.duplicates.documents.R

Description

Finds groups of documents that possibly should be merged, based on similarities between their titles.

Usage

lbsFindDuplicateTitles(conn, surveyDescription = NULL,
  ignoreTitles.like = NULL, aggressiveness = 1)

Arguments

conn

connection object, see lbsConnect.

surveyDescription

character string or NULL; if given, the search is restricted to the survey with this description.

ignoreTitles.like

character vector of SQL LIKE patterns or NULL; documents whose titles match a pattern are ignored.

aggressiveness

nonnegative integer; 0 for showing only exact matches; the higher the value, the more documents will be proposed.

Details

The function computes fuzzy similarity measures between the titles. The aggressiveness parameter controls how strict the matching is: 0 reports only exact matches, and higher values propose more documents.
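The effect of a strictness threshold can be illustrated with a minimal sketch. This page does not specify the similarity measure that CITAN uses, so the example below substitutes plain Levenshtein distance via base R's adist; the helper candidate_pairs, its max_dist argument, and the sample titles are hypothetical and only mimic the way a higher aggressiveness widens the set of proposed groups.

```r
# Illustrative only: NOT the algorithm used by lbsFindDuplicateTitles.
# Levenshtein distance (utils::adist) stands in for the fuzzy measure.
titles <- c("Fuzzy sets", "Fuzzy sets", "Fuzzy sets.",
            "Fuzzy set theory", "Graph theory")

# Pairwise edit distances between lower-cased titles
d <- adist(tolower(titles))

# Hypothetical helper: index pairs (row < col) whose distance
# does not exceed max_dist
candidate_pairs <- function(d, max_dist) {
  which(d <= max_dist & upper.tri(d), arr.ind = TRUE)
}

exact <- candidate_pairs(d, 0)  # identical titles only
loose <- candidate_pairs(d, 1)  # also titles one edit apart
```

With a threshold of 0 only the two identical titles are paired; raising it to 1 also pairs each of them with "Fuzzy sets.", which is the kind of growth in proposals that a higher aggressiveness is meant to produce.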

Search results are presented in a convenient-to-use graphical dialog box. The function tries to order the groups of documents according to their relevance (**EXPERIMENTAL** algorithm). Note that the calculation often takes a few minutes!

The ignoreTitles.like parameter specifies search patterns in the SQL LIKE format, i.e. an underscore _ matches exactly one character and a percent sign % matches any sequence of characters, including an empty one. The search is case-insensitive.
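To preview which titles a set of patterns would exclude, the LIKE rules above can be emulated in plain R. The sketch below is an assumption-laden illustration: like_to_regex is a hypothetical helper (not part of CITAN) that translates a LIKE pattern into an anchored regular expression, and the sample titles are made up. The actual matching is done by the database, so edge cases such as non-ASCII case folding may differ.

```r
# Hypothetical helper, not part of CITAN: translate an SQL LIKE pattern
# into an anchored regular expression.
like_to_regex <- function(pattern) {
  special <- c(".", "\\", "^", "$", "|", "?", "*", "+",
               "(", ")", "[", "]", "{", "}")
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") ".*"                            # any sequence of characters
    else if (ch == "_") "."                        # exactly one character
    else if (ch %in% special) paste0("\\", ch)     # escape regex metacharacters
    else ch
  }, character(1))
  paste0("^", paste(parts, collapse = ""), "$")
}

titles <- c("In this issue: autumn", "Editorial", "Guest editorial",
            "Letter to the editor", "A study of editorials")
patterns <- c("%In this issue%", "%Editorial", "Letter to %")

# TRUE where a title matches at least one pattern (case-insensitive)
ignored <- Reduce(`|`, lapply(patterns, function(p) {
  grepl(like_to_regex(p), titles, ignore.case = TRUE)
}))
```

Here the first four titles are flagged and the last one is not, because "%Editorial" requires the title to end with "Editorial" while "A study of editorials" ends with an extra "s".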

Value

A numeric vector of user-selected documents' identifiers to be removed.

See Also

lbsDeleteDocuments, lbsFindDuplicateAuthors, lbsGetInfoDocuments

Examples

## Not run: 
conn <- lbsConnect("Bibliometrics.db");
## ...
listdoc <- lbsFindDuplicateTitles(conn,
   ignoreTitles.like=c("%In this issue%", "%Editorial", "%Introduction",
   "Letter to %", "%Preface"),
   aggressiveness=2);
lbsDeleteDocuments(conn, listdoc);
dbCommit(conn);
## ...
## End(Not run)

CITAN documentation built on May 2, 2019, 9:33 a.m.