predict_links: Predict links

View source: R/pred.R

predict_links    R Documentation

Description

predict_links takes a dataset of candidate links and their distances, predicts which candidates are links, and selects the best ones.

Usage

predict_links(
  dat_candidates,
  id_from,
  id_to,
  minimum_confidence = 0.5,
  modstring = c("m_rf_baptisms_full", "m_rf_baptisms_sparse"),
  linktype = c("one:one", "many:one"),
  some_measure_of_other_close_matches = "not_implemented"
)

Arguments

id_from

String giving the identifier variable in the candidates dataset for the observations from the _from dataset.

id_to

String giving the identifier variable in the candidates dataset for the observations from the _to dataset.

minimum_confidence

The minimum confidence level (vote share) required to return a link. Defaults to 0.5.

modstring

String giving the name of one of the pretrained models, either m_rf_baptisms_sparse or m_rf_baptisms_full. See details below.

dat_candidates

A dataset with link candidates, created using make_candidates, and with distances calculated using distcalc. predict_links expects this dataset to follow the naming conventions of the model you're using (detailed below).

Details

Uses data.table to handle a potentially large number of candidates. There is no need to assign the result to a new object; if you do, the original dataset is also modified. This behaviour may be changed in the future.
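The in-place behaviour described above is a general property of data.table rather than something specific to this package. The sketch below is only an illustration under assumptions: add_flag is a hypothetical stand-in for any function that updates its input by reference with :=, and the columns id, dist and is_close are made up. It does not show the internals of predict_links.

```r
library(data.table)

# Hypothetical helper: adds a column by reference using `:=`.
add_flag <- function(x) {
  x[, is_close := dist < 0.6]  # modifies x in place, no copy is made
  x
}

dt <- data.table(id = 1:3, dist = c(0.1, 0.5, 0.9))
res <- add_flag(dt)
"is_close" %in% names(dt)   # TRUE: the original object was modified too

# Passing an explicit copy leaves the original untouched.
dt2 <- data.table(id = 1:3, dist = c(0.1, 0.5, 0.9))
res2 <- add_flag(copy(dt2))
"is_close" %in% names(dt2)  # FALSE
```

If the original candidates dataset needs to stay unchanged, passing copy(dat_candidates) is the usual data.table idiom, at the cost of the extra memory for the copy.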

The following models can be used:

  • m_rf_baptisms_sparse: a model linking parents in baptism records to marriage records, based on minimal information: male surname (mlast), male first name (mfirst), female first name (wfirst), and year of marriage/baptism (year). The female surname is not used because it would typically not be reported in the baptism records.

  • m_rf_baptisms_full: a model linking parents in baptism and marriage records, using additional information: initials, profession, and soundex distances of the names. Its performance is not much better than that of the sparse model.

Value

The candidates dataset filtered down to only the best links for each record in the original _from dataset. Also some details about one:one and many:one
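As a rough illustration of what "filtered down to only the best links" can mean, the base R sketch below keeps candidates at or above a confidence threshold and then retains the highest-scoring candidate for each _from record. This is a sketch under assumptions, not the package's implementation: the data are made up, the column names id_from, id_to and pred (standing in for the vote share) are hypothetical, and the handling of linktype is not shown.

```r
# Made-up candidate links; `pred` is a hypothetical vote-share column.
cand <- data.frame(
  id_from = c(1, 1, 2, 2, 3),
  id_to   = c(10, 11, 10, 12, 13),
  pred    = c(0.9, 0.6, 0.8, 0.4, 0.3)
)

minimum_confidence <- 0.5

# 1. Drop candidates below the confidence threshold.
keep <- cand[cand$pred >= minimum_confidence, ]

# 2. Sort by descending confidence, then keep the first (best)
#    candidate for each record in the _from dataset.
keep <- keep[order(-keep$pred), ]
best <- keep[!duplicated(keep$id_from), ]

best
```

In this toy data, record 3 has no candidate above the threshold and therefore gets no link, while records 1 and 2 each keep their single best candidate.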


rijpma/capelinker documentation built on Nov. 7, 2024, 3:06 a.m.