predict_links: Predict links
In rijpma/capelinker: Machine Learning-based Record Linkage for Historical South Africa

predict_links

R Documentation

Predict links

Description

`predict_links` takes a dataset of candidate links and their distances, predicts which candidates are links, and selects the best ones.

Usage

predict_links(
  dat_candidates,
  id_from,
  id_to,
  minimum_confidence = 0.5,
  modstring = c("m_rf_baptisms_full", "m_rf_baptisms_sparse"),
  linktype = c("one:one", "many:one"),
  some_measure_of_other_close_matches = "not_implemented"
)

Arguments

`dat_candidates`	A dataset with link candidates, created using `make_candidates`, and with distances calculated using `distcalc`. `predict_links` expects this dataset to follow the naming conventions of the model you are using (detailed below).
`id_from`	String giving the identifier variable in the candidates dataset for the observations from the `_from` dataset.
`id_to`	String giving the identifier variable in the candidates dataset for the observations from the `_to` dataset.
`minimum_confidence`	The minimum confidence level (vote share) required to return a link. Defaults to 0.5.
`modstring`	String giving the name of one of the pretrained models, either `m_rf_baptisms_sparse` or `m_rf_baptisms_full`. See details below.
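To make the "vote share" behind `minimum_confidence` concrete, here is a minimal sketch in base R. It assumes the usual random-forest reading of the term, namely the fraction of trees voting that a candidate pair is a link. The vote matrix is made up for illustration and does not come from the pretrained models.

```r
# Hypothetical votes from 5 trees (columns) on 3 candidate pairs (rows);
# 1 means the tree votes "link", 0 means "no link".
votes <- matrix(
  c(1, 1, 1, 0, 1,
    1, 0, 0, 1, 0,
    0, 0, 1, 0, 0),
  nrow = 3, byrow = TRUE
)

# Vote share: fraction of trees voting "link" for each candidate pair.
vote_share <- rowMeans(votes)

# Candidates at or above the threshold are retained.
minimum_confidence <- 0.5
keep <- vote_share >= minimum_confidence
```

Raising `minimum_confidence` in a sketch like this trades fewer returned links for fewer false ones.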

Details

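Uses data.table to handle a potentially large number of candidates. Because the candidates dataset is modified in place, there is no need to assign the result to a new object. If you do assign it, the original dataset is modified as well; the author notes that this behaviour may need fixing.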
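The in-place modification described here is general data.table behaviour and can be seen in a small sketch. The helper `add_flag` and the columns `id`, `score`, and `flag` are hypothetical stand-ins, not part of capelinker.

```r
library(data.table)

# Hypothetical helper standing in for any function that uses `:=` internally.
add_flag <- function(dt) {
  dt[, flag := score >= 0.5]  # `:=` adds the column by reference
  dt
}

cand <- data.table(id = 1:3, score = c(0.9, 0.4, 0.7))

# The result is assigned to `out`, but `cand` itself gains the column too.
out <- add_flag(cand)
modified_in_place <- "flag" %in% names(cand)

# Passing an explicit copy leaves the original untouched.
cand2 <- data.table(id = 1:3, score = c(0.9, 0.4, 0.7))
out2 <- add_flag(copy(cand2))
untouched <- !("flag" %in% names(cand2))
```

If you need to keep the unmodified candidates around, passing `copy()` of the table is one way to do so, at the cost of extra memory for large candidate sets.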

The following models can be used:

`m_rf_baptisms_sparse`: a model linking parents in baptism records to marriage records, based on minimal information: male surname (`mlast`), male first name (`mfirst`), female first name (`wfirst`), and year of marriage/baptism (`year`). The female surname is not used because it would typically not be reported in the baptism records.
`m_rf_baptisms_full`: a model linking parents in baptism and marriage records, using additional information: initials, profession, and soundex distances of the names. Its performance is not much better than that of the sparse model.

Value

The candidates dataset, filtered down to only the best links for each record in the original `_from` dataset. Which links are kept depends on `linktype` (`one:one` or `many:one`).
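As an illustration of what filtering to the best links could look like, the following sketch applies a confidence threshold and then a `many:one` and a `one:one` selection to a toy table. This is one plausible reading of those link types, not the package's actual implementation; the column names `id_from`, `id_to`, and `pred`, and all scores, are assumptions made for the example.

```r
library(data.table)

# Toy candidates table; column names and scores are made up for illustration.
cand <- data.table(
  id_from = c("b1", "b1", "b2", "b2", "b3"),
  id_to   = c("m1", "m2", "m1", "m3", "m3"),
  pred    = c(0.90, 0.60, 0.80, 0.40, 0.70)
)

minimum_confidence <- 0.5

# Drop candidates below the confidence threshold.
kept <- cand[pred >= minimum_confidence]

# many:one: keep the best-scoring candidate for each _from record;
# several _from records may still point to the same _to record.
many_one <- kept[kept[, .I[which.max(pred)], by = id_from]$V1]

# one:one: additionally keep only the best-scoring _from record
# for each _to record, so every _to record is used at most once.
one_one <- many_one[many_one[, .I[which.max(pred)], by = id_to]$V1]
```

In this toy data, `many_one` links both `b1` and `b2` to `m1`, while `one_one` keeps only the higher-scoring of the two. Under this simple greedy rule the losing record is dropped rather than reassigned to its second-best candidate; a real implementation may resolve such conflicts differently.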

rijpma/capelinker documentation built on Nov. 7, 2024, 3:06 a.m.
