predict_links | R Documentation |
predict links
Takes a dataset of candidate links and their distances, predicts which candidates are links, and selects the best ones.
predict_links(
dat_candidates,
id_from,
id_to,
minimum_confidence = 0.5,
modstring = c("m_rf_baptisms_full", "m_rf_baptisms_sparse"),
linktype = c("one:one", "many:one"),
some_measure_of_other_close_matches = "not_implemented"
)
id_from |
string giving the identifier variable in the candidates dataset for the observations from the |
id_to |
string giving the identifier variable in the candidates dataset for the observations from the |
minimum_confidence |
the minimum confidence level (vote share) to return a link. Defaults to 0.5. |
modstring |
String giving the name of one of the pretrained models, either m_rf_baptisms_sparse or m_rf_baptisms_full. See details below. |
dat_candidates |
A dataset with link candidates, created using |
Uses data.table to handle a potentially large number of candidates. Because of this, there is no need to assign the result to a new object. Note that the original dataset is also modified as a side effect; this behaviour may be changed in a future version.
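The by-reference behaviour described above is a general property of data.table rather than something specific to this function. The sketch below illustrates it with a hypothetical helper, add_score(), and made-up column names; it does not call predict_links and is not taken from the package source.

```r
library(data.table)

# Hypothetical candidate table; the column names are illustrative only.
candidates <- data.table(
  id_from = c(1, 1, 2),
  id_to   = c(10, 11, 10),
  dist    = c(0.1, 0.4, 0.2)
)

# Hypothetical helper: adding a column with := changes its input in place.
add_score <- function(dt) {
  dt[, score := 1 - dist]
  invisible(dt)
}

# No assignment needed: `candidates` itself now has a `score` column.
add_score(candidates)
has_score_after_call <- "score" %in% names(candidates)

# Passing an explicit copy leaves the original table untouched.
original <- data.table(id_from = 1:2, dist = c(0.1, 0.3))
result <- add_score(copy(original))
original_unchanged <- !("score" %in% names(original))
```

If the unmodified candidates are needed later, passing copy(dat_candidates) instead of dat_candidates is one way to keep them, at the cost of extra memory.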
The following models can be used:
m_rf_baptisms_sparse
a model linking parents in baptism records to marriage records, based on minimal information: male surname (mlast), male first name (mfirst), female first name (wfirst; the female surname is not used because it is typically not reported in baptism records), and year of marriage/baptism (year).
m_rf_baptisms_full
a model linking parents in baptism and marriage records, using additional information: initials, profession, and soundex distances of the names. Its performance is not much better than that of the sparse model.
The candidates dataset, filtered down to only the best links for each record in the original _from dataset.
Also some details about one:one and many:one
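As a rough illustration of what "filtering down to the best links" could mean, the base R sketch below applies a confidence threshold and then keeps the top candidate per record. The data, the confidence column, and especially the reading of one:one versus many:one are assumptions made for this example; the documentation above does not define them, and this is not the package's actual implementation.

```r
# Hypothetical candidates with a predicted confidence (vote share) per pair.
candidates <- data.frame(
  id_from    = c("b1", "b1", "b2", "b3"),
  id_to      = c("m1", "m2", "m1", "m3"),
  confidence = c(0.9, 0.6, 0.7, 0.3),
  stringsAsFactors = FALSE
)
minimum_confidence <- 0.5

# Step 1: drop candidates below the confidence threshold.
kept <- candidates[candidates$confidence >= minimum_confidence, ]

# Step 2: keep the highest-confidence candidate for each id_from.
# Assumed "many:one" reading: several id_from may share one id_to.
kept <- kept[order(kept$id_from, -kept$confidence), ]
many_to_one <- kept[!duplicated(kept$id_from), ]

# Step 3: assumed "one:one" reading: each id_to may also be used only once,
# keeping the pair with the highest confidence.
by_conf <- many_to_one[order(-many_to_one$confidence), ]
one_to_one <- by_conf[!duplicated(by_conf$id_to), ]
```

In this toy data, b3 is dropped by the threshold, b1 and b2 both point to m1 under the many:one reading, and only the stronger of those two pairs survives under the one:one reading.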