label: Manually label data

View source: R/label.R

label    R Documentation

Manually label data

Description

label wraps a sampling step and the candidates function to make manual labelling of training data easier

Usage

label(
  dat_from,
  dat_to,
  persid_from,
  persid_to,
  blockvariable,
  blocktype,
  N,
  path,
  ...
)

Arguments

dat_from

a data.table

dat_to

a data.table

persid_from

string identifying the person id variable in dat_from

persid_to

string identifying the person id variable in dat_to

blockvariable

string identifying the blocking variable

N

the number of unique observations of the blocking variable to be labelled, defaults to 500

...

passed to candidates for customised blocking

Details

label takes a random sample from dat_from, gathers candidates from dat_to, and presents them to the user, who either selects the matching candidate or indicates that there is no match
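The sample-then-gather step described above can be illustrated with a minimal base R sketch. This is not capelinker's implementation: gather_candidates is a hypothetical helper, the data are toy tables, and Levenshtein distance (utils::adist) is swapped in as a simple substitute for whatever blocking the candidates function actually performs.

```r
# Hypothetical sketch of "sample, then gather candidates"; not capelinker's code.
d1 <- data.frame(persid = 1:2, mlast = c("jong", "smid"))
d2 <- data.frame(persid = 1:3, mlast = c("jongh", "jong", "smit"))

# Return the rows of dat_to whose blocking variable lies within max_dist edits
# of `value`. Levenshtein distance is an assumption made for this sketch.
gather_candidates <- function(value, dat_to, blockvariable = "mlast", max_dist = 1) {
  dists <- utils::adist(value, dat_to[[blockvariable]])[1, ]
  dat_to[dists <= max_dist, , drop = FALSE]
}

# Draw one record at random from the "from" data, then look up its candidates.
sampled <- d1[sample(nrow(d1), size = 1), ]
candidates <- gather_candidates(sampled$mlast, d2)
print(candidates)
```

In an interactive session the rows in candidates would be the options shown to the user for the sampled record.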

The labelling session is interactive, and the user is presented with a choice between

  • Persid: One of the numbers of persid_to

  • None

At some point a "Back" option might be added

After selecting, there is an annotation step in which the user chooses one of

  • Cancel

  • Sure

  • Maybe

  • Doubtful

  • Ambiguous
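The two-stage prompt (pick a match, then annotate it) could be sketched roughly as follows. This is an illustration only: ask_label and its chooser argument are hypothetical names, the prompt titles are invented, and treating "Cancel" as discarding the selection is an assumption rather than documented behaviour. utils::menu is the base R function for text menus; it returns 0 when the user exits without choosing.

```r
# Hypothetical sketch of the choice and annotation prompts; not capelinker's code.
annotation_levels <- c("Cancel", "Sure", "Maybe", "Doubtful", "Ambiguous")

ask_label <- function(candidate_ids, chooser = utils::menu) {
  # Stage 1: one option per candidate persid_to, plus "None".
  match_choices <- c(as.character(candidate_ids), "None")
  picked <- chooser(match_choices, title = "Which persid_to matches?")
  if (picked == 0) return(NULL)  # user left the menu without choosing

  # Stage 2: annotate how certain the choice is.
  note <- chooser(annotation_levels, title = "How certain are you?")
  # Assumption: "Cancel" (or exiting the menu) discards the selection.
  if (note == 0 || annotation_levels[note] == "Cancel") return(NULL)

  list(match = match_choices[picked], certainty = annotation_levels[note])
}

# Non-interactive demonstration with a scripted chooser that always picks the
# first candidate and then the second annotation level ("Sure").
scripted <- function(choices, title = NULL) if ("None" %in% choices) 1L else 2L
result <- ask_label(c(1, 2), chooser = scripted)
print(result)
```

Passing the menu function as an argument keeps the sketch runnable in scripts; in a real session the default utils::menu would prompt at the console.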

Value

A list containing candidate pairs to be labelled

Examples

# Two small toy tables; wfirst, wlast and settlerchildren are left as NA
d1 = data.table::data.table(
  mlast = c("jong", "smid"), mfirst = c("Jan", "Jan"),
  wfirst = NA, wlast = NA, settlerchildren = NA, persid = c(1:2))
d2 = data.table::data.table(
  mlast = c("jongh", "jong", "smit"), mfirst = c("Jan", "Dirk", "Johan"),
  wlast = NA, wfirst = NA, settlerchildren = NA, persid = c(1:3))
# Starts an interactive labelling session, blocking on mlast with N = 2
label(d1, d2, "persid", "persid", "mlast", "bigram distance", 2)


rijpma/capelinker documentation built on Nov. 7, 2024, 3:06 a.m.