Splits a data set into two sets with desired proportions.


| Argument | Description |
|---|---|
| `dataset` | Object of class |
| `prop` | Real number between 0 and 1. Proportion of data pairs to form the training set. |
| `keep.mprop` | Logical. Whether the ratio of matches should be retained. |
| `num.non` | Positive integer. Desired number of non-matches in the training set. |
| `des.mprop` | Real number between 0 and 1. Desired proportion of matches to non-matches in the training set. |
| `use.pred` | Logical. Whether to apply the match ratio to previous classification results instead of the true matching status. |
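
To illustrate what retaining the match ratio (`keep.mprop`) means, here is a minimal base-R sketch of a stratified split. It is not the package's implementation: the helper `split_keep_ratio`, the toy data frame, and the column name `is_match` are assumptions made for this illustration only.

```r
# Hypothetical helper (not part of the package): split a data frame of
# record pairs so that the training set keeps the original match ratio.
split_keep_ratio <- function(pairs, prop) {
  match_idx <- which(pairs$is_match == 1)
  non_idx   <- which(pairs$is_match == 0)
  # Draw the same proportion from the matches and from the non-matches.
  # Sampling positions with sample.int() avoids sample()'s special
  # behaviour when it is handed a single number.
  train_idx <- c(
    match_idx[sample.int(length(match_idx), round(prop * length(match_idx)))],
    non_idx[sample.int(length(non_idx), round(prop * length(non_idx)))]
  )
  valid_idx <- setdiff(seq_len(nrow(pairs)), train_idx)
  list(train = pairs[train_idx, , drop = FALSE],
       valid = pairs[valid_idx, , drop = FALSE])
}

# Toy data (assumed for illustration): 20 matches and 80 non-matches.
set.seed(1)
toy <- data.frame(id = 1:100, is_match = rep(c(1, 0), times = c(20, 80)))
parts <- split_keep_ratio(toy, prop = 0.5)

mean(toy$is_match)          # match ratio in the full data
mean(parts$train$is_match)  # match ratio in the training half
```

Because both strata are sampled with the same proportion, the two ratios agree up to rounding of the stratum sizes.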

A list of `RecLinkData` objects.

| Component | Description |
|---|---|
| `train` | The sampled training data. |
| `valid` | All other record pairs. |

The sampled data are stored in the `pairs` attributes of `train` and `valid`. If present, the attributes `prediction` and `Wdata` are split and the corresponding values saved. All other attributes are copied to both data sets.

If the number of desired matches or non-matches is higher than the number actually present in the data, the maximum possible number is chosen and a warning is issued.
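
The capping behaviour described above can be pictured with a short base-R sketch. The helper `cap_sample_size` is hypothetical and not part of the package; it only mimics the documented behaviour of falling back to the maximum possible number and issuing a warning.

```r
# Hypothetical helper (not part of the package): cap a requested sample
# size at the number of available record pairs and warn when capping occurs.
cap_sample_size <- function(desired, available) {
  if (desired > available) {
    warning("Requested ", desired, " pairs but only ", available,
            " are available; using ", available, ".")
    return(available)
  }
  desired
}

# Request is within limits: returned unchanged, no warning.
within_limit <- cap_sample_size(10, 50)

# Request exceeds what is available: capped (warning suppressed here
# only to keep the example output quiet).
capped <- suppressWarnings(cap_sample_size(100, 50))
```

In practice the warning should be left visible, since it signals that the resulting training set is smaller than requested.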

Andreas Borg, Murat Sariyar

`genSamples` for generating training data based on unsupervised classification.

```
library(RecordLinkage)

data(RLdata500)
pairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                       blockfld = list(1, 3, 5, 6, 7))

# split into halves, do not enforce match ratio
l <- splitData(pairs, prop = 0.5)
summary(l$train)
summary(l$valid)

# split into 1/3 and 2/3, retain match ratio
l <- splitData(pairs, prop = 1/3, keep.mprop = TRUE)
summary(l$train)
summary(l$valid)

# generate a training set with 100 non-matches and 10 matches
l <- splitData(pairs, num.non = 100, des.mprop = 0.1, keep.mprop = TRUE)
summary(l$train)
summary(l$valid)
```

