Larger simulated dataset, drawn from the same distribution as `FI_test` and `FI_true`, used to train the imputation algorithm. 5% of the values are missing. Used with `TrainFastImputation`.


A data frame with 9 variables and 10000 observations.

`user_id_1`

Sequential user ids

`bounded_below_2`

Multivariate normal, transformed using `exp(x)`

`unbounded_3`

Multivariate normal

`unbounded_4`

Multivariate normal

`bounded_above_5`

Multivariate normal, transformed using `-exp(x)`

`bounded_above_and_below_6`

Multivariate normal, transformed using `pnorm(x)`

`unbounded_7`

Multivariate normal

`unbounded_8`

Multivariate normal

`categorical_9`

"A" if the first of three multivariate normal draws is greatest; "B" if the second is greatest; "C" if the third is greatest

Stephen R. Haptonstahl srh@haptonstahl.org

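Apart from the sequential ids in column 1, all columns start as multivariate normal draws. Columns 2, 5, and 6 are then transformed to impose their bounds. Column 9 is built from three multivariate normal columns: the position of the largest of the three draws selects the label, which amounts to treating them as a one-hot encoding of a three-valued categorical variable.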
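The construction described above can be sketched in base R. This is an illustration only, not the code that produced the shipped dataset: the seed, the covariance matrix `Sigma`, the object name `sim`, and the choice to blank out values completely at random in every column except the id are all assumptions.

```r
# Hypothetical sketch of the data-generating process; not the package's own code.
set.seed(1)                      # assumed seed, for reproducibility only
n <- 10000                       # observations
p <- 10                          # 7 continuous columns + 3 latent columns for the categorical

# Assumed covariance: an arbitrary symmetric positive-definite matrix.
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) / p

# Multivariate normal draws: rows of Z have covariance Sigma,
# because chol(Sigma) returns R with t(R) %*% R == Sigma.
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

sim <- data.frame(
  user_id_1                 = seq_len(n),     # sequential ids
  bounded_below_2           = exp(Z[, 1]),    # strictly positive
  unbounded_3               = Z[, 2],
  unbounded_4               = Z[, 3],
  bounded_above_5           = -exp(Z[, 4]),   # strictly negative
  bounded_above_and_below_6 = pnorm(Z[, 5]),  # between 0 and 1
  unbounded_7               = Z[, 6],
  unbounded_8               = Z[, 7],
  # Label of the largest of three latent draws per row.
  categorical_9 = factor(c("A", "B", "C")[max.col(Z[, 8:10], ties.method = "first")])
)

# Assumption: each non-id value is set to NA independently with probability 0.05.
for (j in 2:9) {
  sim[[j]][runif(n) < 0.05] <- NA
}
```

The three transformations map an unbounded normal draw onto the range each column name advertises, so an imputation routine can work on the unbounded scale and transform back afterwards.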

