# Generate simulated mixed-type data with cluster structure.

### Description

This function simulates mixed-type data sets with a latent cluster structure, with continuous and nominal variables.

### Usage

```
genMixedData(sampSize, nConVar, nCatVar, nCatLevels, nConWithErr, nCatWithErr,
  popProportions, conErrLev, catErrLev)
```

### Arguments

`sampSize` |
Integer: Size of the simulated data set. |

`nConVar` |
The number of continuous variables. |

`nCatVar` |
The number of categorical variables. |

`nCatLevels` |
Integer: The number of categories per categorical variable. Currently must be a multiple of the number of populations specified in `popProportions`. |

`nConWithErr` |
Integer: The number of continuous variables with error. |

`nCatWithErr` |
Integer: The number of categorical variables with error. |

`popProportions` |
A vector of scalars that sums to one. The length gives the number of populations (clusters), with values denoting the prior probability of observing a member of the corresponding population. NOTE: currently only two populations are supported. |

`conErrLev` |
A scalar between 0.01 and 1 denoting the univariate overlap between clusters on the continuous variables specified to have error. |

`catErrLev` |
A scalar denoting the univariate overlap between clusters on the categorical variables specified to have error. |
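The constraints stated in the argument descriptions can be checked before calling the function. The helper below is a hypothetical base-R sketch (`checkGenMixedArgs` is not part of the package). It encodes only the documented restrictions, plus two labeled assumptions: that `catErrLev` shares the range documented for `conErrLev`, and that the number of variables with error cannot exceed the number of variables of that type.

```r
# Hypothetical helper, not part of the package: collects violations of the
# constraints stated in the argument descriptions above.
checkGenMixedArgs <- function(nConVar, nCatVar, nCatLevels, nConWithErr,
                              nCatWithErr, popProportions,
                              conErrLev, catErrLev) {
  nPop <- length(popProportions)
  problems <- character(0)

  if (nPop != 2)
    problems <- c(problems, "only two populations are currently supported")
  if (!isTRUE(all.equal(sum(popProportions), 1)))
    problems <- c(problems, "popProportions must sum to one")
  if (nPop < 1 || nCatLevels %% nPop != 0)
    problems <- c(problems, "nCatLevels must be a multiple of the number of populations")
  if (conErrLev < 0.01 || conErrLev > 1)
    problems <- c(problems, "conErrLev must lie between 0.01 and 1")
  # Assumption: catErrLev uses the same range as conErrLev.
  if (catErrLev < 0.01 || catErrLev > 1)
    problems <- c(problems, "catErrLev must lie between 0.01 and 1")
  # Assumption: variables with error are a subset of the variables of that type.
  if (nConWithErr > nConVar)
    problems <- c(problems, "nConWithErr exceeds nConVar")
  if (nCatWithErr > nCatVar)
    problems <- c(problems, "nCatWithErr exceeds nCatVar")

  problems
}

# An empty character vector means no violations were found.
checkGenMixedArgs(nConVar = 2, nCatVar = 2, nCatLevels = 4, nConWithErr = 1,
                  nCatWithErr = 1, popProportions = c(0.3, 0.7),
                  conErrLev = 0.3, catErrLev = 0.2)
```

Returning the list of problems, rather than stopping at the first one, makes it easier to correct several arguments in one pass.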

### Details

This function simulates mixed-type data sets with a latent cluster structure. Continuous variables follow a normal mixture model, and categorical variables follow a multinomial mixture model. The user controls the overlap of the continuous and categorical variables, i.e. how clear the cluster structure is. By default, variables have 1 percent overlap (almost perfect separation). A user-specified number of continuous and categorical variables can be designated as measured with error; for these variables the overlap can be set anywhere between 1 and 100 percent, where 100 percent corresponds to complete overlap.
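To make the notion of overlap on a continuous variable concrete, the following base-R sketch draws from a two-component normal mixture at a chosen overlap. It rests on assumptions that the documentation above does not confirm: that overlap means the area shared by two equal-variance normal densities, and that the cluster means sit at -2 and 2. The function `overlapToSd` is hypothetical and not part of the package.

```r
# Sketch only. Assumes "overlap" is the area shared by two equal-variance
# normal densities, with cluster means fixed at -2 and 2 (hypothetical values).
overlapToSd <- function(overlap, mu1 = -2, mu2 = 2) {
  stopifnot(overlap > 0, overlap < 1)
  # Shared area equals 2 * pnorm(-|mu1 - mu2| / (2 * sd)); solve for sd.
  abs(mu1 - mu2) / (-2 * qnorm(overlap / 2))
}

set.seed(1)
n <- 500
# Latent cluster labels drawn with prior probabilities 0.3 and 0.7.
trueID <- sample(1:2, n, replace = TRUE, prob = c(0.3, 0.7))
sdErr <- overlapToSd(0.3)
x <- rnorm(n, mean = c(-2, 2)[trueID], sd = sdErr)

# Numerical check of the shared area. The densities cross at the midpoint 0,
# so the smaller density is N(2, sd) on the left and N(-2, sd) on the right.
left <- integrate(function(z) dnorm(z, 2, sdErr), -Inf, 0)$value
right <- integrate(function(z) dnorm(z, -2, sdErr), 0, Inf)$value
sharedArea <- left + right
```

Under this definition, a larger overlap translates into a larger standard deviation around fixed means, so the two clusters blur into each other as the overlap approaches 100 percent.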

NOTE: Currently, only two populations (clusters) are supported.
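With two populations, overlap on a categorical variable can be illustrated in the same spirit. The sketch below assumes that the overlap between two multinomial distributions is the sum of the level-wise minimum probabilities, and it uses an illustrative construction in which each population favors its own half of the levels. Neither the definition nor the hypothetical function `makeCatProbs` is taken from the package.

```r
# Sketch only. Assumes overlap between two multinomial distributions is
# sum(pmin(p1, p2)); the construction is illustrative, not the package's.
makeCatProbs <- function(nCatLevels, overlap) {
  stopifnot(nCatLevels %% 2 == 0, overlap > 0, overlap <= 1)
  half <- nCatLevels / 2
  high <- (1 - overlap / 2) / half  # levels favored by a population
  low <- (overlap / 2) / half       # levels favored by the other population
  p1 <- c(rep(high, half), rep(low, half))
  list(pop1 = p1, pop2 = rev(p1))
}

probs <- makeCatProbs(nCatLevels = 4, overlap = 0.2)
achievedOverlap <- sum(pmin(probs$pop1, probs$pop2))

set.seed(1)
n <- 200
trueID <- sample(1:2, n, replace = TRUE, prob = c(0.3, 0.7))
catVar <- integer(n)
for (k in 1:2) {
  idx <- trueID == k
  # Draw category labels from the multinomial vector of population k.
  catVar[idx] <- sample(4, sum(idx), replace = TRUE, prob = probs[[k]])
}
table(catVar, trueID)
```

Splitting the levels evenly between the two populations is one reason a construction like this needs the number of levels to be a multiple of the number of populations.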

### Value

A list with the following elements:

`trueID` |
Integer vector giving population (cluster) membership of each observation |

`trueMus` |
Mean parameters used for population (cluster) centers in the continuous variables |

`conVars` |
The continuous variables |

`errVariance` |
Variance parameter used for continuous error distribution |

`popProbsNoErr` |
Multinomial probability vectors for categorical variables without measurement error |

`popProbsWithErr` |
Multinomial probability vectors for categorical variables with measurement error |

`catVars` |
The categorical variables |

### Examples

```
dat <- genMixedData(100, 2, 2, nCatLevels = 4, nConWithErr = 1, nCatWithErr = 1,
  popProportions = c(0.3, 0.7), conErrLev = 0.3, catErrLev = 0.2)
with(dat, plot(conVars, col = trueID))
with(dat, table(data.frame(catVars[, 1:2], trueID)))
```
