gen_multi_data: Generate the training data and testing data for the...


View source: R/gen_multi_data.R


gen_multi_data generates the data used for multiple-class classification problems.


gen_multi_data(beta0, N, type, test_ratio)



beta0: A numeric matrix of the true coefficients used to generate the synthesized data.


N: A number specifying how many synthesized observations to generate. It should be an integer. Note that the value shouldn't be too small. We recommend a value of 10000.


type: A character string that determines which type of data will be generated, matching one of 'ord' or 'cat'.


test_ratio: A number specifying the proportion of all data assigned to the test set. It should be a number between 0 and 1. Note that test_ratio should not be too large; a value between 0.2 and 0.3 works best.


gen_multi_data creates a training dataset and a testing dataset. beta0 is a p * k matrix, where p is the length of the true coefficient vector and (k + 1) is the number of categories. The value of 'type' can be 'ord' or 'cat'. If it equals 'ord', the data has an ordinal relation among classes, which is common in applications (e.g., the label indicates the severity of a disease or a product preference). If it is 'cat', there is no such ordinal relation among classes. The response variable y is generated from a multinomial distribution, with the explanatory variables x generated from a multivariate normal distribution with mean vector equal to 0 and the identity covariance matrix.
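The generation scheme described above can be sketched in base R. This is only an illustration of the 'cat' case, not the package source: it assumes a baseline-category logit link between x and the class probabilities, and labels running from 0 to k, neither of which is stated on this page. The values of N and beta0 are arbitrary.

```r
# Sketch only: mimics the 'cat' setting under an ASSUMED baseline-category
# logit model. This is not the seqest implementation.
set.seed(1)

N <- 1000                                         # number of synthetic observations
beta0 <- matrix(c(1, 2, 1, -1, 1, 2), ncol = 2)   # p = 3 rows, k = 2 columns
p <- nrow(beta0)
k <- ncol(beta0)

# Explanatory variables: multivariate normal with mean 0 and identity
# covariance, i.e. independent standard normal coordinates.
x <- matrix(rnorm(N * p), nrow = N, ncol = p)

# Linear predictors for the k non-baseline classes (N x k matrix).
eta <- x %*% beta0

# Probabilities for the k + 1 classes; column 1 is the baseline class.
denom <- 1 + rowSums(exp(eta))
prob <- cbind(1 / denom, exp(eta) / denom)

# One multinomial draw per observation; labels 0 (baseline) to k are an
# assumption made for this sketch.
y <- apply(prob, 1, function(pr) sample.int(k + 1, size = 1, prob = pr)) - 1L

table(y)
```

With k = 2 columns in beta0, the sketch produces k + 1 = 3 classes, matching the relation between beta0 and the number of categories described above.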


A list containing the following components:


The ids of the training samples.


The training dataset. Note that the ids of the observations in the training dataset are the same as train_id.


The testing dataset.
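As a rough illustration of how such a list could be assembled, the base R sketch below splits a toy data frame according to test_ratio. The component names train and test, the toy columns, and the splitting rule are assumptions made for this example; only train_id is named on this page.

```r
# Sketch only: a hypothetical train/test split. The names `train` and `test`
# and the toy data are assumptions, not taken from the seqest source.
set.seed(2)

N <- 100
test_ratio <- 0.2
dat <- data.frame(y  = sample(0:2, N, replace = TRUE),
                  x1 = rnorm(N),
                  x2 = rnorm(N))

# Draw the test rows at random; the remaining rows form the training set.
n_test   <- round(N * test_ratio)
test_id  <- sample.int(N, n_test)
train_id <- setdiff(seq_len(N), test_id)

res <- list(train_id = train_id,
            train    = dat[train_id, ],
            test     = dat[test_id, ])

# The row names of the training data carry the original row ids,
# so they line up with train_id.
identical(as.integer(rownames(res$train)), res$train_id)
```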


Li, J., Chen, Z., Wang, Z., & Chang, Y. I. (2020). Active learning in multiple-class classification problems via individualized binary models. Computational Statistics & Data Analysis, 145, 106911. doi:10.1016/j.csda.2020.106911

See Also

gen_bin_data for the binary classification case.

gen_GEE_data for the generalized estimating equations case.


# For an example, see example(seq_ord_model)

seqest documentation built on July 2, 2020, 2:28 a.m.