View source: R/gen_multi_data.R
gen_multi_data generates the data used for multiple-class classification problems.
gen_multi_data(beta0, N, type, test_ratio)
beta0
A numeric matrix containing the true coefficients used to generate the synthetic data.
N
The number of observations to generate. It should be an integer and should not be too small; a value of 10000 is recommended.
type
A character string that determines which type of data is generated; it must match one of 'ord' or 'cat'.
test_ratio
The proportion of all data assigned to the test set. It should be a number between 0 and 1 and should not be too large; a value between 0.2 and 0.3 works best.
gen_multi_data creates a training dataset and a testing dataset. beta0 is a p * k matrix, where p is the length of each true coefficient vector and (k + 1) is the number of categories. The value of 'type' can be 'ord' or 'cat'. If it is 'ord', the classes have an ordinal relation, which is common in applications (e.g., the label indicates the severity of a disease or a product preference). If it is 'cat', there is no such ordinal relation among the classes. The explanatory variables x are generated from a multivariate normal distribution with mean vector 0 and identity covariance matrix, and the response variable y is then generated from a multinomial distribution.
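The generation scheme described above can be sketched in base R. This is only an illustration, not the package's internal code: the source does not say which link function maps x and beta0 to class probabilities, so the sketch assumes a baseline-category logit (class 0 as the baseline), which would correspond to the 'cat' case at most. The 'ord' case is not covered, and the variable names eta and prob are hypothetical.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)

N <- 1000                        # number of synthetic observations
beta0 <- matrix(c(1, 2, 1,       # p x k coefficient matrix (p = 3, k = 2)
                  -1, 1, 2), ncol = 2)
p <- nrow(beta0)
k <- ncol(beta0)

# Explanatory variables: multivariate normal with mean 0 and identity
# covariance, i.e. independent standard normal columns.
x <- matrix(rnorm(N * p), nrow = N, ncol = p)

# ASSUMPTION: baseline-category logit link; class 0 has linear predictor 0.
eta <- cbind(0, x %*% beta0)             # N x (k + 1) linear predictors
prob <- exp(eta) / rowSums(exp(eta))     # class probabilities, rows sum to 1

# One multinomial draw per observation; labels are 0, 1, ..., k.
y <- apply(prob, 1, function(pr) sample.int(k + 1, size = 1, prob = pr) - 1)
```

With k = 2 columns in beta0, the labels in y take (k + 1) = 3 possible values, matching the relation between beta0 and the number of categories stated above.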
A list containing the following components:
train_id
The ids of the training samples.
train
The training dataset. The ids of the observations in train are the same as train_id.
test
The testing dataset.
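How train_id, train and test relate to each other can be sketched as follows. The source does not state how the split is drawn, so this assumes a simple random split without replacement; the placeholder data dat and the names n_train and out are hypothetical and only mimic the shape of the returned list.

```r
# Illustrative sketch only; assumes a simple random train/test split.
set.seed(2)

N <- 100
test_ratio <- 0.2

# Placeholder data standing in for the generated (y, x) observations.
dat <- cbind(y = sample(0:2, N, replace = TRUE),
             x1 = rnorm(N),
             x2 = rnorm(N))

# Draw the row ids of the training samples, then split on those ids.
n_train <- round(N * (1 - test_ratio))
train_id <- sort(sample(seq_len(N), n_train))
train <- dat[train_id, , drop = FALSE]    # rows indexed by train_id
test <- dat[-train_id, , drop = FALSE]    # all remaining rows

out <- list(train_id = train_id, train = train, test = test)
```

Under this assumption, row i of out$train is the observation whose id is out$train_id[i], and the test set holds the remaining N * test_ratio observations.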
Li, J., Chen, Z., Wang, Z., & Chang, Y. I. (2020). Active learning in multiple-class classification problems via individualized binary models. Computational Statistics & Data Analysis, 145, 106911. doi:10.1016/j.csda.2020.106911
gen_bin_data for the binary classification case.
gen_GEE_data for the generalized estimating equations case.
# For an example, see example(seq_ord_model)