gen_multi_data: Generate the training data and testing data for the...


View source: R/gen_multi_data.R

Description

gen_multi_data generates the data used for multiple-class classification problems.

Usage

gen_multi_data(beta0, N, type, test_ratio)

Arguments

beta0

A numeric matrix that represents the true coefficients used to generate the synthesized data.

N

A number specifying the number of synthesized observations. It should be an integer. The value should not be too small; we recommend 10000.

type

A character string that determines which type of data will be generated, matching one of 'ord' or 'cat'.

test_ratio

A number specifying the proportion of all data assigned to the test set. It should be between 0 and 1. The value of test_ratio should not be too large; a value between 0.2 and 0.3 works best.

Details

gen_multi_data creates a training dataset and a testing dataset. beta0 is a p * k matrix, where p is the length of the true coefficient vector and (k + 1) is the number of categories. The value of 'type' can be 'ord' or 'cat'. If it is 'ord', the classes have an ordinal relation, which is common in applications (e.g., the label indicates the severity of a disease or a product preference). If it is 'cat', there is no such ordinal relation among the classes. The explanatory variables x are generated from a multivariate normal distribution with mean vector 0 and identity covariance matrix, and the response variable y is then generated from a multinomial distribution.
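The generation mechanism described above can be sketched in base R. This is only an illustration for the 'cat' case, not the package's implementation: the helper name sketch_multi_data is hypothetical, and the baseline-category logit link (class 0 with linear predictor 0) and the column layout of the returned data frames are assumptions, since this page does not state them.

```r
# Hypothetical helper, NOT part of seqest: a minimal sketch of the 'cat' case.
sketch_multi_data <- function(beta0, N, test_ratio) {
  p <- nrow(beta0)  # length of each coefficient vector
  k <- ncol(beta0)  # number of categories minus one

  # x ~ N(0, I_p): with identity covariance the coordinates are
  # independent standard normals.
  x <- matrix(rnorm(N * p), nrow = N, ncol = p)

  # ASSUMPTION: baseline-category logit link, reference class 0 has
  # linear predictor 0. The link used by seqest may differ.
  eta <- cbind(0, x %*% beta0)           # N x (k + 1)
  prob <- exp(eta) / rowSums(exp(eta))   # each row sums to 1

  # One multinomial draw per observation, labels coded 0, ..., k.
  y <- apply(prob, 1, function(pr) sample.int(k + 1, size = 1, prob = pr)) - 1L

  # Random train/test split controlled by test_ratio.
  n_test <- round(N * test_ratio)
  test_id <- sample.int(N, n_test)
  train_id <- setdiff(seq_len(N), test_id)

  dat <- data.frame(y = y, x)
  list(train_id = train_id, train = dat[train_id, ], test = dat[test_id, ])
}

set.seed(1)
beta0 <- matrix(c(1, 2, 1, -1, 1, 2), ncol = 2)  # p = 3, k = 2: three classes
out <- sketch_multi_data(beta0, N = 1000, test_ratio = 0.2)
table(out$train$y)
```

The sketch mirrors the three returned components (train_id, train, test) so that the role of test_ratio is visible: it only controls how many of the N generated rows are held out.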

Value

A list containing the following components:

train_id

The ids of the training samples.

train

The training dataset. Note that the ids of the observations in the training dataset are the same as train_id.

test

The testing dataset.

References

Li, J., Chen, Z., Wang, Z., & Chang, Y. I. (2020). Active learning in multiple-class classification problems via individualized binary models. Computational Statistics & Data Analysis, 145, 106911. doi:10.1016/j.csda.2020.106911

See Also

gen_bin_data for the binary classification case.

gen_GEE_data for the generalized estimating equations case.

Examples

# For an example, see example(seq_ord_model)
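For a direct call, the following is a minimal sketch that follows the Usage signature. The coefficient values are arbitrary and chosen only for illustration, and the call is guarded so that the snippet does nothing when the seqest package is not installed.

```r
# Illustrative true coefficients (arbitrary values): p = 3 rows, k = 2 columns,
# so the generated labels fall into k + 1 = 3 classes.
beta0 <- matrix(c(1, 2, 1, -1, 1, 2), ncol = 2)

res <- NULL
if (requireNamespace("seqest", quietly = TRUE)) {
  set.seed(123)
  # N and test_ratio follow the recommendations in the Arguments section.
  res <- seqest::gen_multi_data(beta0, N = 10000, type = "ord", test_ratio = 0.3)

  # Inspect the three documented components of the returned list.
  str(res$train_id)
  str(res$train)
  str(res$test)
}
```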

seqest documentation built on July 2, 2020, 2:28 a.m.