R/init_multi_data.R
In seqest: Sequential Method for Classification and Generalized Estimating Equations Problem

Documented in init_multi_data

#' @title Generate the labeled and unlabeled datasets
#'
#' @description
#' \code{init_multi_data} creates the labeled and unlabeled datasets for the
#' categorical and ordinal case.
#'
#' @details
#' \code{init_multi_data} randomly splits the training samples into an initial
#' labeled dataset and an unlabeled dataset. In the subsequent sequential
#' procedure, the most informative sample is selected from the unlabeled
#' dataset and added to the labeled dataset. The number of samples in the
#' initial labeled dataset is specified by the \code{init_N} argument. The value
#' of \code{type} should be 'ord' or 'cat'. If it equals 'ord', the k-th element
#' of \code{splitted} is composed of the labeled samples from the adjacent
#' classes k - 1 and k. Otherwise, it is composed of the labeled samples from
#' classes 0 and k.
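#' @param train_id A numeric vector giving the ids of all training samples.
#'   Each sample has a unique id from 1 to the total number of samples. Note
#'   that the current implementation does not use this argument and instead
#'   takes the row positions \code{1:nrow(train)} as the ids.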
#' @param train A numeric matrix holding the training data. Each row is one
#'   training sample; the first column holds the labels and the remaining
#'   columns hold the explanatory variables. The id of a sample is its row
#'   position in \code{train}, consistent with \code{train_id}.
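#' @param init_N A numeric value giving the number of initial labeled samples.
#'   It must not exceed \code{nrow(train)}, and it should be large enough that
#'   every class appears in the initial labeled dataset, because the number of
#'   classes is counted from that dataset.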
#' @param type A character string that determines which type of data will be
#'   generated, matching one of 'ord' or 'cat'.
#' @export
#' @return A list containing the following components:
#' \item{splitted}{a list of K data frames built from the initial labeled
#' dataset, where K + 1 is the number of classes it contains; one data frame
#' per binary sub-problem (see Details), each with a factor column \code{Y}
#' followed by the explanatory variables}
#' \item{train}{the initial labeled dataset, with \code{init_N} rows}
#' \item{newY}{intended to hold the labels, coded from 0 to K; the current
#' implementation always returns 0}
#' \item{labeled_ids}{the ids of the samples in the initial labeled dataset}
#' \item{unlabeled_ids}{the ids of the samples in the unlabeled dataset}
#' \item{data}{all training samples, i.e. the samples indexed by
#' \code{labeled_ids} together with those indexed by \code{unlabeled_ids}}
#'
#'
#' @examples
#'## For an example, see example(seq_ord_model)

init_multi_data <- function(train_id, train, init_N, type) {
  data <- train
  # Sample ids are the row positions 1..N (train_id is currently not used)
  index <- seq_len(nrow(data))
  # Randomly draw init_N rows for the initial labeled set; the rest are unlabeled
  labeled_ids <- sample(index, init_N)
  unlabeled_ids <- setdiff(index, labeled_ids)
  train <- data[labeled_ids, , drop = FALSE]

  # Labels are in the first column; drop = FALSE keeps X a matrix even when
  # there is only one explanatory variable
  nClass <- length(unique(train[, 1]))
  Y <- train[, 1]
  X <- train[, -1, drop = FALSE]

  # One binary dataset per k = 1, ..., nClass - 1:
  #   type == "cat": classes 0 and k (each category against the baseline 0)
  #   otherwise ("ord"): the adjacent classes k - 1 and k
  splitted <- lapply(seq_len(nClass - 1), function(k) {
    ind <- if (type == "cat") (Y == 0 | Y == k) else (Y == k - 1 | Y == k)
    dataframe <- data.frame(cbind(Y = Y[ind], X[ind, , drop = FALSE]))
    dataframe$Y <- factor(dataframe$Y)
    dataframe
  })

  newY <- 0
  return(list(splitted = splitted, train = train, newY = newY,
              labeled_ids = labeled_ids, unlabeled_ids = unlabeled_ids,
              data = data))
}
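The class-pairing rule from the Details section and the random labeled/unlabeled split can be illustrated without the package. The sketch below is a minimal standalone illustration under stated assumptions: `class_pairs` is a hypothetical helper (not part of seqest), and `toy` is made-up data with labels 0, 1, 2 in the first column and two explanatory variables.

```r
# Hypothetical helper (not part of seqest): for each binary sub-problem
# k = 1, ..., nClass - 1, report which two classes it contrasts.
class_pairs <- function(nClass, type = c("ord", "cat")) {
  type <- match.arg(type)
  k <- seq_len(nClass - 1)
  # "cat": every class k against the baseline class 0
  # "ord": adjacent classes k - 1 and k
  lower <- if (type == "cat") rep(0, length(k)) else k - 1
  cbind(k = k, class_a = lower, class_b = k)
}

# Made-up toy data: 12 samples, 4 per class, labels in the first column
set.seed(1)
toy <- cbind(y = rep(0:2, each = 4), x1 = rnorm(12), x2 = rnorm(12))

# Random labeled/unlabeled partition in the same style as init_multi_data
init_N <- 9
index <- seq_len(nrow(toy))
labeled_ids <- sample(index, init_N)
unlabeled_ids <- setdiff(index, labeled_ids)
labeled <- toy[labeled_ids, , drop = FALSE]

# Build the "ord" sub-problems from the labeled rows only
pairs <- class_pairs(3, "ord")  # rows contrast classes (0, 1) and (1, 2)
subsets <- lapply(seq_len(nrow(pairs)), function(i) {
  keep <- labeled[, "y"] %in% pairs[i, c("class_a", "class_b")]
  labeled[keep, , drop = FALSE]
})

print(class_pairs(3, "cat"))  # rows contrast classes (0, 1) and (0, 2)
print(sapply(subsets, nrow))
```

With 4 samples per class and `init_N = 9`, exactly 3 samples remain unlabeled, fewer than the size of any class, so every class is guaranteed to appear in the labeled set. With a smaller `init_N`, a class could be missing from the labeled set, which would change the class count that `init_multi_data` derives from it.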

seqest documentation built on July 2, 2020, 2:28 a.m.