R/DatRel.R

Defines functions DatRel

Documented in DatRel

#' @title Data Relocation for Resampled Data using Pure and Proper Class Cover Catch Digraph
#'
#' @description \code{DatRel} relocates resampled data using Pure and Proper Class Cover Catch Digraph
#'
#' @param x feature matrix or dataframe.
#' @param y class factor variable.
#' @param x_syn synthetic data generated by an oversampling method.
#' @param proportion proportion of covered samples. A real number between \eqn{(0,1]}.
#' 1 by default. Algorithm stops when desired percent of coverage achieved in each class.
#' Smaller numbers results in less dominant samples.
#' @param p_of proportion to increase cover radius. A real number between
#'  \eqn{(0,\infty)}. Default is 0. Higher values tolerate other classes more.
#' @param class_pos Class name of synthetic data. Default is NULL. If NULL,
#' positive class is minority class.
#'
#' @details
#' Calculates cover areas using pure and proper class cover catch digraphs (PCCCD) for
#' original dataset. Any sample outside of cover area is relocated towards a
#' specific dominant point. Determination of dominant point to move towards is
#' based on distance based on radii of PCCCD balls. \code{p_of} is to increase
#' obtained radii to be more tolerant to noise. \code{prooportion} argument is
#' cover percentage for PCCCD to stop when desired percentage is covered for
#' each class. PCCCD models are determined using \code{rcccd} package.
#' \code{class_pos} argument is used to specify oversampled class.
#'
#' @return an list object which includes:
#'  \item{x_new}{Oversampled and relocated feature matrix}
#'  \item{y_new}{Oversampled class variable}
#'  \item{x_syn}{Generated and relocated sample matrix}
#'  \item{i_dominant}{Indexes of dominant samples}
#'  \item{x_pos_dominant}{Dominant samples for positive class}
#'  \item{radii_pos_dominant}{Positive class cover percentage}
#'
#' @author Fatih Saglam, saglamf89@gmail.com
#'
#' @examples
#'
#' library(SMOTEWB)
#' library(rcccd)
#'
#' set.seed(10)
#' # adding data
#' x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
#'            matrix(rnorm(60, 6, 1), ncol = 2, nrow = 30))
#' y <- as.factor(c(rep("negative", 1000), rep("positive", 30)))
#'
#' # adding noise
#' x[1001,] <- c(3,3)
#' x[1002,] <- c(2,2)
#' x[1003,] <- c(4,4)
#'
#' # resampling
#' m_SMOTE <- SMOTE(x = x, y = y, k = 3)
#'
#' # relocation of resampled data
#' m_DatRel <- DatRel(x = x, y = y, x_syn = m_SMOTE$x_syn)
#'
#' # resampled data
#' plot(x, col = y, main = "SMOTE")
#' points(m_SMOTE$x_syn, col = "green")
#'
#' # resampled data after relocation
#' plot(x, col = y, main = "SMOTE + DatRel")
#' points(m_DatRel$x_syn, col = "green")
#'
#' @rdname DatRel
#' @export

DatRel <- function(x, y, x_syn, proportion = 1, p_of = 0, class_pos = NULL) {
  var_names <- colnames(x)
  x <- as.matrix(x)
  # p <- ncol(x)

  class_names <- as.character(unique(y))
  if (is.null(class_pos)) {
    class_pos <- names(which.min(table(y)))
  }
  class_neg <- as.character(class_names[class_names != class_pos])

  x_pos <- x[y == class_pos,,drop = FALSE]
  x_neg <- x[y == class_neg,,drop = FALSE]

  n_pos <- nrow(x_pos)
  n_neg <- nrow(x_neg)
  n_syn <- nrow(x_syn)

  x <- rbind(x_pos, x_neg)
  y <- c(y[y == class_pos], y[y == class_neg])

  m_dominate <- f_dominate(x_main = x_pos, x_other = x_neg, proportion = proportion, p_of = p_of)
  dist_pos2neg <- m_dominate$dist_main2other

  i_dominant <- m_dominate$i_dominant
  x_pos_dominant <- x_pos[m_dominate$i_dominant,,drop = FALSE]
  radii_pos_dominant <- dist_pos2neg[m_dominate$i_dominant,]

  x_syn <- f_relocate(x_pos_dominant = x_pos_dominant,
                      x_syn = x_syn,
                      radii_pos_dominant = radii_pos_dominant)
  x_new <- rbind(
    x_syn,
    x_pos,
    x_neg
  )
  y_new <- c(
    rep(class_pos, n_syn + n_pos),
    rep(class_neg, n_neg)
  )
  y_new <- factor(y_new, levels = levels(y), labels = levels(y))
  colnames(x_new) <- var_names

  return(list(
    x_new = x_new,
    y_new = y_new,
    x_syn = x_new[1:n_syn,, drop = FALSE],
    i_dominant = i_dominant,
    x_pos_dominant = x_pos_dominant,
    radii_pos_dominant = radii_pos_dominant
  ))
}

Try the imbalanceDatRel package in your browser

Any scripts or data that you put into this service are public.

imbalanceDatRel documentation built on June 8, 2025, 12:47 p.m.