R/transforms.R

Defines functions tspTransform kInverse kTransform

Documented in kInverse kTransform tspTransform

# Dataset transformations

#' Kendall transformation
#' 
#' @param x Vector or data frame to be Kendall-transformed; allowed feature types are numeric, integer (treated as numeric), ordered factor, logical and unordered factor with two or fewer levels.
#' \code{NA} and non-finite values are allowed; \code{NaN} is treated as \code{NA}.
#' @return A transformed vector or data frame with transformed columns.
#' @references "Kendall transformation brings a robust categorical representation of ordinal data" M.B. Kursa. SciRep 12, 8341 (2022).
#' @examples
#' kTransform(data.frame(Asc=1:3,Desc=3:1,Vsh=c(2,1,2)))
#' @export
kTransform<-function(x)
 if(is.data.frame(x)) data.frame(.Call(C_kt,x)) else .Call(C_kt,x)
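
As a rough illustration of the idea behind the transformation, the following base-R sketch compares every ordered pair of elements of a numeric vector and records the outcome as a three-level factor. The helper name kendall_sketch and the order in which pairs are enumerated are assumptions made for this example; the order used by the compiled C_kt routine may differ.

```r
# Hypothetical helper, not part of praznik: pairwise comparison of a numeric vector
kendall_sketch <- function(x) {
  n <- length(x)
  # All ordered pairs (i, j) with i != j; this enumeration order is an assumption
  idx <- expand.grid(i = seq_len(n), j = seq_len(n))
  idx <- idx[idx$i != idx$j, ]
  # sign() gives -1, 0 or 1; shifting by 2 maps it onto '<', '=' and '>'
  d <- sign(x[idx$i] - x[idx$j])
  factor(c("<", "=", ">")[d + 2], levels = c("<", "=", ">"))
}

# A vector of 3 elements yields 3 * 2 = 6 pairwise relations
kendall_sketch(c(2, 1, 2))
```

Missing values propagate as NA in this sketch, since a comparison involving NA has no defined outcome.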

#' Inverse Kendall transform
#'
#' This function attempts to reverse the Kendall transformation using a simple ranking agreement method, which always restores the original ranking if the input corresponds to one, and otherwise gives a reasonable best-effort guess.
#' Namely, each object gets a score based on its relation to every other object: 2 points for a win (\code{'>'}) and 1 point for a tie (\code{'='}); these scores are then used to calculate ranks.
#' This function can also be directly given greater-than scores, for instance confidence scores from some classifier trained on Kendall-transformed data.
#' @param x A Kendall-transformed feature to be converted back into a ranking.
#' To be interpreted as such, it must be a factor whose levels are a subset of \code{'<'}, \code{'>'} and \code{'='}.
#' Alternatively, it may be a numeric vector of greater-than scores.
#' @return Vector of ranks corresponding to \code{x}.
#' @note The order of elements in \code{x} is crucial; if it is not the same as generated by \code{\link{kTransform}}, the results will be wrong.
#' This function cannot assert that the order is correct.
#' @references "Kendall transformation brings a robust categorical representation of ordinal data" M.B. Kursa. SciRep 12, 8341 (2022).
#' @examples
#' kInverse(kTransform(1:7))
#' @export
kInverse<-function(x){
 if(is.factor(x))
  if(all(levels(x)%in%c("<",">","=")))
   x<-factor(x,levels=c("<","=",">"))
  else stop("Factor does not seem to be a Kendall-transformed variable")
 rank(colSums(.Call(C_rkt,x),na.rm=TRUE))
}
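
The scoring rule described for kInverse (2 points for a win, 1 point for a tie, then ranking the totals) can be sketched in base R as follows. To stay self-contained, the sketch starts from raw values and builds the pairwise relations itself, rather than from a Kendall-transformed factor; the helper name score_ranks is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper, not part of praznik: ranks from win/tie scores
score_ranks <- function(x) {
  # cmp[i, j] is the sign of x[i] - x[j]: 1 for a win, 0 for a tie, -1 for a loss
  cmp <- sign(outer(x, x, "-"))
  # An object is not compared with itself
  diag(cmp) <- NA
  # Shift so that a win gives 2 points, a tie 1 point and a loss 0 points
  pts <- rowSums(cmp + 1, na.rm = TRUE)
  rank(pts)
}

# When the relations come from a real ranking, the scores reproduce it,
# including tied ranks
score_ranks(c(10, 30, 20, 20))
```

Because each object's total grows with the number of objects it beats or ties, ranking the totals gives back the original ranks whenever the relations are consistent.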

#' Top-scoring pairs transformation
#'
#' Applies a top-scoring pairs transformation; that is, creates \eqn{m\cdot (m-1)/2} logical features, one for each two-element subset of the original features,
#' each \code{TRUE} where the value of the first feature is greater than or equal to that of the second and \code{FALSE} otherwise (first and second refer to the order of features in the input).
#'
#' This transformation can be used to recreate top-scoring pairs methods using information theory concepts, for instance using \code{\link{MIM}}.
#' The main gain from TSP is that it is resilient to calibration errors, in particular some sample batch biases; it also generates a robust and parameter-less discrete representation of the continuous input.
#' It is lossy, however, and the generated scores for feature pairs may be hard to interpret; the inflation of the feature count can also pose practical problems, which is why this function offers a way to efficiently and randomly under-sample the output.
#'
#' For TSP to work well, it is crucial that the input features have approximately identical distributions, so that the output features have enough entropy to be informative given some decision or when compared with each other; to this end, re-scaling may be required, for instance with \code{\link{scale}}.
#' @param x Data.frame to be converted; has to be composed of at least two features of a single type (to be comparable).
#' @param sep Separator string used to join original feature names to generate names for transformed features. 
#'  Can be set to \code{NULL} to generate generic names instead, which is faster.
#' @param sample Number of features to generate.
#'  If set, the function generates only a random subset out of all possible \eqn{m\cdot (m-1)/2} feature pairs.
#' @param check.names Passed to the underlying call to \code{\link{data.frame}}; if set to \code{TRUE}, performs a coercion of feature names.
#' @return A logical \code{data.frame}.
#' @note \code{NA}s are accepted and treated as incomparable values.
#' @examples 
#' tspTransform(data.frame(a=1:3,b=1:3,c=rep(2,3)),sep='>=')
#' @examples
#' #Converting iris data
#' tspIris<-tspTransform(data.frame(scale(iris[,-5])))
#' #Feature selection
#' MIM(tspIris,iris$Species)
#' @export
tspTransform<-function(x,sep='__',sample,check.names=FALSE)
 data.frame(.Call(C_tsp,x,sep,if(missing(sample)) 0 else sample),check.names=check.names)
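
For orientation, a plain base-R sketch of the pairwise comparison that the top-scoring pairs transformation describes is given below. The helper name tsp_sketch, the column order produced by combn and the exact naming of output columns are assumptions of this example; it does no under-sampling, and it leaves comparisons involving NA as NA, which may not match how the compiled C_tsp routine treats incomparable values.

```r
# Hypothetical helper, not part of praznik: one logical column per feature pair
tsp_sketch <- function(x, sep = "__") {
  # All two-element subsets of column indices, first index < second index
  pairs <- combn(ncol(x), 2)
  # TRUE where the first feature is greater than or equal to the second
  out <- lapply(seq_len(ncol(pairs)), function(k)
    x[[pairs[1, k]]] >= x[[pairs[2, k]]])
  # Join the original feature names with the separator
  names(out) <- paste(names(x)[pairs[1, ]], names(x)[pairs[2, ]], sep = sep)
  data.frame(out, check.names = FALSE)
}

# Three input features give 3 * 2 / 2 = 3 output features
tsp_sketch(data.frame(a = 1:3, b = 1:3, c = rep(2, 3)), sep = ">=")
```

With m input features the output has m(m-1)/2 columns, which is why sampling a subset of pairs becomes attractive for wide data.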


praznik documentation built on Nov. 11, 2025, 9:06 a.m.