R/gktau.R

Defines functions gktau

Documented in gktau

#' @title gktau
#' @description Measure association between nominal (no order for levels)
#' variables 
#' @details Measure association between nominal (no order for levels) variables
#' using Goodman and Kruskal's tau. Code modified from the original source:
#' r-bloggers.com/measuring-associations-between-non-numeric-variables/
#'
#' An important feature of this procedure is that it allows missing
#' values in either of the variables x or y, treating 'missing' as an
#' additional level. In practice, this can be very important, since
#' missing values in one variable may be strongly associated with either
#' missing values in another variable or specific non-missing levels of
#' that variable.
#'
#' An important characteristic of Goodman and Kruskal's tau is its
#' asymmetry: because the variables x and y enter the computation
#' differently, the value of gktau(y, x) is, in general, not the same as the
#' value of gktau(x, y). This stands in marked contrast to the
#' product-moment correlation coefficient and the Spearman rank
#' correlation coefficient, which are both symmetric, giving the same
#' association between x and y as between y and x. The fundamental
#' reason for the asymmetry is that tau quantifies the extent to which the
#' variable x is useful in predicting y, which may be very different from the
#' extent to which the variable y is useful in predicting x.
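#'
#' As a sketch of the quantity computed by this function (the notation is
#' assumed here for illustration and is not taken from the original source),
#' let \eqn{\pi_{ij}} be the estimated joint probability of level i of x and
#' level j of y, with marginals \eqn{\pi_{i+}} and \eqn{\pi_{+j}}. Then
#' \deqn{V(y) = 1 - \sum_j \pi_{+j}^2}
#' \deqn{E[V(y|x)] = 1 - \sum_i \sum_j \pi_{ij}^2 / \pi_{i+}}
#' \deqn{\tau(x, y) = (V(y) - E[V(y|x)]) / V(y)}
#' so tau is the proportional reduction in the variation of y once x is known.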
#'
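#' @param x first variable (the predictor): a vector or factor of nominal
#' values; missing values are treated as an additional level
#' @param y second variable (the variable being predicted): a vector or factor
#' of nominal values; missing values are treated as an additional level
#'
#' @return Goodman and Kruskal's tau for predicting y from x, i.e. the
#' proportional reduction in the variation of y when x is known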
#'
#' @examples 
#' data(peerj32)
#' v1 <- unlist(peerj32$microbes[,1])
#' v2 <- unlist(peerj32$lipids[,1])
#' tc <- gktau(v1, v2)
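#'
#' # A minimal sketch of the asymmetry, using hypothetical toy data that is
#' # not part of the package: x fully determines y, but y only partly
#' # determines x, so the two directions give different values
#' x <- c("a", "a", "b", "b", "c", "c")
#' y <- c("u", "u", "u", "u", "v", "v")
#' gktau(x, y)  # tau for predicting y from x
#' gktau(y, x)  # tau for predicting x from y; smaller than the above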
#'
#' @export
#' @references 
#' Code modified from the original source:
#' \url{http://r-bloggers.com/measuring-associations-between-non-numeric-variables/}
#' To cite the microbiome R package, see citation('microbiome') 
#' @author Contact: Leo Lahti \email{microbiome-admin@@googlegroups.com}
#' @keywords utilities
gktau <- function(x, y) {
    
    # First, compute the IxJ contingency table between x and y
    Nij <- table(x, y, useNA="ifany")
    # Next, convert this table into a joint probability estimate
    PIij <- Nij/sum(Nij)
    # Compute the marginal probability estimates
    PIiPlus <- rowSums(PIij)
    PIPlusj <- colSums(PIij)
    # Compute the marginal variation of y
    Vy <- 1 - sum(PIPlusj^2)
    # Compute the expected conditional variation of y given x
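    InnerSum <- rowSums(PIij^2)
    # Skip empty rows (e.g. unused factor levels of x) to avoid 0/0 = NaN
    nonEmpty <- PIiPlus > 0
    VyBarx <- 1 - sum(InnerSum[nonEmpty]/PIiPlus[nonEmpty])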
    # Compute and return Goodman and Kruskal's tau measure
    tau <- (Vy - VyBarx)/Vy
    tau
}

microbiome documentation built on Nov. 8, 2020, 5:08 p.m.