doppelgangR: Identify likely duplicate samples from genomic or meta-data

#' Calculate a weighted distance between two vectors, using pairwise complete
#' observations.
#' 
#' Simple function to count the fraction of different elements (in the same
#' position) between two vectors of the same length, after removing elements
#' from both vectors corresponding to positions that are NA in either vector.
#' Distance is the probability for observing the matches and mismatches in two
#' random patients.
#' 
#' 
#' @param x a matrix
#' @param y a matrix with the same number of columns as x
#' @param k row in x to test for differences
#' @param l row in y to test for differences
#' @return Returns a numeric value, the log of the probability of observing the
#' matches in x and y
#' @author Levi Waldron, Markus Riester, Marcel Ramos
#' @examples
#' 
#' mymat1 <- matrix(rnorm(20), ncol = 5)
#' mymat1[1, 4] <- NA
#' mymat2 <- matrix(rnorm(20), ncol = 5)
#' vectorWeightedDist(mymat1, mymat2, 1, 2)
#' 
#' @export vectorWeightedDist
vectorWeightedDist <-
  function #Calculate a weighted distance between two vectors, using pairwise complete observations.
### Simple function to count the fraction of different elements (in
### the same position) between two vectors of the same length, after
### removing elements from both vectors corresponding to positions
### that are NA in either vector. Distance is the probability for observing
### the matches and mismatches in two random patients.
(x,
 ### a matrix
 y,
 ### a matrix with the same number of columns as x
 k,
 ### row in x to test for differences
 l
 ### row in y to test for differences
) {
  idx <- !(is.na(x[k, ]) | is.na(y[l, ]))
  
  if (sum(idx) < 2)
    return(1)
  
  x <- x[, idx, drop = FALSE]
  y <- y[, idx, drop = FALSE]
  
  p.x <- sapply(1:ncol(x), function(i)
    sum(x[k, i] == x[, i],
        na.rm = TRUE) / nrow(x[!is.na(x[, i]), ]))
  p.y <- sapply(1:ncol(y), function(i)
    sum(y[l, i] == y[, i],
        na.rm = TRUE) / nrow(y[!is.na(y[, i]), ]))
  
  idx <- x[k, ] != y[l, ]
  w <- 1 - p.x * p.y
  if (sum(w) < 2)
    return(1)
  1 - (sum(ifelse(idx, -1, 1) * w) / sum(w) + 1) / 2
  ### Returns a numeric value, the log of the probability of observing the
  ### matches in x and y
}

waldronlab/doppelgangR documentation built on July 5, 2025, 5:29 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

waldronlab/doppelgangR
Identify likely duplicate samples from genomic or meta-data

R/vectorWeightedDist.R
In waldronlab/doppelgangR: Identify likely duplicate samples from genomic or meta-data

R Package Documentation

Browse R Packages

We want your feedback!

waldronlab/doppelgangR Identify likely duplicate samples from genomic or meta-data

R/vectorWeightedDist.R In waldronlab/doppelgangR: Identify likely duplicate samples from genomic or meta-data

R Package Documentation

Browse R Packages

We want your feedback!

waldronlab/doppelgangR
Identify likely duplicate samples from genomic or meta-data

R/vectorWeightedDist.R
In waldronlab/doppelgangR: Identify likely duplicate samples from genomic or meta-data