phenoDist: Calculate distance between two vectors, rows of one...

phenoDist    R Documentation

Calculate distance between two vectors, rows of one matrix/dataframe, or rows of two matrices/dataframes.

Description

This function loops over its inputs so that x and y can be any supported combination of vectors and matrices/dataframes: two vectors, the rows of a single matrix/dataframe, or the rows of two matrices/dataframes.

Usage

phenoDist(x, y = NULL, bins = 10, vectorDistFun = vectorWeightedDist, ...)

Arguments

x

A vector, matrix or dataframe

y

NULL, a vector, matrix, or dataframe. If x is a vector, y must also be specified.

bins

The number of bins into which continuous fields are discretized.

vectorDistFun

A function of two vectors that returns the distance between those vectors.

...

Extra arguments passed on to vectorDistFun
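
As a rough illustration of what discretizing a continuous field into a fixed number of bins can look like, the sketch below uses base R's cut() with equal-width intervals. The equal-width scheme, the example values and the variable names are assumptions made for illustration; the binning that phenoDist performs internally may differ.

```r
## Hypothetical example values for a continuous clinical field (e.g. age).
age <- c(34, 51, 47, 62, 70, 45, 58, 39, 66, 73)

## Number of bins, playing the role of the `bins` argument.
bins <- 5

## Equal-width binning with base::cut(); labels = FALSE returns integer bin codes.
## This is an assumed scheme, not the package's confirmed implementation.
age_binned <- cut(age, breaks = bins, labels = FALSE)

## After binning, nearby values share a code and can be compared as categories.
age_binned
```

With fewer bins, more values fall into the same category, so small differences in a continuous field are less likely to count as a mismatch.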

Value

A matrix of distances between all pairs of rows of x (if y is unspecified), or between every row of x and every row of y (if both are provided).
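
The pairwise looping described above can be sketched in base R as follows. Both pairwiseRowDist and mismatchDist are hypothetical names introduced for illustration; they are not part of the package, and the orientation of the result (rows of x as rows, rows of y as columns) is an assumption rather than documented behaviour.

```r
## Hypothetical stand-in for vectorDistFun: the fraction of compared fields
## that differ between two vectors, ignoring fields where either value is NA.
mismatchDist <- function(a, b) {
  mean(a != b, na.rm = TRUE)
}

## Sketch of pairwise looping over the rows of two matrices.
## Extra arguments in ... are forwarded to the distance function.
pairwiseRowDist <- function(x, y, distFun, ...) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  out <- matrix(NA_real_, nrow = nrow(x), ncol = nrow(y),
                dimnames = list(rownames(x), rownames(y)))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      out[i, j] <- distFun(x[i, ], y[j, ], ...)
    }
  }
  out
}

## Toy phenotype tables: one row per patient, one column per clinical field.
x <- rbind(p1 = c("F", "III", "alive"),
           p2 = c("M", "II", "dead"))
y <- rbind(q1 = c("F", "III", "alive"),
           q2 = c("F", "II", "dead"),
           q3 = c("M", "I", NA))

d <- pairwiseRowDist(x, y, mismatchDist)
d
```

In this toy data the rows p1 and q1 are identical, so their entry in d is zero. Unusually small entries of this kind are what flag pairs of records that may describe the same patient.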

Author(s)

Levi Waldron, Markus Riester, Marcel Ramos

Examples


example("phenoFinder")

pdat1 <- pData(esets2[[1]])
pdat2 <- pData(esets2[[2]])

## Use phenoDist() to calculate a weighted distance matrix
distmat <- phenoDist(as.matrix(pdat1), as.matrix(pdat2))
## Note the outliers with identical clinical data; these are probably the same patients:
graphics::boxplot(distmat)

## Not run: 
   library(curatedOvarianData)
   data(GSE32063_eset)
   data(GSE17260_eset)
   pdat1 <- pData(GSE32063_eset)
   pdat2 <- pData(GSE17260_eset)
   ## Curation of the alternative sample identifiers makes duplicates stand out more:
   pdat1$alt_sample_name <-
     paste(pdat1$sample_type,
           gsub("[^0-9]", "", pdat1$alt_sample_name),
           sep = "_")
   pdat2$alt_sample_name <-
     paste(pdat2$sample_type,
           gsub("[^0-9]", "", pdat2$alt_sample_name),
           sep = "_")
   ## Removing columns that cannot possibly match also helps duplicated patients stand out:
   pdat1 <-
     pdat1[,!grepl("uncurated_author_metadata", colnames(pdat1))]
   pdat2 <-
     pdat2[,!grepl("uncurated_author_metadata", colnames(pdat2))]
   ## Use phenoDist() to calculate a weighted distance matrix
   distmat <- phenoDist(as.matrix(pdat1), as.matrix(pdat2))
   ## Note the outliers with identical clinical data; these are probably the same patients:
   graphics::boxplot(distmat)

## End(Not run)


lwaldron/doppelgangR documentation built on Jan. 9, 2025, 1:15 a.m.