# biwtCorrelation: A function to compute a weighted correlation based on Tukey's...

In biwt: Functions to compute the biweight mean vector and covariance & correlation matrices

## Description

This function computes a weighted correlation based on a multivariate location and scale estimate that uses Tukey's biweight weight function.

## Usage

```r
biwt.cor(x, r=.2, output="matrix", median=TRUE, full.init=TRUE, absval=TRUE)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | a g x n matrix or data frame (g is the number of observations (genes), n is the number of measurements) |
| `r` | breakdown (k/n, where k is the largest number of measurements that can be replaced with arbitrarily large values while keeping the estimates bounded). Default is `r=.2`. |
| `output` | a character string specifying the output format. Options are "matrix" (default), "vector", or "distance". See Value below. |
| `median` | a logical value that determines whether the initialization is done using the coordinate-wise median and MAD^2 (TRUE, default) or using the minimum covariance determinant (MCD) (FALSE). Using the MCD is substantially slower. The MAD is the median of the absolute deviations from the median. See the R help file on `mad`. |
| `full.init` | a logical value that determines whether the initialization is done for each pair separately (FALSE) or only once at the beginning using a random sample from the data matrix (TRUE, default). Initializing for each pair separately is substantially slower. |
| `absval` | a logical value that determines whether the distance is measured as 1 minus the absolute value of the correlation (TRUE, default) or simply 1 minus the correlation (FALSE) |

## Details

Using `biwt.est` to estimate the robust covariance matrix, a robust measure of correlation is computed using Tukey's biweight M-estimator. The biweight correlation is essentially a weighted correlation where the weights are calculated based on the distance of each measurement to the data center with respect to the shape of the data. The correlations are computed pair-by-pair because the weights should depend only on the pairwise relationship at hand and not the relationship between all the observations globally. The biwt functions simply compute many pairwise correlations and create distance matrices for use in other algorithms (e.g., clustering).
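The idea of a correlation weighted by distance from the data center can be illustrated with a rough, single-step sketch in base R. This is not the package's algorithm: `biwt.est` iterates to convergence and derives its cutoff from the breakdown `r`, whereas the sketch below performs one reweighting step from a median/MAD^2 start. The helper names `biweight_w` and `one_step_bw_cor` and the cutoff `c = 3` are assumptions made for illustration only.

```r
# Tukey's biweight weight function: weight 1 at the center, decreasing
# smoothly with the scaled distance d, and exactly 0 beyond the cutoff c.
biweight_w <- function(d, c) ifelse(abs(d) <= c, (1 - (d / c)^2)^2, 0)

# One reweighting step for a single pair of rows (hypothetical helper;
# the cutoff c = 3 is an arbitrary illustrative choice).
one_step_bw_cor <- function(u, v, c = 3) {
  xy <- cbind(u, v)
  center <- apply(xy, 2, median)        # coordinate-wise median
  shape <- diag(apply(xy, 2, mad)^2)    # diagonal matrix of MAD^2
  # mahalanobis() returns squared distances, so take the square root
  d <- sqrt(mahalanobis(xy, center, shape))
  w <- biweight_w(d, c)
  # cov.wt() rescales the weights to sum to 1 and returns the weighted correlation
  cov.wt(xy, wt = w, cor = TRUE)$cor[1, 2]
}

set.seed(1)
u <- rnorm(50)
v <- u + rnorm(50, sd = 0.5)
v[1] <- 25                    # one gross outlier

pearson <- cor(u, v)          # ordinary correlation, affected by the outlier
robust <- one_step_bw_cor(u, v)
```

Points far from the center receive zero weight, so they drop out of the weighted correlation entirely, while nearby points contribute almost fully.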

In order for the biweight estimates to converge, a reasonable initialization must be given. Typically, using TRUE for the `median` and `full.init` arguments will provide acceptable initializations. With particularly irregular data, the MCD should be used to give the initial estimate of center and shape. With data sets in which the observations differ by orders of magnitude, `full.init=FALSE` should be specified.
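As a hedged sketch of how these initialization choices map onto the call, the following uses only the arguments documented above. The simulated matrix `x` is invented for illustration, and the calls are guarded so that the snippet also runs when the biwt package is not installed.

```r
set.seed(42)
# 4 observations (rows) with 20 measurements each; the last row is on a
# much larger scale than the others (made-up data for illustration)
x <- rbind(matrix(rnorm(60), nrow = 3), rnorm(20, sd = 1e4))

if (requireNamespace("biwt", quietly = TRUE)) {
  # Irregular data: initialize with the MCD instead of the median and MAD^2
  cor_mcd <- biwt::biwt.cor(x, median = FALSE)

  # Rows that differ by orders of magnitude: initialize each pair separately
  cor_pairwise <- biwt::biwt.cor(x, full.init = FALSE)
}
```

Both alternatives are described above as substantially slower than the defaults, so they are probably best reserved for data where the default initialization fails to give sensible estimates.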

## Value

Specifying "matrix" for the output argument returns a matrix of the biweight correlations.

Specifying "vector" for the output argument returns the lower triangle of the correlation matrix, stored by columns in a vector, say `bwcor`. If g is the number of observations, then for i < j <= g, the biweight correlation between rows i and j is `bwcor[(j-1)*(j-2)/2 + i]`. The length of the vector is g*(g-1)/2, i.e., of order g^2.
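The indexing rule can be checked with base R alone. The sketch below only exercises the formula as stated here; `bw_index` is a hypothetical helper name and is not part of the biwt package.

```r
# Position in the correlation vector of the pair (i, j), with i < j,
# following the formula given above (hypothetical helper)
bw_index <- function(i, j) (j - 1) * (j - 2) / 2 + i

g <- 4
pairs <- t(combn(g, 2))                   # every pair (i, j) with i < j
idx <- bw_index(pairs[, 1], pairs[, 2])   # vector position of each pair

cbind(i = pairs[, 1], j = pairs[, 2], position = idx)
```

For g = 4 the six pairs map onto the positions 1 through 6 with no gaps or repeats, which matches the stated length g*(g-1)/2.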

Specifying "distance" for the output argument returns a matrix of the biweight distances (default is 1 minus absolute value of the biweight correlation).
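The two distance definitions selected by `absval` can be compared on a small hand-made correlation matrix. The values in `cor_mat` are invented for illustration and are not output from `biwt.cor`.

```r
# A made-up 3 x 3 correlation matrix (illustrative values only)
cor_mat <- matrix(c( 1.0,  0.8, -0.6,
                     0.8,  1.0, -0.7,
                    -0.6, -0.7,  1.0), ncol = 3)

dist_abs <- 1 - abs(cor_mat)   # corresponds to absval = TRUE
dist_signed <- 1 - cor_mat     # corresponds to absval = FALSE

# Either matrix can be converted to a `dist` object for use in clustering
hc <- hclust(as.dist(dist_abs))
```

With the absolute value, strongly negatively correlated rows count as close (distance 0.4 for a correlation of -0.6); without it, the same pair is far apart (distance 1.6).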

## Note

If there is too much missing data or if the initialization is not accurate, the function will compute the MCD for a given pair of observations before computing the biweight correlation (regardless of the initial settings given in the call to the function).

The "vector" output option is provided so that correlations can be stored as vectors, which are less computationally intensive than matrices.

## Author(s)

Jo Hardin jo.hardin@pomona.edu

## References

Hardin, J., Mitani, A., Hicks, L., VanKoten, B.; A Robust Measure of Correlation Between Two Genes on a Microarray, BMC Bioinformatics, 8:220; 2007.

## See Also

`biwt.est`

## Examples

```r
library(MASS)  # provides mvrnorm()
library(biwt)

# Simulate 3 correlated observations (rows) with 30 measurements each (columns)
samp.data <- t(mvrnorm(30, mu = c(0, 0, 0),
                       Sigma = matrix(c(1, .75, -.75,
                                        .75, 1, -.75,
                                        -.75, -.75, 1), ncol = 3)))

# To compute the 3 pairwise correlations from the sample data:
samp.bw.cor <- biwt.cor(samp.data, output = "vector")
samp.bw.cor

# To compute the 3 pairwise correlations in matrix form:
samp.bw.cor.mat <- biwt.cor(samp.data)
samp.bw.cor.mat

# To compute the 3 pairwise distances in matrix form:
samp.bw.dist.mat <- biwt.cor(samp.data, output = "distance")
samp.bw.dist.mat

# To convert the distances into an object of class `dist'
as.dist(samp.bw.dist.mat)
```