# distee: Calculate distance between two gene expression data sets

In kbroman/lineup: Lining Up Two Sets of Measurements

## Description

Calculate a distance between all pairs of individuals for two gene expression data sets.

## Usage

```r
distee(e1, e2 = NULL, d.method = c("rmsd", "cor"), labels = c("e1", "e2"),
  verbose = TRUE)
```

## Arguments

- `e1`: Numeric matrix of gene expression data, as individuals x genes. The row and column names must contain individual and gene identifiers.
- `e2`: (Optional) Like `e1`. An appreciable number of individuals and genes must be in common.
- `d.method`: Calculate inter-individual distance as RMS difference or as correlation.
- `labels`: Two character strings, to use as labels for the two data matrices in subsequent output.
- `verbose`: If `TRUE`, give verbose output.

## Details

We calculate the pairwise distance between all individuals (rows) in `e1` and all individuals in `e2`. This distance is either the RMS difference (`d.method="rmsd"`) or the correlation (`d.method="cor"`).
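As a rough illustration of the two measures, the base R sketch below computes both for a pair of toy matrices. The helper `pairwise_dist` and the data values are hypothetical and are not part of lineup. It assumes the RMS difference is the square root of the mean squared difference across the genes the two matrices share; the package's own implementation may differ in details such as missing-value handling.

```r
# Toy data: rows are individuals, columns are genes (values made up for illustration)
e1 <- matrix(c(1, 2, 3,
               4, 5, 6),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("ind1", "ind2"), c("g1", "g2", "g3")))
e2 <- matrix(c(1, 2, 4,
               6, 5, 4,
               2, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("ind1", "ind2", "ind3"), c("g1", "g2", "g3")))

# Hypothetical helper, NOT the lineup implementation
pairwise_dist <- function(e1, e2, d.method = c("rmsd", "cor")) {
  d.method <- match.arg(d.method)

  # restrict both matrices to the genes they have in common
  genes <- intersect(colnames(e1), colnames(e2))
  e1 <- e1[, genes, drop = FALSE]
  e2 <- e2[, genes, drop = FALSE]

  if (d.method == "cor") {
    # cor() correlates columns, so transpose to put individuals in columns
    return(cor(t(e1), t(e2), use = "pairwise.complete.obs"))
  }

  # RMS difference for every (row of e1, row of e2) pair
  d <- matrix(NA_real_, nrow(e1), nrow(e2),
              dimnames = list(rownames(e1), rownames(e2)))
  for (i in seq_len(nrow(e1))) {
    for (j in seq_len(nrow(e2))) {
      d[i, j] <- sqrt(mean((e1[i, ] - e2[j, ])^2, na.rm = TRUE))
    }
  }
  d
}

d_rmsd <- pairwise_dist(e1, e2, "rmsd")
d_cor  <- pairwise_dist(e1, e2, "cor")
```

With `"rmsd"`, small values indicate similar expression profiles; with `"cor"`, large values do, so the correlation is really a similarity rather than a distance.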

## Value

A matrix with `nrow(e1)` rows and `nrow(e2)` columns, containing the distances. The individual IDs are in the row and column names. The matrix is assigned class `"lineupdist"`.
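Because the IDs are carried in the row and column names, the result can be inspected with ordinary matrix indexing. The sketch below uses a small hand-built matrix (made-up values, not `distee` output) to show how one might pull the self-self distances and find, for each row, the column with the smallest distance. In the package, `pulldiag` and `summary.lineupdist` serve this purpose; this is only an illustration of the matrix layout.

```r
# Hand-built stand-in for a distance matrix: rows are individuals from e1,
# columns are individuals from e2 (values made up for illustration)
d <- matrix(c(0.1, 0.9, 0.8,
              0.7, 0.2, 0.9),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("ind1", "ind2"), c("ind1", "ind2", "ind3")))

# individuals that appear in both the rows and the columns
shared <- intersect(rownames(d), colnames(d))

# self-self distances, looked up by name via a two-column character index
self <- d[cbind(shared, shared)]
names(self) <- shared

# for each row, the column ID with the smallest distance
best <- colnames(d)[apply(d, 1, which.min)]
names(best) <- rownames(d)
```

For a matrix computed with `d.method="cor"`, the closest match would be the largest value, so `which.max` would replace `which.min`.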

## Author(s)

Karl W Broman

## See Also

`pulldiag`, `omitdiag`, `summary.lineupdist`, `plot2dist`, `disteg`, `corbetw2mat`

## Examples

```r
library(lineup)

# load the example data
data(expr1, expr2)

# find samples in common
id <- findCommonID(expr1, expr2)

# calculate correlations between cols of x and cols of y
thecor <- corbetw2mat(expr1[id$first,], expr2[id$second,])

# subset at genes with corr > 0.8 and scale values
expr1s <- expr1[,thecor > 0.8]/1000
expr2s <- expr2[,thecor > 0.8]/1000

# calculate distance (using "RMS difference" as a measure)
d1 <- distee(expr1s, expr2s, d.method="rmsd", labels=c("1","2"))

# calculate distance (using "correlation" as a measure...really similarity)
d2 <- distee(expr1s, expr2s, d.method="cor", labels=c("1", "2"))

# pull out the smallest 8 self-self correlations
sort(pulldiag(d2))[1:8]

# summary of results
summary(d1)
summary(d2)

# plot histograms of RMS distances
plot(d1)

# plot histograms of correlations
plot(d2)

# plot distances against one another
plot2dist(d1, d2)
```