# mergeNodes: Merges nodes showing distance values equal to zero In sidier: Substitution and Indel Distances to Infer Evolutionary Relationships

## Description

This function returns a new distance matrix merging rows (and columns) showing distance values equal to zero. It also deals with missing data.

## Usage

 ```1 2``` ```mergeNodes(dis, na.rm.row.col = FALSE, save.distance = FALSE, save.distance.name = "Merged_Distance.txt") ```

## Arguments

 `dis` the input distance matrix `na.rm.row.col` a logical; if TRUE, missing values are removed before the computation proceeds. `save.distance` a logical; if TRUE, the new distance matrix will be saved in a file. `save.distance.name` a string; if save.distance is set to TRUE, it defines the name of the file to be saved.

## Details

In some circumstances you may get distance matrices showing off-diagonal zeros. In such cases you may consider that the existence of these off-diagonal zeros suggests that some of the groups you defined (e.g., populations) are not genetically different. Thus, you must re-define groups to get a matrix composed only by different groups using the 'mergeNodes' function and estimate a percolation network using the 'perc.thr' function. On the other hand, you may consider that, despite the off- diagonal zeros, the groups you defined are actually different. In that case you may not be able to estimate a percolation threshold, but you can represent the original distance matrix using the 'NINA.thr' or the 'zero.thr' functions.

'mergeNodes' select all rows (and columns) showing a distance equal to zero and generates a new row (and column). The distance between the new merged and the remaining rows (or columns) in the matrix is estimated as the arithmetic mean of the selected elements. The biological interpretation of the new matrix could be hard if the original matrix shows a large number of off-diagonal zeros.

'perc.thr' estimates a threshold to represent a distance matrix as a network. To estimate this threshold, the algorithm represents as a link all distances lower than a range of thresholds (by default, select 101 values from 0 to 1), defined as the percentage of the maximum distance in the input matrix. For each threshold a network is built and the number of clusters (that is, the number of isolated groups of nodes) in the network is also estimated. Finally, the algorithm selects the lower threshold connecting a higher number of nodes. Note that the resulting network may show isolated nodes if it is necessary to represent a large number of links to connect a low number of nodes.

'NINA.thr' is identical to 'perc.thr', but, in the last step, the algorithm selects the lower threshold connecting all nodes in a single cluster. The information provided by this function may be limited if the original distance matrix shows high variation.

'zero.thr' represents as a link only distances equal to zero. The information provided by this function may be limited if the original matrix shows few off-diagonal zeros.

## Value

a distance matrix with merged rows and columns

## Author(s)

A. J. Mu<c3><b1>oz-Pajares

`NINA.thr`, `zero.thr`, `perc.thr`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72``` ```#EXAMPLE 1: FEW OFF-DIAGONAL ZEROS #Generating a distance matrix: Dis1<-matrix(c( 0.00,0.77,0.28,0.94,0.17,0.14,0.08,0.49,0.64,0.01, 0.77,0.00,0.12,0.78,0.97,0.02,0.58,0.09,0.36,0.33, 0.28,0.12,0.00,0.70,0.73,0.06,0.50,0.79,0.80,0.94, 0.94,0.78,0.70,0.00,0.00,0.78,0.04,0.42,0.25,0.85, 0.17,0.97,0.73,0.00,0.00,0.30,0.55,0.12,0.68,0.99, 0.14,0.02,0.06,0.78,0.30,0.00,0.71,1.00,0.64,0.88, 0.08,0.58,0.50,0.04,0.55,0.71,0.00,0.35,0.84,0.76, 0.49,0.09,0.79,0.42,0.12,1.00,0.35,0.00,0.56,0.81, 0.64,0.36,0.80,0.25,0.68,0.64,0.84,0.56,0.00,0.62, 0.01,0.33,0.94,0.85,0.99,0.88,0.76,0.81,0.62,0.00),ncol=10) colnames(Dis1)<-c(paste("Pop",c(1:10),sep="")) row.names(Dis1)<-colnames(Dis1) # No percolation threshold can be found. #perc.thr(Dis1) #Check Dis1 and merge populations showing distances equal to zero: Dis1 Dis1_Merged<-mergeNodes(dis=Dis1) #Check the merged matrix. A new "population" has been #defined merging populations 4 and 5. #Distances between the merged and the remaining populations are estimated as the arithmetic mean. Dis1_Merged # It is now possible to estimate a percolation threshold perc.thr(dis=Dis1_Merged,ptPDF=FALSE, estimPDF=FALSE, estimOutfile=FALSE) # EXAMPLE 2: TOO MANY OFF-DIAGONAL ZEROS #Generating a distance matrix: Dis2<-matrix(c( 0.00,0.77,0.28,0.00,0.17,0.14,0.00,0.49,0.64,0.01, 0.77,0.00,0.12,0.00,0.97,0.02,0.00,0.09,0.36,0.33, 0.28,0.12,0.00,0.70,0.73,0.06,0.50,0.79,0.00,0.94, 0.00,0.00,0.70,0.00,0.00,0.78,0.04,0.00,0.00,0.00, 0.17,0.97,0.73,0.00,0.00,0.30,0.55,0.12,0.00,0.00, 0.14,0.02,0.06,0.78,0.30,0.00,0.71,1.00,0.64,0.00, 0.00,0.00,0.50,0.04,0.55,0.71,0.00,0.35,0.84,0.00, 0.49,0.09,0.79,0.00,0.12,1.00,0.35,0.00,0.56,0.81, 0.64,0.36,0.00,0.00,0.00,0.64,0.84,0.56,0.00,0.62, 0.01,0.33,0.94,0.00,0.00,0.00,0.00,0.81,0.62,0.00),ncol=10) colnames(Dis2)<-c(paste("Pop",c(1:10),sep="")) row.names(Dis2)<-colnames(Dis2) # No percolation threshold can be found #perc.thr(Dis2) #Check Dis2 and merge populations showing distances equal to zero: Dis2 Dis2_Merged<-mergeNodes(dis=Dis2) #Check the merged matrix. Many new "populations" have been defined #and both the new matrix and the resulting network are difficult #to interpret: Dis2_Merged perc.thr(dis=Dis2_Merged,ptPDF=FALSE, estimPDF=FALSE, estimOutfile=FALSE) #Instead of percolation network, representing zeros #as the lowest values may be informative: zero.thr(dis=Dis2,ptPDF=FALSE) # Adjusting sizes and showing modules: zero.thr(dis=Dis2,ptPDF=FALSE,cex.label=0.8,cex.vertex=1.2,modules=TRUE) #In the previous example, the 'zero.thr' method is unuseful: zero.thr(dis=Dis1,ptPDF=FALSE) #In both cases, the 'No Isolation Nodes Allowed' method yields an informative matrix: NINA.thr(dis=Dis1,modules=TRUE) NINA.thr(dis=Dis2,modules=TRUE) ```