# Implements the stochastic proximity embedding algorithm

### Description

Embeds an N-dimensional dataset in M dimensions such that distances (or similarities) in the original N dimensions are preserved as closely as possible in the final M dimensions.

### Usage


### Arguments

| Argument | Description |
| --- | --- |
| `coord` | A matrix with one row per observation and one column per input dimension. A data.frame may also be supplied; it will be converted to a matrix, so all names will be lost. |
| `rcutpercent` | The fraction of the maximum distance (as determined by probability sampling) that will be used as the neighborhood radius. Setting `rcutpercent` to a value greater than 1 effectively sets the radius to infinity. |
| `maxdist` | If you have already calculated a maximum distance, you can supply it here and probability sampling will not be carried out to obtain one. The default is to carry out sampling. Setting `maxdist` to a nonzero value disables sampling, even if `sampledist=TRUE`. |
| `nobs` | The number of observations. If not specified, it is taken as `nrow(coord)`. |
| `ndim` | The number of input dimensions. If not specified, it is taken as `ncol(coord)`. |
| `edim` | The number of dimensions to embed in. |
| `lambda0` | The starting value of the learning parameter. |
| `lambda1` | The ending value of the learning parameter. |
| `nstep` | The number of refinement steps. |
| `ncycle` | The number of cycles to carry out refinement for. |
| `evalstress` | If TRUE, the function evaluates the Sammon stress on the final embedding. |
| `sampledist` | If TRUE, an approximation to the maximum distance in the input dimensions is obtained via probability sampling. |
| `samplesize` | The number of iterations for probability sampling. For example, a dataset of 6070 observations has 6070 x 6069 / 2 = 18,419,415 pairwise distances. The default value gives a close approximation and runs fast. If you want a better approximation, 1e7 is a good value, though your mileage may vary. |
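
The `sampledist` and `samplesize` arguments describe estimating the maximum distance from sampled pairs instead of computing every pairwise distance. The base R sketch below illustrates that idea only. `approx_max_dist` is a hypothetical helper, not the package's `sample.max.distance`, and Euclidean distance is an assumption here.

```r
# Hypothetical helper (not part of the package): estimate the maximum
# pairwise Euclidean distance by sampling random pairs of rows.
approx_max_dist <- function(coord, samplesize = 1e4) {
  coord <- as.matrix(coord)
  n <- nrow(coord)
  # Draw random row indices for both ends of each sampled pair
  i <- sample.int(n, samplesize, replace = TRUE)
  j <- sample.int(n, samplesize, replace = TRUE)
  # Row-wise Euclidean distance between the sampled pairs
  d <- sqrt(rowSums((coord[i, , drop = FALSE] - coord[j, , drop = FALSE])^2))
  max(d)
}

set.seed(1)
pts <- matrix(runif(200 * 3), ncol = 3)
exact_max <- max(dist(pts))                  # all 200 * 199 / 2 pairs
est_max <- approx_max_dist(pts, samplesize = 5000)
```

A sampled estimate of this kind can never exceed the true maximum, so a larger `samplesize` can only move it closer to the exact value.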

### Details

Efficient determination of rcut (using the connected component method) is not yet implemented. As a result, you will have to determine a value of `rcutpercent` by trial and error.

The pivot SPE method (*J. Mol. Graph. Model.*, 2003, **22**, 133-140) is not yet implemented.
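
To show how the learning parameter, step, cycle, and neighborhood radius arguments fit together, here is a minimal base R sketch of the pairwise update rule described in the SPE papers listed under References. It is an illustration under stated assumptions, not this package's implementation: `spe_sketch` and its `eps` argument are hypothetical, the radius is passed directly as `rcut` rather than derived from `rcutpercent`, the learning parameter is assumed to decrease linearly per cycle, and the default values are illustrative.

```r
# Hypothetical sketch of the SPE update rule (not the package's code).
spe_sketch <- function(coord, edim = 2, rcut = Inf, lambda0 = 2, lambda1 = 0.01,
                       nstep = 1000, ncycle = 50, eps = 1e-10) {
  coord <- as.matrix(coord)
  n <- nrow(coord)
  x <- matrix(runif(n * edim), nrow = n)            # random starting embedding
  # Learning parameter decreases linearly from lambda0 to lambda1 (assumed)
  for (lambda in seq(lambda0, lambda1, length.out = ncycle)) {
    for (s in seq_len(nstep)) {
      ij <- sample.int(n, 2)                        # two distinct points
      i <- ij[1]
      j <- ij[2]
      r <- sqrt(sum((coord[i, ] - coord[j, ])^2))   # input-space distance
      d <- sqrt(sum((x[i, ] - x[j, ])^2))           # embedded distance
      # Adjust neighbors always; adjust distant pairs only if they are too close
      if (r <= rcut || d < r) {
        step <- lambda * 0.5 * (r - d) / (d + eps) * (x[i, ] - x[j, ])
        x[i, ] <- x[i, ] + step
        x[j, ] <- x[j, ] - step
      }
    }
  }
  x
}

set.seed(42)
pts <- matrix(rnorm(30 * 3), ncol = 3)
emb <- spe_sketch(pts, edim = 3)   # rcut = Inf: every pair is a neighbor
```

With `rcut = Inf` every sampled pair is pulled toward its input-space distance. A finite `rcut` leaves distant pairs alone unless they sit closer together than they should, which is why the choice of radius matters for the result.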

### Value

If `evalstress` is TRUE, the return value is a list with two components named `x` and `stress`: `x` is the matrix of the final embedding and `stress` is the final stress.
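
For orientation, the textbook Sammon stress compares every input-space distance with its embedded counterpart and weights the squared error by the inverse of the input distance. The sketch below computes it exactly over all pairs in base R. `sammon_stress` is a hypothetical helper, and the package's `eval.stress` may use a different normalization or a sampled approximation.

```r
# Hypothetical helper (not the package's eval.stress): exact Sammon stress
# over all pairs, using Euclidean distances.
sammon_stress <- function(embedded, original) {
  d_in  <- as.vector(dist(as.matrix(original)))   # input-space distances
  d_out <- as.vector(dist(as.matrix(embedded)))   # embedded distances
  keep <- d_in > 0                                # skip coincident points
  sum((d_in[keep] - d_out[keep])^2 / d_in[keep]) / sum(d_in[keep])
}

set.seed(7)
orig <- matrix(rnorm(20 * 4), ncol = 4)
s_same   <- sammon_stress(orig, orig)       # identical configurations
s_scaled <- sammon_stress(2 * orig, orig)   # every distance doubled
```

Identical configurations give a stress of zero, and the value grows as embedded distances drift away from the input distances.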

### Author(s)

Rajarshi Guha rajarshi@presidency.com

### References

- A Self Organizing Principle for Learning Nonlinear Manifolds, *Proc. Nat. Acad. Sci.*, 2002, **99**, 15869-15872
- Stochastic Proximity Embedding, *J. Comput. Chem.*, 2003, **24**, 1215-1221
- A Modified Rule for Stochastic Proximity Embedding, *J. Mol. Graph. Model.*, 2003, **22**, 133-140
- A Geodesic Framework for Analyzing Molecular Similarities, *J. Chem. Inf. Comput. Sci.*, 2003, **43**, 475-484

### See Also

`eval.stress`, `sample.max.distance`

### Examples

```
## load the phone dataset
data(phone)

## run SPE; embed$stress should be 0 or very close to it
## You can plot the embedding using the scatterplot3d package
## (This will take a few minutes to run)
embed <- spe(phone, edim=3, evalstress=TRUE)

## evaluate the Sammon stress
stress <- eval.stress(embed$x, phone)

## embed the Swiss Roll dataset in 2D
data(swissroll)
embed <- spe(swissroll, edim=2, evalstress=TRUE)
```