Description Usage Arguments Details Value Author(s) References See Also Examples
Embeds an N dimensional dataset in M dimensions, such that distances (or similarities) in the original N dimensions are maintained (as close as possible) in the final M dimensions
1 2 3 4 5 |
coord |
This should be a matrix with number of rows equal to the number of observations and number of columns equal to the input dimension. A data.frame may also be supplied and it will be converted to a matrix (so all names will be lost) |
rcutpercent |
This is the percentage of the maximum distance (as determined by probability sampling) that will be used as the neighborhood radius. Setting rcutpercent to a value greater than 1 effectively sets it to infinity. |
maxdist |
If you have alread calculated a mxaimum distance then you can supply it and probability sampling will not be carried out to obtain a maximum distance. The default is to carry out sampling. By setting maxdist to a non zero value sampling will not be carried out (even if sampledist=TRUE) |
nobs |
The number of observations. If it is not specified nobs will be taken as nrow(coord) |
ndim |
The number of input dimensions. If not specified it will be taken as ncol(coord) |
edim |
The number of dimensions to embed in |
lambda0 |
The starting value of the learning parameter |
lambda1 |
The ending value of the learning parameter |
nstep |
The number of refinement steps |
ncycle |
The number of cycles to carry out refinement for |
evalstress |
If TRUE the function will evaluate the Sammon stress on the final embedding |
sampledist |
If TRUE an approximation to the maximum distance in the input dimensions will be obtained via probability sampling |
samplesize |
The number of iterations for probability sampling. For a dataset of 6070 observations there will be 6070x6069/2 pairwise distances. The default value gives a close approximation and runs fast. If you want a bettr approximation 1e7 is a good value. YMMV |
Efficient determination of rcut is yet to be implemented (using the connected component method). As a result you will have to determine a value of rcutpercent by trail and error. The pivot SPE method (J. Mol. Graph. Model., 2003, 22, 133-140) is not yet implemented
If evalstress is TRUE it will be a list with two components named x and stress. x is the matrix of the final embedding and stress is the final stress
Rajarshi Guha rajarshi@presidency.com
A Self Organizing Principle for Learning Nonlinear Manifolds, Proc. Nat. Acad. Sci., 2002, 99, 15869-15872 Stochastic Proximity Embedding, J. Comput. Chem., 2003, 24, 1215-1221 A Modified Rule for Stochastic Proximity Embedding, J. Mol. Graph. Model., 2003, 22, 133-140 A Geodesic Framework for Analyzing Molecular Similarities, J. Chem. Inf. Comput. Sci., 2003, 43, 475-484
eval.stress
, sample.max.distance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | ## load the phone dataset
data(phone)
## run SPE, embed$stress should be 0 or very close to it
## You can plot the embedding using the scatterplot3d package
## (This will take a few minutes to run)
embed <- spe(phone, edim=3, evalstress=TRUE)
## evaluate the Sammon stress
stress <- eval.stress(embed$x, phone)
## embed the Swiss Roll dataset in 2D
data(swissroll)
embed <- spe(swissroll, edim=2, evalstress=TRUE)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.