getEDmatrix: Calculate the Euclidean distances between two datasets


Description

Function that calculates a matrix of Euclidean distances between each pair of instances from two datasets.

Usage

getEDmatrix(set1, set2)

Arguments

set1

a data frame containing only the molecular features to be used in the Euclidean distance calculation

set2

a data frame containing only the molecular features to be used in the Euclidean distance calculation

Details

NA values are not accepted: either remove the affected instances beforehand or replace the missing values (e.g., with the respective column median or mean). For the purposes of this package, set1 and set2 should be the same dataset. All columns present in the data frames are used in the calculation, i.e. the function computes the Euclidean distance between row i of set1 and row j of set2. Before the distances are calculated, the datasets are scaled using scale, which applies \frac{x_{ij} - \min_j}{\max_j - \min_j} to each value x_{ij} of instance i in column (feature) j, where \min_j and \max_j are the minimum and maximum of column j.
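The scaling and distance steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: min_max_scale and the toy data frame are hypothetical names introduced here, and the sketch assumes that no column is constant (otherwise max_j - min_j is zero).

```r
# Hypothetical helper for illustration; not part of the package.
min_max_scale <- function(df) {
  x <- as.matrix(df)
  mins <- apply(x, 2, min)  # column minima
  maxs <- apply(x, 2, max)  # column maxima
  # base::scale subtracts `center` from each column, then divides by `scale`
  scale(x, center = mins, scale = maxs - mins)
}

# Toy data: three instances (rows), two features (columns)
toy <- data.frame(f1 = c(1, 2, 3), f2 = c(10, 20, 40))
scaled <- min_max_scale(toy)

# Euclidean distance between instance 1 and instance 3 after scaling
d13 <- sqrt(sum((scaled[1, ] - scaled[3, ])^2))
```

After scaling, every feature lies in [0, 1], so no single feature dominates the distance merely because of its units.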

Value

a getEDmatrix object consisting of a data frame of Euclidean distances between set1 (rows) and set2, where the values in each row are sorted in ascending order. As a consequence, the columns have no meaning on their own. getEDmatrix also implicitly creates two variables, maxs and mins, which are saved automatically under those names and do not need explicit variable assignment. They are created for later data scaling.
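To make the row-wise sorting concrete, here is a minimal sketch in base R. It is an assumption-laden illustration rather than the package's code: it uses stats::dist, which only covers the case where both sets are the same dataset, and the points matrix is a hypothetical toy input that is left unscaled for readability.

```r
# Hypothetical toy input: three instances with two features each
points <- matrix(c(0, 0,
                   3, 4,
                   6, 8), ncol = 2, byrow = TRUE)

# Full pairwise Euclidean distance matrix (instances vs. the same instances)
d <- as.matrix(dist(points))

# Sort every row in ascending order; apply() returns each sorted row as a
# column, so t() is needed to get one row per instance again
sorted <- t(apply(d, 1, sort))
```

When both sets are identical, the first column of the sorted result holds each instance's zero distance to itself, and column k holds the distance to its (k-1)-th nearest neighbour, which is why a column no longer corresponds to any particular instance.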

Examples

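# Toy dataset: three instances (rows) with three features (columns)
train <- matrix(1:9, nrow = 3, ncol = 3)
# Distances from each instance to every instance of the same dataset
a <- getEDmatrix(train, train)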

machLearnNA/RDN documentation built on May 21, 2019, 10:51 a.m.