These functions implement several similarity and distance measures for R functions
(i.e. their body expressions).
TODO check and document measure-theoretic properties of each measure defined here
TODO these distance measures are metrics, some of them are norm-induced metrics
commonSubexpressions returns the set of common subexpressions of expr1
and expr2. This is not a metric by itself, but can be used to implement
several subtree-based similarity metrics.
numberOfCommonSubexpressions returns the number of common subexpressions
of expr1 and expr2.
sizeWeightedNumberOfCommonSubexpressions returns the number of common
subexpressions of expr1 and expr2, weighting each common subexpression by
its size. Note that for every expression e,
sizeWeightedNumberOfCommonSubexpressions(e, e) == exprVisitationLength(e).
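The definitions above can be illustrated in base R without the package. The following is a minimal sketch, not the package's implementation: subexprs, exprKey, nodeCount and commonSketch are hypothetical helper names. It assumes that a subexpression is any subtree of the call tree, that common subexpressions form a set compared by their deparsed text, and that the size of an expression is its node count.

```r
# Hypothetical helper: every subtree of an expression, root first.
# A call contributes itself plus the subtrees of its arguments;
# the function name is treated as the node label, not as a child.
subexprs <- function(expr) {
  result <- list(expr)
  if (is.call(expr)) {
    for (child in as.list(expr)[-1]) {
      result <- c(result, subexprs(child))
    }
  }
  result
}

# Assumption: two subexpressions are equal if they deparse to the same text.
exprKey <- function(e) paste(deparse(e), collapse = " ")

# Assumption: the size of an expression is its number of nodes.
nodeCount <- function(e) length(subexprs(e))

# Set of subexpressions of expr1 that also occur in expr2.
commonSketch <- function(expr1, expr2) {
  s1 <- subexprs(expr1)
  k1 <- vapply(s1, exprKey, character(1))
  k2 <- vapply(subexprs(expr2), exprKey, character(1))
  s1[!duplicated(k1) & k1 %in% k2]
}

e1 <- quote(sin(x) + y * 2)
e2 <- quote(cos(x) + y * 2)

common <- commonSketch(e1, e2)                            # x, y * 2, y, 2
numberOfCommon <- length(common)                          # 4
sizeWeighted <- sum(vapply(common, nodeCount, integer(1)))  # 1 + 3 + 1 + 1 = 6

# Sum of all subtree sizes of e1 (one reading of "visitation length").
selfWeighted <- sum(vapply(commonSketch(e1, e1), nodeCount, integer(1)))  # 14
```

Under these assumptions, and for an expression without repeated subtrees such as e1, the size-weighted count of an expression with itself is the sum of its subtree sizes, which mirrors the identity stated above.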
normalizedNumberOfCommonSubexpressions returns the ratio of the number of
common subexpressions of expr1 and expr2 to the number of subexpressions
in the larger of the two expressions.
normalizedSizeWeightedNumberOfCommonSubexpressions returns the ratio of
the size-weighted number of common subexpressions of expr1 and expr2
to the visitation length of the larger of the two expressions.
NCSdist and SNCSdist are distance metrics derived from
normalizedNumberOfCommonSubexpressions and
normalizedSizeWeightedNumberOfCommonSubexpressions respectively.
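As a rough illustration of the normalization, the sketch below computes a similarity in [0, 1] and turns it into a distance. It is not the package's code: subexprKeys, ncsSketch and ncsDistSketch are hypothetical names, subexpressions are compared as a set of deparsed strings, and the distance is assumed to be one minus the normalized similarity.

```r
# Hypothetical helper: deparsed text of every distinct subtree of an expression.
subexprKeys <- function(expr) {
  keys <- paste(deparse(expr), collapse = " ")
  if (is.call(expr)) {
    for (child in as.list(expr)[-1]) {
      keys <- c(keys, subexprKeys(child))
    }
  }
  unique(keys)
}

# Ratio of common subexpressions to the subexpression count of the larger expression.
ncsSketch <- function(expr1, expr2) {
  k1 <- subexprKeys(expr1)
  k2 <- subexprKeys(expr2)
  length(intersect(k1, k2)) / max(length(k1), length(k2))
}

# Assumption: the distance is the complement of the normalized similarity.
ncsDistSketch <- function(expr1, expr2) 1 - ncsSketch(expr1, expr2)

e1 <- quote(sin(x) + y * 2)
e2 <- quote(cos(x) + y * 2)

ncsSketch(e1, e2)                # 4 common out of 6 subexpressions
ncsDistSketch(e1, e1)            # identical expressions are at distance 0
ncsDistSketch(quote(a), quote(b))  # no shared subexpression gives distance 1
```

Dividing by the larger expression keeps the similarity at or below 1 even when one expression is a small fragment of the other.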
differingSubexpressions and numberOfDifferingSubexpressions
are duals of the functions described above, based on counting the number of
differing subexpressions of expr1 and expr2. The possible functions
"normalizedNumberOfDifferingSubexpressions" and
"normalizedSizeWeightedNumberOfDifferingSubexpressions" were omitted because they
are always equal to NCSdist and SNCSdist by definition.
trivialMetric is the "trivial" metric M(a, b), which is 0 iff a == b and 1 otherwise.
normInducedTreeDistance uses a norm on expression trees and a metric on tree
node labels to induce a metric M on expression trees A and B: If both A and B are empty
(represented as NULL), M(A, B) := 0. If exactly one of A and B is empty, M(A, B) is
the norm applied to the non-empty tree. If neither A nor B is empty, the difference
of their root node labels (as measured by labelDistance) is added to the sum of
the differences of their children. The children lists are padded with empty trees to
equalize their lengths. The summation operator can be changed via distanceFoldOperator.
normInducedFunctionDistance is a wrapper that applies normInducedTreeDistance
to the bodies of the given functions.
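The recursive definition of the norm-induced distance can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: all names ending in Sketch, as well as rootLabel, childTrees and nodeCountNorm, are hypothetical. The sketch assumes that the constructor returns a two-argument distance function, that the default fold operator is addition, and that the norm of a tree is its node count.

```r
# Trivial metric on labels: 0 for identical objects, 1 otherwise.
trivialSketch <- function(a, b) if (identical(a, b)) 0 else 1

# Root label: the function name of a call, or the leaf itself.
rootLabel <- function(tree) if (is.call(tree)) tree[[1]] else tree

# Children: the arguments of a call; leaves have none.
childTrees <- function(tree) if (is.call(tree)) as.list(tree)[-1] else list()

# Assumed example norm: number of nodes in the tree.
nodeCountNorm <- function(tree) {
  1 + sum(vapply(childTrees(tree), nodeCountNorm, numeric(1)))
}

treeDistanceSketch <- function(norm, labelDistance = trivialSketch, fold = `+`) {
  distance <- function(A, B) {
    if (is.null(A) && is.null(B)) return(0)   # both trees empty
    if (is.null(A)) return(norm(B))           # exactly one tree empty
    if (is.null(B)) return(norm(A))
    childrenA <- childTrees(A)
    childrenB <- childTrees(B)
    n <- max(length(childrenA), length(childrenB))
    # Pad the shorter child list with empty trees (NULL).
    childrenA <- c(childrenA, rep(list(NULL), n - length(childrenA)))
    childrenB <- c(childrenB, rep(list(NULL), n - length(childrenB)))
    total <- labelDistance(rootLabel(A), rootLabel(B))
    for (i in seq_len(n)) {
      total <- fold(total, distance(childrenA[[i]], childrenB[[i]]))
    }
    total
  }
  distance
}

# Wrapper that compares the bodies of two functions.
functionDistanceSketch <- function(norm, labelDistance = trivialSketch, fold = `+`) {
  d <- treeDistanceSketch(norm, labelDistance, fold)
  function(f, g) d(body(f), body(g))
}

d <- treeDistanceSketch(nodeCountNorm)
d(quote(sin(x) + y), quote(cos(x) + y))   # 1: only the labels sin and cos differ
d(quote(x + y), quote(x + y * 2))         # 3: one label differs, two padded leaves

fd <- functionDistanceSketch(nodeCountNorm)
fd(function(x) sin(x) + y, function(x) cos(x) + y)   # 1
```

Passing a different fold, for example max, replaces the summation and yields the largest single difference found along the tree instead of the total.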
commonSubexpressions(expr1, expr2)
numberOfCommonSubexpressions(expr1, expr2)
normalizedNumberOfCommonSubexpressions(expr1, expr2)
NCSdist(expr1, expr2)
sizeWeightedNumberOfCommonSubexpressions(expr1, expr2)
normalizedSizeWeightedNumberOfCommonSubexpressions(expr1, expr2)
SNCSdist(expr1, expr2)
differingSubexpressions(expr1, expr2)
numberOfDifferingSubexpressions(expr1, expr2)
sizeWeightedNumberOfDifferingSubexpressions(expr1, expr2)
trivialMetric(a, b)
normInducedTreeDistance(norm, labelDistance = trivialMetric,
  distanceFoldOperator = NULL)
normInducedFunctionDistance(norm, labelDistance = trivialMetric,
  distanceFoldOperator = NULL)
expr1 | An R expression.
expr2 | An R expression.
a | An R object.
b | An R object.
norm | A norm to derive a tree distance metric from.
labelDistance | A metric for measuring distances of tree node labels, i.e. function names or constants.
distanceFoldOperator | The operator used by