These functions implement several similarity and distance measures for R functions
(i.e. their body expressions).
TODO check and document measure-theoretic properties of each measure defined here
TODO these distance measures are metrics, some of them are norm-induced metrics
commonSubexpressions returns the set of common subexpressions of expr1 and expr2. This is not a metric by itself, but can be used to implement several subtree-based similarity metrics.
numberOfCommonSubexpressions returns the number of common subexpressions of expr1 and expr2.
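The idea of a common subexpression can be illustrated in base R without the package. The following is a minimal sketch, not the package's implementation: the helper names subexprs, key and commonSubexprsSketch are hypothetical, and it assumes that subexpressions are the argument subtrees of a call (function names alone are not counted) and that two subexpressions are equal when they deparse to the same text.

```r
# Hypothetical helper: every subexpression of `expr`, including `expr` itself.
# Assumption: only call arguments are descended into; a function name on its
# own is not treated as a subexpression.
subexprs <- function(expr) {
  result <- list(expr)
  if (is.call(expr)) {
    for (arg in as.list(expr)[-1]) {
      result <- c(result, subexprs(arg))
    }
  }
  result
}

# Hypothetical helper: a text key used to compare subexpressions for equality.
key <- function(e) paste(deparse(e), collapse = " ")

# Sketch of a "set of common subexpressions": the distinct subexpressions of
# expr1 that also occur somewhere in expr2.
commonSubexprsSketch <- function(expr1, expr2) {
  s1 <- subexprs(expr1)
  k1 <- vapply(s1, key, character(1))
  k2 <- vapply(subexprs(expr2), key, character(1))
  s1[!duplicated(k1) & k1 %in% k2]
}

e1 <- quote(sin(x) + y * 2)
e2 <- quote(cos(x) + y * 2)
common <- commonSubexprsSketch(e1, e2)
length(common)  # 4: x, y * 2, y and 2
```

Counting the elements of the returned list gives the plain (unweighted) number of common subexpressions under these assumptions.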
sizeWeightedNumberOfCommonSubexpressions returns the number of common subexpressions of expr1 and expr2, weighting each common subexpression by its size. Note that for every expression e, sizeWeightedNumberOfCommonSubexpressions(e, e) == exprVisitationLength(e).
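The identity above can be checked on a small example. This is a sketch under stated assumptions rather than the package's code: all function names are hypothetical, the size of an expression is assumed to be its node count, and the visitation length is assumed to be the sum of the sizes of all subtrees.

```r
# Hypothetical helpers (not part of the package).
subexprs <- function(expr) {
  result <- list(expr)
  if (is.call(expr)) {
    for (arg in as.list(expr)[-1]) result <- c(result, subexprs(arg))
  }
  result
}
key <- function(e) paste(deparse(e), collapse = " ")

# Assumption: size = number of nodes; each function name, variable and
# constant counts as one node.
exprSizeSketch <- function(expr) {
  if (!is.call(expr)) return(1)
  1 + sum(vapply(as.list(expr)[-1], exprSizeSketch, numeric(1)))
}

# Assumption: visitation length = sum of the sizes of all subtrees.
visitationLengthSketch <- function(expr) {
  sum(vapply(subexprs(expr), exprSizeSketch, numeric(1)))
}

# Size-weighted count: sum the sizes of the distinct common subexpressions.
sizeWeightedCommonSketch <- function(expr1, expr2) {
  s1 <- subexprs(expr1)
  k1 <- vapply(s1, key, character(1))
  k2 <- vapply(subexprs(expr2), key, character(1))
  common <- s1[!duplicated(k1) & k1 %in% k2]
  sum(vapply(common, exprSizeSketch, numeric(1)))
}

e <- quote(sin(x) + y * 2)
visitationLengthSketch(e)       # 14 = 6 + 2 + 1 + 3 + 1 + 1
sizeWeightedCommonSketch(e, e)  # 14 as well
```

In this sketch the identity holds for expressions whose subexpressions are all distinct. An expression with repeated subexpressions, such as x + x, would need duplicates to be counted with multiplicity; how the package treats that case is not stated here.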
normalizedNumberOfCommonSubexpressions returns the ratio of the number of common subexpressions of expr1 and expr2 to the number of subexpressions in the larger of expr1 and expr2.
normalizedSizeWeightedNumberOfCommonSubexpressions returns the ratio of the size-weighted number of common subexpressions of expr1 and expr2 to the visitation length of the larger of expr1 and expr2.
NCSdist and SNCSdist are distance metrics derived from normalizedNumberOfCommonSubexpressions and normalizedSizeWeightedNumberOfCommonSubexpressions, respectively.
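One plausible way to derive a distance from a normalized similarity in [0, 1] is to take one minus the similarity. The sketch below assumes exactly that; the text does not state the derivation, and the names subexprKeys, normalizedCommonSketch and ncsDistSketch are hypothetical. Subexpressions are compared by their deparsed text, and the "larger" expression is taken to be the one with more distinct subexpressions.

```r
# Hypothetical helper: the distinct subexpressions of `expr`, as text keys.
subexprKeys <- function(expr) {
  keys <- paste(deparse(expr), collapse = " ")
  if (is.call(expr)) {
    for (arg in as.list(expr)[-1]) keys <- c(keys, subexprKeys(arg))
  }
  unique(keys)
}

# Normalized similarity: shared subexpressions relative to the larger expression.
normalizedCommonSketch <- function(expr1, expr2) {
  k1 <- subexprKeys(expr1)
  k2 <- subexprKeys(expr2)
  length(intersect(k1, k2)) / max(length(k1), length(k2))
}

# Assumption: distance = 1 - normalized similarity.
ncsDistSketch <- function(expr1, expr2) {
  1 - normalizedCommonSketch(expr1, expr2)
}

e1 <- quote(sin(x) + y * 2)
e2 <- quote(cos(x) + y * 2)
ncsDistSketch(e1, e1)                # 0: identical expressions
ncsDistSketch(e1, e2)                # 1 - 4/6
ncsDistSketch(quote(a), quote(b))    # 1: nothing in common
```

A size-weighted variant would follow the same pattern, with subexpression sizes in the numerator and the visitation length of the larger expression in the denominator.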
differingSubexpressions
, and codenumberOfDifferingSubexpressions
are duals of the functions described above, based on counting the number of
differing subexpressions of expr1
and expr2
. The possible functions
"normalizedNumberOfDifferingSubexpressions" and
"normalizedSizeWeightedNumberOfDifferingSubexpressions" where ommited because they
are always equal to NCSdist
and SNCSdist
by definition.
trivialMetric is the "trivial" metric M(a, b) that is 0 iff a == b, and 1 otherwise.
normInducedTreeDistance uses a norm on expression trees and a metric on tree node labels to induce a metric M on expression trees A and B: If both A and B are empty (represented as NULL), M(A, B) := 0. If exactly one of A or B is empty, M(A, B) := "the norm applied to the non-empty tree". If neither A nor B is empty, the difference of their root node labels (as measured by labelDistance) is added to the sum of the differences of the children. The children lists are padded with empty trees to equalize their sizes. The summation operator can be changed via distanceFoldOperator.
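The recursive definition above can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: the names trivialMetricSketch, nodeCount and treeDistanceSketch are hypothetical, the label of a call is taken to be its function name, the label of a leaf is the leaf itself, and the fold operator defaults to `+` here instead of NULL.

```r
# The "trivial" label metric: 0 for identical labels, 1 otherwise.
trivialMetricSketch <- function(a, b) if (identical(a, b)) 0 else 1

# Hypothetical norm: the number of nodes of an expression tree.
nodeCount <- function(expr) {
  if (!is.call(expr)) return(1)
  1 + sum(vapply(as.list(expr)[-1], nodeCount, numeric(1)))
}

# Returns a distance function on expression trees, built from a norm, a
# label metric and a fold operator, following the three cases in the text.
treeDistanceSketch <- function(norm, labelDistance = trivialMetricSketch,
                               distanceFoldOperator = `+`) {
  label <- function(tree) if (is.call(tree)) tree[[1]] else tree
  children <- function(tree) if (is.call(tree)) as.list(tree)[-1] else list()
  d <- function(A, B) {
    if (is.null(A) && is.null(B)) return(0)  # both trees empty
    if (is.null(A)) return(norm(B))          # exactly one tree empty
    if (is.null(B)) return(norm(A))
    childrenA <- children(A)
    childrenB <- children(B)
    n <- max(length(childrenA), length(childrenB))
    length(childrenA) <- n                   # pads the shorter list with NULL
    length(childrenB) <- n
    total <- labelDistance(label(A), label(B))
    for (i in seq_len(n)) {
      total <- distanceFoldOperator(total, d(childrenA[[i]], childrenB[[i]]))
    }
    total
  }
  d
}

d <- treeDistanceSketch(nodeCount)
d(quote(x + 1), quote(x + 1))     # 0
d(quote(x + 1), quote(x + f(1)))  # 2: label 1 vs f, plus norm of the extra child
d(NULL, quote(x + 1))             # 3: the norm of the non-empty tree
```

Because the sketch uses NULL for the empty tree, a literal NULL appearing as an argument inside an expression would be treated as an empty subtree.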
normInducedFunctionDistance is a wrapper that applies normInducedTreeDistance to the bodies of the given functions.
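Wrapping a tree distance so that it works on functions amounts to extracting each function's body with body() before comparing. The sketch below uses a deliberately simple stand-in for the tree distance (0 if the two bodies deparse identically, 1 otherwise); both bodyEqualDistance and functionDistanceSketch are hypothetical names, not package functions.

```r
# Stand-in tree distance, used only to keep this example self-contained.
bodyEqualDistance <- function(A, B) {
  if (identical(deparse(A), deparse(B))) 0 else 1
}

# Lift any distance on expression trees to a distance on functions by
# applying it to the function bodies.
functionDistanceSketch <- function(treeDistance) {
  function(f, g) treeDistance(body(f), body(g))
}

dist <- functionDistanceSketch(bodyEqualDistance)
f <- function(x) x + 1
g <- function(y) y + 1
h <- function(x) x + 1
dist(f, h)  # 0: same body
dist(f, g)  # 1: bodies differ in the variable name
```

Since only the bodies are compared, formal argument lists play no role in the result; f and g differ only because the renamed variable also appears in the body.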
commonSubexpressions(expr1, expr2)
numberOfCommonSubexpressions(expr1, expr2)
normalizedNumberOfCommonSubexpressions(expr1, expr2)
NCSdist(expr1, expr2)
sizeWeightedNumberOfCommonSubexpressions(expr1, expr2)
normalizedSizeWeightedNumberOfCommonSubexpressions(expr1, expr2)
SNCSdist(expr1, expr2)
differingSubexpressions(expr1, expr2)
numberOfDifferingSubexpressions(expr1, expr2)
sizeWeightedNumberOfDifferingSubexpressions(expr1, expr2)
trivialMetric(a, b)
normInducedTreeDistance(norm, labelDistance = trivialMetric,
distanceFoldOperator = NULL)
normInducedFunctionDistance(norm, labelDistance = trivialMetric,
distanceFoldOperator = NULL)
expr1 | An R expression.
expr2 | An R expression.
a | An R object.
b | An R object.
norm | A norm to derive a tree distance metric from.
labelDistance | A metric for measuring distances of tree node labels, i.e. function names or constants.
distanceFoldOperator | The operator used by