addMatches | R Documentation |
Matches between query and target generic objects can be represented by
the Matched
object. By default, all data accessors work as
left joins between the query and the target object, i.e. values are
returned for each query object, with duplicated entries (values)
if the query object matches more than one target object. See also
Creation and subsetting as well as Extracting data sections below for
details and more information.
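The left-join behaviour described above can be illustrated in base R, independently of the package. The sketch below is not the implementation used by the Matched object: the helper left_join_idx and the toy data are hypothetical and only show which values an accessor would be expected to return.

```r
## Toy data (hypothetical): 5 query rows, 6 target rows and 5 matches.
query <- data.frame(col1 = 1:5)
target <- data.frame(col1 = 11:16)
matches <- data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
                      target_idx = c(2L, 2L, 3L, 4L, 5L))

## Hypothetical helper: one row per match, plus one row with NA for each
## query element without any match (a left join on the query index).
left_join_idx <- function(n_query, matches) {
    res <- lapply(seq_len(n_query), function(i) {
        hits <- matches$target_idx[matches$query_idx == i]
        if (!length(hits))
            hits <- NA_integer_
        data.frame(query_idx = i, target_idx = hits)
    })
    do.call(rbind, res)
}

idx <- left_join_idx(nrow(query), matches)
query_col1 <- query$col1[idx$query_idx]     # 1 2 2 2 3 4 5
target_col1 <- target$col1[idx$target_idx]  # 12 12 13 14 NA NA 15
```

Query row 2 appears three times because it has three matches, while query rows 3 and 4 appear once each with NA on the target side.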
The Matched
object can represent matches between one-dimensional
query
and target
objects (being e.g. numeric
or list
),
two-dimensional objects (data.frame
or matrix
) or more complex
structures such as SummarizedExperiments
or QFeatures
. Combinations of
all these different data types are also supported. Matches are represented
between elements of one-dimensional objects, or rows for two-dimensional
objects (including SummarizedExperiment
or QFeatures
). For
QFeatures::QFeatures()
objects, matches to only one of the assays
within the object are supported.
addMatches(object, ...)
endoapply(X, FUN, ...)
filterMatches(object, param, ...)
matchedData(object, ...)
queryVariables(object, ...)
targetVariables(object, ...)
Matched(
query = list(),
target = list(),
matches = data.frame(query_idx = integer(), target_idx = integer(), score = numeric()),
queryAssay = character(),
targetAssay = character(),
metadata = list()
)
## S4 method for signature 'Matched'
length(x)
## S4 method for signature 'Matched'
show(object)
## S4 method for signature 'Matched,ANY,ANY,ANY'
x[i, j, ..., drop = FALSE]
matches(object)
target(object)
## S4 method for signature 'Matched'
query(x, pattern, ...)
targetIndex(object)
queryIndex(object)
whichTarget(object)
whichQuery(object)
## S4 method for signature 'Matched'
x$name
## S4 method for signature 'Matched'
colnames(x)
scoreVariables(object)
## S4 method for signature 'Matched'
queryVariables(object)
## S4 method for signature 'Matched'
targetVariables(object)
## S4 method for signature 'Matched'
matchedData(object, columns = colnames(object), ...)
pruneTarget(object)
## S4 method for signature 'Matched,missing'
filterMatches(
object,
queryValue = integer(),
targetValue = integer(),
queryColname = character(),
targetColname = character(),
index = integer(),
keep = TRUE,
...
)
SelectMatchesParam(
queryValue = numeric(),
targetValue = numeric(),
queryColname = character(),
targetColname = character(),
index = integer(),
keep = TRUE
)
TopRankedMatchesParam(n = 1L, decreasing = FALSE)
ScoreThresholdParam(threshold = 0, above = FALSE, column = "score")
## S4 method for signature 'Matched,SelectMatchesParam'
filterMatches(object, param, ...)
## S4 method for signature 'Matched,TopRankedMatchesParam'
filterMatches(object, param, ...)
## S4 method for signature 'Matched,ScoreThresholdParam'
filterMatches(object, param, ...)
SingleMatchParam(
duplicates = c("remove", "closest", "top_ranked"),
column = "score",
decreasing = TRUE
)
## S4 method for signature 'Matched,SingleMatchParam'
filterMatches(object, param, ...)
## S4 method for signature 'Matched'
addMatches(
object,
queryValue = integer(),
targetValue = integer(),
queryColname = character(),
targetColname = character(),
score = rep(NA_real_, length(queryValue)),
isIndex = FALSE
)
## S4 method for signature 'ANY'
endoapply(X, FUN, ...)
## S4 method for signature 'Matched'
endoapply(X, FUN, ...)
## S4 method for signature 'Matched'
lapply(X, FUN, ...)
object |
a |
... |
additional parameters. |
X |
|
FUN |
for |
param |
for |
query |
object with the query elements. |
target |
object with the elements against which |
matches |
|
queryAssay |
|
targetAssay |
|
metadata |
|
x |
|
i |
|
j |
for |
drop |
for |
pattern |
for |
name |
for |
columns |
for |
queryValue |
for |
targetValue |
for |
queryColname |
for |
targetColname |
for |
index |
for |
keep |
for |
n |
for |
decreasing |
for |
threshold |
for |
above |
for |
column |
for |
duplicates |
for |
score |
for |
isIndex |
for |
See individual method description above for details.
A Matched
object is returned as the result of the matchValues()
function.
Alternatively, Matched
objects can also be created with the Matched
function providing the query
and target
objects as well as the matches
data.frame
with two columns of integer indices defining which elements
from query match which element from target.
addMatches
: add new matches to an existing object. Parameters
queryValue
and targetValue
define which element(s) in
query
and target
should be considered matching. If isIndex = TRUE
,
both queryValue
and targetValue
are considered to be integer indices
identifying the matching elements in query
and target
, respectively.
Alternatively (with isIndex = FALSE
) queryValue
and targetValue
can
be elements in columns queryColname
or targetColname
which can be used
to identify the matching elements. Note that in this case
only the first matching pair is added. Parameter score
can be used to
provide the score for the match. It can be a numeric with the score or a
data.frame
with additional information on the manually added matches. In
both cases its length (or number of rows) has to match the length of
queryValue
. See examples below for more information.
endoapply
: applies a user defined function FUN
to each subset of
matches in a Matched
object corresponding to a query
element (i.e. for
each x[i]
with i
being 1 to length(x)
). The results are then combined
in a single Matched
object representing updated matches. Note that FUN
has to return a Matched
object.
lapply
: applies a user defined function FUN
to each subset of
matches in a Matched
object for each query
element (i.e. to each x[i]
with i
from 1
to length(x)
). It returns a list
of length(object)
elements where each element is the output of FUN
applied to each subset
of matches.
[
: subset the object selecting query
object elements to keep with
parameter i
. The resulting object will contain all the matches
for the selected query elements. The target
object will by default be
returned as-is.
filterMatches
: filter matches in a Matched
object using different
approaches depending on the class of param
:
ScoreThresholdParam
: keeps only the matches whose score is strictly
above or strictly below a certain threshold (respectively when parameter
above = TRUE
and above = FALSE
). The name of the column containing
the scores to be used for the filtering can be specified with parameter
column
. The default for column
is "score"
. Such variable is present
in each Matched
object. The name of other score variables (if present)
can be provided (the names of all score variables can be obtained with
scoreVariables()
function). For example column = "score_rt"
can be
used to filter matches based on retention time scores for Matched
objects returned by matchValues()
when param
objects involving a
retention time comparison are used.
SelectMatchesParam
: keeps or removes (respectively when parameter
keep = TRUE
and keep = FALSE
) matches corresponding to certain
indices or values of query
and target
. If queryValue
and
targetValue
are provided, matches for these value pairs are kept or
removed. Parameter index
allows filtering matches by providing their index in the matches()
data.frame. Note that filterMatches
removes only matches from the matches()
data.frame of the Matched
object but does not alter the query
or target
in the object. See examples below for more
information.
SingleMatchParam
: reduces matches to keep only (at most) a
single match per query. The deduplication strategy can be defined with
parameter duplicates
:
duplicates = "remove"
: all matches for query elements matching more
than one target element will be removed.
duplicates = "closest"
: keep only the closest match for each
query element. The closest match is defined by the value(s) of
score (and score_rt, if present). The one match with
the smallest value for this (these) column(s) is retained. This is
equivalent to TopRankedMatchesParam(n = 1L, decreasing = FALSE)
.
duplicates = "top_ranked"
: select the best ranking match for each
query element. Parameter column
specifies the column by
which matches are ranked (use targetVariables(object)
or
scoreVariables(object)
to list possible columns). Parameter
decreasing
defines whether the match with the highest
(decreasing = TRUE
) or lowest (decreasing = FALSE
) value in
column
for each query will be selected.
TopRankedMatchesParam
: for each query element the matches are ranked
according to their score and only the n
best of them are kept (if n
is larger than the number of matches for a given query element all the
matches are returned). For the ranking (ordering) R's rank
function is
used on the absolute values of the scores (variable "score"
), thus,
smaller score values (representing e.g. smaller differences between
expected and observed m/z values) are considered better. By
setting parameter decreasing = TRUE
matches can be ranked in decreasing
order (i.e. higher scores are ranked higher and are thus selected).
If besides variable "score"
also variable "score_rt"
is available in
the Matched
object (which is the case for the Matched
object
returned by matchValues()
for param
objects involving a retention
time comparison), the ordering of the matches is based on the product of
the ranks of the two variables (ranking of retention time differences
is performed on the absolute value of "score_rt"
). Thus, matches with
small (or, depending on parameter decreasing
, large) values for
"score"
and "score_rt"
are returned.
pruneTarget
: cleans the object by removing non-matched
target elements.
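The ranking described above for TopRankedMatchesParam can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the helper top_ranked is hypothetical, it works on a plain data.frame laid out like the one returned by matches(), and its handling of ties (R's default average ranks) is an assumption.

```r
## Hypothetical helper mimicking the ranking described for
## TopRankedMatchesParam; not the package's implementation.
top_ranked <- function(matches, n = 1L, decreasing = FALSE) {
    ## Negate the absolute scores to rank large values first if requested.
    sgn <- if (decreasing) -1 else 1
    by_query <- split(seq_len(nrow(matches)), matches$query_idx)
    keep <- lapply(by_query, function(i) {
        rnk <- rank(sgn * abs(matches$score[i]))
        ## Multiply by the retention time ranks, if that column exists.
        if ("score_rt" %in% colnames(matches))
            rnk <- rnk * rank(sgn * abs(matches$score_rt[i]))
        ## Keep at most n rows per query element.
        i[order(rnk)][seq_len(min(n, length(i)))]
    })
    matches[sort(unlist(keep, use.names = FALSE)), , drop = FALSE]
}

matches <- data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
                      target_idx = c(2L, 2L, 3L, 4L, 5L),
                      score = c(0.5, 0.6, 0.7, 0.8, 0.9))
best_low <- top_ranked(matches, n = 1L)
best_high <- top_ranked(matches, n = 1L, decreasing = TRUE)
```

With these toy data, query element 2 keeps its match to target 2 in best_low (smallest score) and its match to target 4 in best_high (largest score).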
$
extracts a single variable from the Matched
x
. The variables that
can be extracted can be listed using colnames(x)
. These variables can
belong to query, target or be related to the matches (e.g. the
score of each match). If the query (target) object is two dimensional,
its columns can be extracted (prefix "target_"
is used for columns in the
target object) otherwise if query (target) has only a single
dimension (e.g. is a list
or a character
) the whole object can be
extracted with x$query
(x$target
). More precisely, when
query (target) is a SummarizedExperiment
the columns from
rowData(query)
(rowData(target
)) are extracted; when query (target)
is a QFeatures::QFeatures()
the columns from rowData
of the assay
specified in the queryAssay
(targetAssay
) slot are extracted.
The matching scores
are available as variable "score"
. Similar to a left join between the
query and target elements, this function returns a value for each query
element, with duplicated values for query elements matching more
than one target element. If variables from the target data.frame
are
extracted, an NA
is reported for the entries corresponding to query
elements that don't match any target element. See examples below for
more details.
length
returns the number of query elements.
matchedData
extracts multiple variables contained in the
Matched
object as a DataFrame
. Parameter columns
defines which columns (or variables) should be returned (defaults to
columns = colnames(object)
). Each single column in the returned
DataFrame
is constructed in the same way as in $
. That is, like $
,
this function performs a left join of variables from the query and
target objects returning all values for all query elements
(returning duplicated elements for query elements matching
multiple target elements) and the values for the target elements matched
to the respective query elements (or NA
if the query element is not
matched to any target element).
matches
returns a data.frame
with the actual matching information with
columns "query_idx"
(index of the element in query
), "target_idx"
(index of the element in target
), "score"
(the score of the match) and
any additional columns.
target
returns the target object.
targetIndex
returns the indices of the matched targets in the order they
are assigned to the query elements. The length of the returned integer
vector is equal to the total number of matches in the object. targetIndex
and queryIndex
are aligned, i.e. each element in them represents a matched
query-target pair.
query
returns the query object.
queryIndex
returns the indices of the query elements with matches to
target elements. The length of the returned integer
vector is equal to
the total number of matches in the object. targetIndex
and queryIndex
are aligned, i.e. each element in them represents a matched query-target
pair.
queryVariables
returns the names of the variables (columns) in query.
scoreVariables
returns the names of the score variables stored in the
Matched
object (precisely the names of the variables in matches(object)
containing the string "score" in their name ignoring the case).
targetVariables
returns the names of the variables (columns) in target
(prefixed with "target_"
).
whichTarget
returns an integer
with the indices of the elements in
target that match at least one element in query.
whichQuery
returns an integer
with the indices of the elements in
query that match at least one element in target.
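The relationship between the index accessors and scoreVariables described above can be sketched in base R on a plain data.frame laid out like the one returned by matches(). The variable names below are hypothetical and the code only mirrors the documented behaviour; it is not the package's implementation, and the additional column score_rt is added purely for illustration.

```r
matches <- data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
                      target_idx = c(2L, 2L, 3L, 4L, 5L),
                      score = c(0.5, 0.6, 0.7, 0.8, 0.9),
                      score_rt = c(1, 2, 3, 4, 5))

## One entry per match; both vectors are aligned (like queryIndex and
## targetIndex).
query_index <- matches$query_idx
target_index <- matches$target_idx

## Unique indices of elements with at least one match (like whichQuery and
## whichTarget). Sorted here for readability; the order returned by the
## package is not asserted.
which_query <- sort(unique(query_index))     # 1 2 5
which_target <- sort(unique(target_index))   # 2 3 4 5

## Column names containing "score", ignoring the case (like scoreVariables).
score_vars <- grep("score", colnames(matches), ignore.case = TRUE,
                   value = TRUE)
```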
Andrea Vicini, Johannes Rainer
MatchedSpectra()
for matched Spectra::Spectra()
objects.
## Creating a `Matched` object.
q1 <- data.frame(col1 = 1:5, col2 = 6:10)
t1 <- data.frame(col1 = 11:16, col2 = 17:22)
## Define matches of query row 1 with target row 2, query row 2
## with target rows 2, 3 and 4, and query row 5 with target row 5.
mo <- Matched(
q1, t1, matches = data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
target_idx = c(2L, 2L, 3L, 4L, 5L),
score = seq(0.5, 0.9, by = 0.1)))
mo
## Which of the query elements (rows) match at least one target
## element (row)?
whichQuery(mo)
## Which target elements (rows) match at least one query element (row)?
whichTarget(mo)
## Extracting variable "col1" from query object.
mo$col1
## We have duplicated values for the entries of `col1` related to query
## elements (rows) matched to multiple rows of the target object. The
## value of `col1` is returned for each element (row) in the query.
## Extracting variable "col1" from target object. To access columns from
## target we have to prefix the name of the column by `"target_"`.
## Note that only values of `col1` for rows matching at least one query
## row are returned and an NA is reported for query rows without matching
## target rows.
mo$target_col1
## The 3rd and 4th query rows do not match any target row, thus `NA` is
## returned.
## `matchedData` can be used to extract all (or selected) columns
## from the object. Same as with `$`, a left join between the columns
## from the query and the target is performed. Below we extract selected
## columns from the object as a DataFrame.
res <- matchedData(mo, columns = c("col1", "col2", "target_col1",
"target_col2"))
res
res$col1
res$target_col1
## With the `queryIndex` and `targetIndex` it is possible to extract the
## indices of the matched query-target pairs:
queryIndex(mo)
targetIndex(mo)
## Hence, the first match is between the query with index 1 and the target
## with index 2, then, query with index 2 is matched to target with index 2
## and so on.
## The example matched object contains all query and all target
## elements (rows). Below we subset the object keeping only query rows that
## are matched to at least one target row.
mo_sub <- mo[whichQuery(mo)]
## mo_sub contains now only 3 query rows:
nrow(query(mo_sub))
## while the original object contains all 5 query rows:
nrow(query(mo))
## Both objects contain however still the full target object:
nrow(target(mo))
nrow(target(mo_sub))
## With `pruneTarget` we can, however, also reduce the target rows to
## only those that match at least one query row
mo_sub <- pruneTarget(mo_sub)
nrow(target(mo_sub))
########
## Creating a `Matched` object with a `data.frame` for `query` and a `vector`
## for `target`. The matches are specified in the same way as the example
## before.
q1 <- data.frame(col1 = 1:5, col2 = 6:10)
t2 <- 11:16
mo <- Matched(q1, t2, matches = data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
target_idx = c(2L, 2L, 3L, 4L, 5L), score = seq(0.5, 0.9, by = 0.1)))
## *target* is a simple vector and has thus no columns. The matched values
## from target, if it does not have dimensions and hence column names, can
## be retrieved with `$target`
mo$target
## Note that in this case "target" is returned by the function `colnames`
colnames(mo)
## As before, we can extract all data as a `DataFrame`
res <- matchedData(mo)
res
## Note that the columns of the obtained `DataFrame` are the same as the
## corresponding vectors obtained with `$`
res$col1
res$target
## Also subsetting and pruning works in the same way as the example above.
mo_sub <- mo[whichQuery(mo)]
## mo_sub contains now only 3 query rows:
nrow(query(mo_sub))
## while the original object contains all 5 query rows:
nrow(query(mo))
## Both objects contain however still the full target object:
length(target(mo))
length(target(mo_sub))
## Reducing the target elements to only those that match at least one query
## row
mo_sub <- pruneTarget(mo_sub)
length(target(mo_sub))
########
## Filtering `Matched` with `filterMatches`
## Inspecting the matches in `mo`:
mo$col1
mo$target
## We thus have target *12* matched to both query elements with values 1 and
## 2, and query element 2 matching 3 target elements. Let's assume we want
## to resolve these multiple mappings and keep only the match between
## query 1 (column `"col1"` containing value `1`) and the target with value
## `12`, and between query 2 (column `"col1"` containing value `2`) and the
## target with value `13`. In addition we also want to keep query element 5
## (value `5` in column `"col1"`) with the target with value `15`:
mo_sub <- filterMatches(mo,
SelectMatchesParam(queryValue = c(1, 2, 5), queryColname = "col1",
targetValue = c(12, 13, 15)))
matchedData(mo_sub)
## Alternatively to specifying the matches to filter with `queryValue` and
## `targetValue` it is also possible to specify directly the index of the
## match(es) in the `matches` `data.frame`:
matches(mo)
## To keep only matches like in the example above we could use:
mo_sub <- filterMatches(mo, SelectMatchesParam(index = c(1, 3, 5)))
matchedData(mo_sub)
## Note also that, instead of keeping the specified matches, it would be
## possible to remove them by setting `keep = FALSE`. Below we remove
## selected matches from the object:
mo_sub <- filterMatches(mo,
SelectMatchesParam(queryValue = c(2, 2), queryColname = "col1",
targetValue = c(12, 14), keep = FALSE))
mo_sub$col1
mo_sub$target
## As alternative to *manually* selecting matches it is also possible to
## filter matches keeping only the *best matches* using the
## `TopRankedMatchesParam`. This will rank matches for each query based on
## their *score* value and select the best *n* matches with lowest score
## values (i.e. smallest difference in m/z values).
mo_sub <- filterMatches(mo, TopRankedMatchesParam(n = 1L))
matchedData(mo_sub)
## Additionally it is possible to select matches based on a threshold
## for their *score*. Below we keep matches with score below 0.75 (one
## could select matches with *score* greater than the threshold by setting
## `ScoreThresholdParam` parameter `above = TRUE`).
mo_sub <- filterMatches(mo, ScoreThresholdParam(threshold = 0.75))
matchedData(mo_sub)
########
## Selecting the best match for each `query` element with `endoapply`
## It is also possible to select for each `query` element the match with the
## lowest score using `endoapply`. We manually define a function to select
## the best match for each query and give it as input to `endoapply`
## together with the `Matched` object itself. We obtain the same results as
## in the `filterMatches` example above.
FUN <- function(x) {
if(nrow(x@matches) > 1)
x@matches <- x@matches[order(x@matches$score)[1], , drop = FALSE]
x
}
mo_sub <- endoapply(mo, FUN)
matchedData(mo_sub)
########
## Adding matches using `addMatches`
## `addMatches` can be used to manually add matches. Below we add a new match
## between the `query` element with a value of `1` in column `"col1"` and
## the target element with a value of `15`. Parameter `score` assigns a
## score value to the match.
mo_add <- addMatches(mo, queryValue = 1, queryColname = "col1",
targetValue = 15, score = 1.40)
matchedData(mo_add)
## Matches are always sorted by `query`, thus, the new match is listed as
## second match.
## Alternatively, we can also provide a `data.frame` with parameter `score`
## which enables us to add additional information to the added match. Below
## we define the score and an additional column specifying that this match
## was added manually. This information will then also be available in the
## `matchedData`.
mo_add <- addMatches(mo, queryValue = 1, queryColname = "col1",
targetValue = 15, score = data.frame(score = 5, manual = TRUE))
matchedData(mo_add)
## The match will get a score of NA if we're not providing any score.
mo_add <- addMatches(mo, queryValue = 1, queryColname = "col1",
targetValue = 15)
matchedData(mo_add)
## Creating a `Matched` object with a `SummarizedExperiment` for `query` and
## a `data.frame` for `target`. The matches are specified in the same way as
## in the example before.
library(SummarizedExperiment)
q1 <- SummarizedExperiment(
assays = data.frame(matrix(NA, 5, 2)),
rowData = data.frame(col1 = 1:5, col2 = 6:10),
colData = data.frame(cD1 = c(NA, NA), cD2 = c(NA, NA)))
t1 <- data.frame(col1 = 11:16, col2 = 17:22)
## Define matches of row 1 in rowData(q1) with target row 2, of
## rowData(q1) row 2 with target rows 2, 3 and 4, and of rowData(q1) row 5
## with target row 5.
mo <- Matched(
q1, t1, matches = data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
target_idx = c(2L, 2L, 3L, 4L, 5L),
score = seq(0.5, 0.9, by = 0.1)))
mo
## Which of the query elements (rows) match at least one target
## element (row)?
whichQuery(mo)
## Which target elements (rows) match at least one query element (row)?
whichTarget(mo)
## Extracting variable "col1" from rowData(q1).
mo$col1
## We have duplicated values for the entries of `col1` related to rows of
## rowData(q1) matched to multiple rows of the target data.frame t1. The
## value of `col1` is returned for each row in the rowData of query.
## Extracting variable "col1" from target object. To access columns from
## target we have to prefix the name of the column by `"target_"`.
## Note that only values of `col1` for rows matching at least one row in
## rowData of query are returned and an NA is reported for those without
## matching target rows.
mo$target_col1
## The 3rd and 4th query rows do not match any target row, thus `NA` is
## returned.
## `matchedData` can be used to extract all (or selected) columns
## from the object. Same as with `$`, a left join between the columns
## from the query and the target is performed. Below we extract selected
## columns from the object as a DataFrame.
res <- matchedData(mo, columns = c("col1", "col2", "target_col1",
"target_col2"))
res
res$col1
res$target_col1
## The example `Matched` object contains all rows in the
## `rowData` of the `SummarizedExperiment` and all target rows. Below we
## subset the object keeping only rows that are matched to at least one
## target row.
mo_sub <- mo[whichQuery(mo)]
## mo_sub contains now a `SummarizedExperiment` with only 3 rows:
nrow(query(mo_sub))
## while the original object contains a `SummarizedExperiment` with all 5
## rows:
nrow(query(mo))
## Both objects contain however still the full target object:
nrow(target(mo))
nrow(target(mo_sub))
## With `pruneTarget` we can, however, also reduce the target rows to
## only those that match at least one row in the `rowData` of query
mo_sub <- pruneTarget(mo_sub)
nrow(target(mo_sub))
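The examples above do not cover SingleMatchParam. The sketch below shows how it could be used, assuming the MetaboAnnotation package (assumed here to be the package providing Matched) is installed and that the constructor signatures are as listed in the usage section. The object names mo_rm and mo_top are chosen for illustration only.

```r
## Assumes the MetaboAnnotation package (providing `Matched`) is installed.
library(MetaboAnnotation)

q1 <- data.frame(col1 = 1:5, col2 = 6:10)
t2 <- 11:16
mo <- Matched(q1, t2, matches = data.frame(
    query_idx = c(1L, 2L, 2L, 2L, 5L),
    target_idx = c(2L, 2L, 3L, 4L, 5L),
    score = seq(0.5, 0.9, by = 0.1)))

## Remove all matches of query elements matching more than one target.
mo_rm <- filterMatches(mo, SingleMatchParam(duplicates = "remove"))
matches(mo_rm)

## Keep, for each query element, only the match with the lowest score.
mo_top <- filterMatches(mo, SingleMatchParam(duplicates = "top_ranked",
    column = "score", decreasing = FALSE))
matches(mo_top)
```

As with the other filterMatches methods, only the matches are expected to change; the query and target objects stay untouched.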