R/Matched.R

Defines functions .addMatches SingleMatchParam ScoreThresholdParam TopRankedMatchesParam SelectMatchesParam .findMatchesIdxs pruneTarget .objectToMatch .matchedData .dollar .colnames .cnt .validate_assay .validate_qt .validate_matches_content .validate_matches_format .fill_index .subset_qt .subset_matches_nodim .extract_elements .nelements scoreVariables whichQuery whichTarget queryIndex targetIndex target matches .nelements_t Matched

Documented in Matched matches pruneTarget queryIndex ScoreThresholdParam scoreVariables SelectMatchesParam SingleMatchParam target targetIndex TopRankedMatchesParam whichQuery whichTarget

#' @title Representation of generic objects matches
#'
#' @name Matched
#'
#' @aliases Matched Matched-class [,Matched-method
#'
#' @description
#'
#' Matches between *query* and *target* generic objects can be represented by
#' the `Matched` object. By default, all data accessors work as
#' *left joins* between the *query* and the *target* object, i.e. values are
#' returned for each *query* object with eventual duplicated entries (values)
#' if the *query* object matches more than one *target* object. See also
#' *Creation and subsetting* as well as *Extracting data* sections below for
#' details and more information.
#'
#' The `Matched` object allows to represent matches between one-dimensional
#' `query` and `target` objects (being e.g. `numeric` or `list`),
#' two-dimensional objects (`data.frame` or `matrix`) or more complex
#' structures such as `SummarizedExperiments` or `QFeatures`. Combinations of
#' all these different data types are also supported. Matches are represented
#' between elements of one-dimensional objects, or rows for two-dimensional
#' objects (including `SummarizedExperiment` or `QFeatures`). For [QFeatures()]
#' objects matches to only one of the *assays* within the object is supported.
#'
#' @section Creation and general handling:
#'
#' `Matched` object is returned as result from the [matchValues()] function.
#'
#' Alternatively, `Matched` objects can also be created with the `Matched`
#' function providing the `query` and `target` objects as well as the `matches`
#' `data.frame` with two columns of integer indices defining which elements
#' from *query* match which element from *target*.
#'
#' - `addMatches`: add new matches to an existing object. Parameters
#'   `queryValue` and `targetValue` allow to define which element(s) in
#'   `query` and `target` should be considered matching. If `isIndex = TRUE`,
#'   both `queryValue` and `targetValue` are considered to be integer indices
#'   identifying the matching elements in `query` and `target`, respectively.
#'   Alternatively (with `isIndex = FALSE`) `queryValue` and `targetValue` can
#'   be elements in columns `queryColname` or `targetColname` which can be used
#'   to identify the matching elements. Note that in this case
#'   **only the first** matching pair is added. Parameter `score` allows to
#'   provide the score for the match. It can be a numeric with the score or a
#'   `data.frame` with additional information on the manually added matches. In
#'   both cases its length (or number of rows) has to match the length of
#'   `queryValue`. See examples below for more information.
#'
#' - `endoapply`: applies a user defined function `FUN` to each subset of
#'   matches in a `Matched` object corresponding to a `query` element (i.e. for
#'   each `x[i]` with `i` being 1 to `length(x)`). The results are then combined
#'   in a single `Matched` object representing updated matches. Note that `FUN`
#'   has to return a `Matched` object.
#'
#' - `lapply`: applies a user defined function `FUN` to each subset of
#'   matches in a `Matched` object for each `query` element (i.e. to each `x[i]`
#'   with `i` from `1` to `length(x)`). It returns a `list` of `length(object)`
#'   elements where each element is the output of `FUN` applied to each subset
#'   of matches.
#'
#'
#' @section Filtering and subsetting:
#'
#' - `[`: subset the object selecting `query` object elements to keep with
#'   parameter `i`. The resulting object will contain all the matches
#'   for the selected query elements. The `target` object will by default be
#'   returned as-is.
#'
#' - `filterMatches`: filter matches in a `Matched` object using different
#'    approaches depending on the class of `param`:
#'
#'   - `ScoreThresholdParam`: keeps only the matches whose score is strictly
#'     above or strictly below a certain threshold (respectively when parameter
#'     `above = TRUE` and `above = FALSE`). The name of the column containing
#'     the scores to be used for the filtering can be specified with parameter
#'     `column`. The default for `column` is `"score"`. Such variable is present
#'     in each `Matched` object. The name of other score variables (if present)
#'     can be provided (the names of all score variables can be obtained with
#'     `scoreVariables()` function). For example `column = "score_rt"` can be
#'     used to filter matches based on retention time scores for `Matched`
#'     objects returned by [matchValues()] when `param` objects involving a
#'     retention time comparison are used.
#'
#'   - `SelectMatchesParam`: keeps or removes (respectively when parameter
#'     `keep = TRUE` and `keep = FALSE`) matches corresponding to certain
#'     indices or values of `query` and `target`. If `queryValue` and
#'     `targetValue` are provided, matches for these value pairs are kept or
#'     removed. Parameter index` allows to filter matches providing their index
#'     in the [matches()] matrix. Note that `filterMatches` removes only matches
#'     from the [matches()] matrix from the `Matched` object but thus not alter
#'     the `query` or `target` in the object. See examples below for more
#'     information.
#'
#'   - `SingleMatchParam`: reduces matches to keep only (at most) a
#'     single match per query. The deduplication strategy can be defined with
#'     parameter `duplicates`:
#'     - `duplicates = "remove"`: all matches for query elements matching more
#'       than one target element will be removed.
#'     - `duplicates = "closest"`: keep only the *closest* match for each
#'       query element. The closest match is defined by the value(s) of
#'       *score* (and eventually *score_rt*, if present). The one match with
#'       the smallest value for this (these) column(s) is retained. This is
#'       equivalent to `TopRankedMatchesParam(n = 1L, decreasing = FALSE)`.
#'     - `duplicates = "top_ranked"`: select the *best ranking* match for each
#'       query element. Parameter `column` allows to specify the column by
#'       which matches are ranked (use `targetVariables(object)` or
#'       `scoreVariables(object)` to list possible columns). Parameter
#'       `decreasing` allows to define whether the match with the highest
#'       (`decreasing = TRUE`) or lowest (`decreasing = FALSE`) value in
#'       `column` for each *query* will be selected.
#'
#'   - `TopRankedMatchesParam`: for each query element the matches are ranked
#'     according to their score and only the `n` best of them are kept (if `n`
#'     is larger than the number of matches for a given query element all the
#'     matches are returned). For the ranking (ordering) R's `rank` function is
#'     used on the absolute values of the scores (variable `"score"`), thus,
#'     smaller score values (representing e.g. smaller differences between
#'     expected and observed m/z values) are considered *better*. By
#'     setting parameter `decreasing = TRUE` matches can be ranked in decreasing
#'     order (i.e. higher scores are ranked higher and are thus selected).
#'     If besides variable `"score"` also variable `"score_rt"` is available in
#'     the `Matched` object (which is the case for the `Matched` object
#'     returned by [matchValues()] for `param` objects involving a retention
#'     time comparison), the ordering of the matches is based on the product of
#'     the ranks of the two variables (ranking of retention time differences
#'     is performed on the absolute value of `"score_rt"`). Thus, matches with
#'     small (or, depending on parameter `decreasing`, large) values for
#'     `"score"` **and** `"score_rt"` are returned.
#'
#' - `pruneTarget`: *cleans* the object by removing non-matched
#'   **target** elements.
#'
#' @section Extracting data:
#'
#' - `$` extracts a single variable from the `Matched` `x`. The variables that
#'   can be extracted can be listed using `colnames(x)`. These variables can
#'   belong to *query*, *target* or be related to the matches (e.g. the
#'   score of each match). If the *query* (*target*) object is two dimensional,
#'   its columns can be extracted (prefix` "target_"` is used for columns in the
#'   *target* object) otherwise if *query* (*target*) has only a single
#'   dimension (e.g. is a `list` or a `character`) the whole object can be
#'   extracted with `x$query` (`x$target`). More precisely, when
#'   *query* (*target*) is a `SummarizedExperiment` the columns from
#'   `rowData(query)` (rowData(`target`)) are extracted; when *query* (*target*)
#'   is a [QFeatures()] the columns from `rowData` of the assay specified in the
#'   `queryAssay` (`targetAssay`) slot are extracted. The matching scores
#'   are available as *variable* `"score"`. Similar to a left join between the
#'   query and target elements, this function returns a value for each query
#'   element, with eventual duplicated values for query elements matching more
#'   than one target element. If variables from the target `data.frame` are
#'   extracted, an `NA` is reported for the entries corresponding to *query*
#'   elements that don't match any target element. See examples below for
#'   more details.
#'
#' - `length` returns the number of **query** elements.
#'
#' - `matchedData` allows to extract multiple variables contained in the
#'   `Matched` object as a `DataFrame`. Parameter `columns` allows to
#'   define which columns (or variables) should be returned (defaults to
#'   `columns = colnames(object)`). Each single column in the returned
#'   `DataFrame` is constructed in the same way as in `$`. That is, like `$`,
#'   this function performs a *left join* of variables from the *query* and
#'   *target* objects returning all values for all *query* elements
#'   (eventually returning duplicated elements for query elements matching
#'   multiple target elements) and the values for the target elements matched
#'   to the respective query elements (or `NA` if the target element is not
#'   matched to any query element).
#'
#' - `matches` returns a `data.frame` with the actual matching information with
#'   columns `"query_idx"` (index of the element in `query`), `"target_idx"`
#'   (index of the element in `target`) `"score"` (the score of the match) and
#'   eventual additional columns.
#'
#' - `target` returns the *target* object.
#'
#' - `targetIndex` returns the indices of the matched targets in the order they
#'   are assigned to the query elements. The length of the returned `integer`
#'   vector is equal to the total number of matches in the object. `targetIndex`
#'   and `queryIndex` are aligned, i.e. each element in them represent a matched
#'   query-target pair.
#'
#' - `query` returns the *query* object.
#'
#' - `queryIndex` returns the indices of the query elements with matches to
#'   target elements. The length of the returned `integer` vector is equal to
#'   the total number of matches in the object. `targetIndex` and `queryIndex`
#'   are aligned, i.e. each element in them represent a matched query-target
#'   pair.
#'
#' - `queryVariables` returns the names of the variables (columns) in *query*.
#'
#' - `scoreVariables` returns the names of the score variables stored in the
#'   `Matched` object (precisely the names of the variables in `matches(object)`
#'   containing the string "score" in their name ignoring the case).
#'
#' - `targetVariables` returns the names of the variables (columns) in *target*
#'   (prefixed with `"target_"`).
#'
#' - `whichTarget` returns an `integer` with the indices of the elements in
#'   *target* that match at least one element in *query*.
#'
#' - `whichQuery` returns an `integer` with the indices of the elements in
#'   *query* that match at least one element in *target*.
#'
#' @param above for `ScoreThresholdParam` : `logical(1)` specifying whether
#'   to keep matches above (`above = TRUE`) or below (`above = FALSE`) a certain
#'   threshold.
#'
#' @param column for `ScoreThresholdParam`: `character(1)` specifying the name
#'   of the score variable to consider for the filtering (the default is
#'   `column = "score"`). For `SingleMatchParam`: `character(1)` defining the
#'   name of the column to be used for de-duplication. See description of
#'   `SingleMatchParam` in the *Filtering and subsetting* section for details.
#'
#' @param columns for `matchedData`: `character` vector with column names of
#'   variables that should be extracted.
#'
#' @param decreasing for `TopRankedMatchesParam`: `logical(1)` whether scores
#'   should be ordered increasing or decreasing. Defaults to
#'   `decreasing = FALSE`.
#'
#' @param drop for `[`: ignored.
#'
#' @param duplicates for `SingleMatchParam`: `character(1)` defining the
#'   *de-duplication* strategy. See the description of `SingleMatchParam` in
#'   the *Filtering and subsetting* subsection for choices and details.
#'
#' @param FUN for `lapply` and `endoapply`: user defined `function` that takes a
#'   `Matched` object as a first parameter and possibly additional parameters
#'   (that need to be provided in the `lapply` or `endoapply` call. For lapply
#'   `FUN` can return any object while for `endoapply` it must return a
#'   `Matched` object.
#'
#' @param i `integer` or `logical` defining the `query` elements to keep.
#'
#' @param index for `SelectMatchesParam`: indices of the matches to keep (if
#'   `keep = TRUE`) or to drop if (`keep = FALSE`).
#'
#' @param isIndex for `addMatches`: specifies if `queryValue` and
#'   `targetValue` are expected to be vectors of indices.
#'
#' @param j for `[`: ignored.
#'
#' @param keep for `SelectMatchesParam`: `logical`. If `keep = TRUE` the matches
#'   are kept, if `keep = FALSE` they are removed.
#'
#' @param matches `data.frame` with columns `"query_idx"` (`integer`),
#'   `"target_idx"` (`integer`) and `"score"` (`numeric`) representing the n:m
#'   mapping of elements between the `query` and the `target` objects.
#'
#' @param metadata `list` with optional additional metadata.
#'
#' @param n for `TopRankedMatchesParam`: `integer(1)` with number of best
#'   ranked matches to keep for each `query` element.
#'
#' @param name for `$`: the name of the column (or variable) to extract.
#'
#' @param object a `Matched` object.
#'
#' @param param for `filterMatches`: parameter object to select and customize
#'   the filtering procedure.
#'
#' @param pattern for `query`: ignored.
#'
#' @param score for `addMatches`: `numeric` (same length than `queryValue`) or
#'   `data.frame` (same number of rows than `queryValue`) specifying the scores
#'   for the matches to add. If not specified, a `NA` will be used as score.
#'
#' @param target object with the elements against which `query` has been
#'   matched.
#'
#' @param targetAssay `character` that needs to be specified when `target` is
#'   a `QFeatures`. In this case, `targetAssay` is expected to be the name of
#'   one of the assays in `target` (the one on which the matching was
#'   performed).
#'
#' @param targetColname for `SelectMatchesParam`: if `query` is 2-dimensional it
#'   represents the column of `target` against which elements of `targetValue`
#'   are compared.
#'
#' @param targetValue for `SelectMatchesParam`: vector of values to search for
#'   in `target` (if `target` is 1-dimensional) or in column `targetColname` of
#'   `target` (if `target` is 2-dimensional). For `addMatches`: either an
#'   index in `target` or value in column `targetColname` of `target` defining
#'   (together with `queryValue`) the pair of query and target elements for
#'   which a match should be manually added. Lengths of `queryValue` and
#'   `targetValue` have to match.
#'
#' @param threshold for `ScoreThresholdParam` : `numeric(1)` specifying the
#'   threshold to consider for the filtering.
#'
#' @param query object with the query elements.
#'
#' @param queryAssay `character` that needs to be specified when `query` is
#'   a `QFeatures`. In this case, `queryAssay` is expected to be the name of
#'   one of the assays in `query` (the one on which the matching was performed).
#'
#' @param queryColname for `SelectMatchesParam`: if `query` is 2-dimensional it
#'   represents the column of `query` against which elements of `queryValue`
#'   are compared.
#'
#' @param queryValue for `SelectMatchesParam`: vector of values to search for in
#'   `query` (if `query` is 1-dimensional) or in column `queryColname` of
#'   `query` (if `query` is 2-dimensional). For `addMatches`: either an index
#'   in `query` or value in column `queryColname` of `query` defining (together
#'   with `targetValue`) the pair of query and target elements for which a
#'   match should be manually added. Lengths of `queryValue` and
#'   `targetValue` have to match.
#'
#' @param x `Matched` object.
#'
#' @param X `Matched` object.
#'
#' @param ... additional parameters.
#'
#' @return See individual method description above for details.
#'
#' @seealso [MatchedSpectra()] for matched [Spectra()] objects.
#'
#' @exportClass Matched
#'
#' @author Andrea Vicini, Johannes Rainer
#'
#' @rdname Matched
#'
#' @examples
#'
#' ## Creating a `Matched` object.
#' q1 <- data.frame(col1 = 1:5, col2 = 6:10)
#' t1 <- data.frame(col1 = 11:16, col2 = 17:22)
#' ## Define matches between query row 1 with target row 2 and, query row 2
#' ## with target rows 2,3,4 and query row 5 with target row 5.
#' mo <- Matched(
#'     q1, t1, matches = data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
#'                                  target_idx = c(2L, 2L, 3L, 4L, 5L),
#'                                  score = seq(0.5, 0.9, by = 0.1)))
#' mo
#'
#' ## Which of the query elements (rows) match at least one target
#' ## element (row)?
#' whichQuery(mo)
#'
#' ## Which target elements (rows) match at least one query element (row)?
#' whichTarget(mo)
#'
#' ## Extracting variable "col1" from query object .
#' mo$col1
#'
#' ## We have duplicated values for the entries of `col1` related to query
#' ## elements (rows) matched to multiple rows of the target object). The
#' ## value of `col1` is returned for each element (row) in the query.
#'
#' ## Extracting variable "col1" from target object. To access columns from
#' ## target we have to prefix the name of the column by `"target_"`.
#' ## Note that only values of `col1` for rows matching at least one query
#' ## row are returned and an NA is reported for query rows without matching
#' ## target rows.
#' mo$target_col1
#'
#' ## The 3rd and 4th query rows do not match any target row, thus `NA` is
#' ## returned.
#'
#' ## `matchedData` can be used to extract all (or selected) columns
#' ## from the object. Same as with `$`, a left join between the columns
#' ## from the query and the target is performed. Below we extract selected
#' ## columns from the object as a DataFrame.
#' res <- matchedData(mo, columns = c("col1", "col2", "target_col1",
#'                                    "target_col2"))
#' res
#' res$col1
#' res$target_col1
#'
#' ## With the `queryIndex` and `targetIndex` it is possible to extract the
#' ## indices of the matched query-target pairs:
#' queryIndex(mo)
#' targetIndex(mo)
#'
#' ## Hence, the first match is between the query with index 1 to the target
#' ## with index 2, then, query with index 2 is matched to target with index 2
#' ## and so on.
#'
#' ## The example matched object contains all query and all target
#' ## elements (rows). Below we subset the object keeping only query rows that
#' ## are matched to at least one target row.
#' mo_sub <- mo[whichQuery(mo)]
#'
#' ## mo_sub contains now only 3 query rows:
#' nrow(query(mo_sub))
#'
#' ## while the original object contains all 5 query rows:
#' nrow(query(mo))
#'
#' ## Both objects contain however still the full target object:
#' nrow(target(mo))
#' nrow(target(mo_sub))
#'
#' ## With the `pruneTarget` we can however reduce also the target rows to
#' ## only those that match at least one query row
#' mo_sub <- pruneTarget(mo_sub)
#' nrow(target(mo_sub))
#'
#' ########
#' ## Creating a `Matched` object with a `data.frame` for `query` and a `vector`
#' ## for `target`. The matches are specified in the same way as the example
#' ## before.
#'
#' q1 <- data.frame(col1 = 1:5, col2 = 6:10)
#' t2 <- 11:16
#' mo <- Matched(q1, t2, matches = data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
#'     target_idx = c(2L, 2L, 3L, 4L, 5L), score = seq(0.5, 0.9, by = 0.1)))
#'
#' ## *target* is a simple vector and has thus no columns. The matched values
#' ## from target, if it does not have dimensions and hence column names, can
#' ## be retrieved with `$target`
#' mo$target
#'
#' ## Note that in this case "target" is returned by the function `colnames`
#' colnames(mo)
#'
#' ## As before, we can extract all data as a `DataFrame`
#' res <- matchedData(mo)
#' res
#'
#' ## Note that the columns of the obtained `DataFrame` are the same as the
#' ## corresponding vectors obtained with `$`
#' res$col1
#' res$target
#'
#' ## Also subsetting and pruning works in the same way as the example above.
#'
#' mo_sub <- mo[whichQuery(mo)]
#'
#' ## mo_sub contains now only 3 query rows:
#' nrow(query(mo_sub))
#'
#' ## while the original object contains all 5 query rows:
#' nrow(query(mo))
#'
#' ## Both object contain however still the full target object:
#' length(target(mo))
#' length(target(mo_sub))
#'
#' ## Reducing the target elements to only those that match at least one query
#' ## row
#' mo_sub <- pruneTarget(mo_sub)
#' length(target(mo_sub))
#'
#' ########
#' ## Filtering `Matched` with `filterMatches`
#'
#' ## Inspecting the matches in `mo`:
#' mo$col1
#' mo$target
#'
#' ## We have thus target *12* matched to both query elements with values 1 and
#' ## 2, and query element 2 is matching 3 target elements. Let's assume we want
#' ## to resolve this multiple mappings to keep from them only the match between
#' ## query 1 (column `"col1"` containing value `1`) with target 1 (value `12`)
#' ## and query 2 (column `"col1"` containing value `2`) with target 2 (value
#' ## `13`). In addition we also want to keep query element 5 (value `5` in
#' ## column `"col1"`) with the target with value `15`:
#' mo_sub <- filterMatches(mo,
#'     SelectMatchesParam(queryValue = c(1, 2, 5), queryColname = "col1",
#'                        targetValue = c(12, 13, 15)))
#' matchedData(mo_sub)
#'
#' ## Alternatively to specifying the matches to filter with `queryValue` and
#' ## `targetValue` it is also possible to specify directly the index of the
#' ## match(es) in the `matches` `data.frame`:
#' matches(mo)
#'
#' ## To keep only matches like in the example above we could use:
#' mo_sub <- filterMatches(mo, SelectMatchesParam(index = c(1, 3, 5)))
#' matchedData(mo_sub)
#'
#' ## Note also that, instead of keeping the specified matches, it would be
#' ## possible to remove them by setting `keep = FALSE`. Below we remove
#' ## selected matches from the object:
#' mo_sub <- filterMatches(mo,
#'     SelectMatchesParam(queryValue = c(2, 2), queryColname = "col1",
#'                        targetValue = c(12, 14), keep = FALSE))
#' mo_sub$col1
#' mo_sub$target
#'
#' ## As alternative to *manually* selecting matches it is also possible to
#' ## filter matches keeping only the *best matches* using the
#' ## `TopRankedMatchesParam`. This will rank matches for each query based on
#' ## their *score* value and select the best *n* matches with lowest score
#' ## values (i.e. smallest difference in m/z values).
#' mo_sub <- filterMatches(mo, TopRankedMatchesParam(n = 1L))
#' matchedData(mo_sub)
#'
#' ## Additionally it is possible to select matches based on a threshold
#' ## for their *score*. Below we keep matches with score below 0.75 (one
#' ## could select matches with *score* greater than the threshold by setting
#' ## `ScoreThresholdParam` parameter `above = TRUE`.
#' mo_sub <- filterMatches(mo, ScoreThresholdParam(threshold = 0.75))
#' matchedData(mo_sub)
#'
#' ########
#' ## Selecting the best match for each `query` element with `endoapply`
#'
#' ## It is also possible to select for each `query` element the match with the
#' ## lowest score using `endoapply`. We manually define a function to select
#' ## the best match for each query and give it as input to `endoapply`
#' ## together with the `Matched` object itself. We obtain the same results as
#' ## in the `filterMatches` example above.
#'
#' FUN <- function(x) {
#'     if(nrow(x@matches) > 1)
#'         x@matches <- x@matches[order(x@matches$score)[1], , drop = FALSE]
#'     x
#' }
#'
#' mo_sub <- endoapply(mo, FUN)
#' matchedData(mo_sub)
#'
#' ########
#' ## Adding matches using `addMatches`
#'
#' ## `addMatches` allows to manually add matches. Below we add a new match
#' ## between the `query` element with a value of `1` in column `"col1"` and
#' ## the target element with a value of `15`. Parameter `score` allows to
#' ## assign a score value to the match.
#' mo_add <- addMatches(mo, queryValue = 1, queryColname = "col1",
#'     targetValue = 15, score = 1.40)
#' matchedData(mo_add)
#' ## Matches are always sorted by `query`, thus, the new match is listed as
#' ## second match.
#'
#' ## Alternatively, we can also provide a `data.frame` with parameter `score`
#' ## which enables us to add additional information to the added match. Below
#' ## we define the score and an additional column specifying that this match
#' ## was added manually. This information will then also be available in the
#' ## `matchedData`.
#' mo_add <- addMatches(mo, queryValue = 1, queryColname = "col1",
#'     targetValue = 15, score = data.frame(score = 5, manual = TRUE))
#' matchedData(mo_add)
#'
#' ## The match will get a score of NA if we're not providing any score.
#' mo_add <- addMatches(mo, queryValue = 1, queryColname = "col1",
#'     targetValue = 15)
#' matchedData(mo_add)
#'
#' ## Creating a `Matched` object with a `SummarizedExperiment` for `query` and
#' ## a `vector` for `target`. The matches are specified in the same way as
#' ## the example before.
#' library(SummarizedExperiment)
#' q1 <- SummarizedExperiment(
#'   assays = data.frame(matrix(NA, 5, 2)),
#'   rowData = data.frame(col1 = 1:5, col2 = 6:10),
#'   colData = data.frame(cD1 = c(NA, NA), cD2 = c(NA, NA)))
#' t1 <- data.frame(col1 = 11:16, col2 = 17:22)
#' ## Define matches between row 1 in rowData(q1) with target row 2 and,
#' ## rowData(q1) row 2 with target rows 2,3,4 and rowData(q1) row 5 with target
#' ## row 5.
#' mo <- Matched(
#'     q1, t1, matches = data.frame(query_idx = c(1L, 2L, 2L, 2L, 5L),
#'                                 target_idx = c(2L, 2L, 3L, 4L, 5L),
#'                                  score = seq(0.5, 0.9, by = 0.1)))
#' mo
#'
#' ## Which of the query elements (rows) match at least one target
#' ## element (row)?
#' whichQuery(mo)
#'
#' ## Which target elements (rows) match at least one query element (row)?
#' whichTarget(mo)
#'
#' ## Extracting variable "col1" from rowData(q1).
#' mo$col1
#'
#' ## We have duplicated values for the entries of `col1` related to rows of
#' ## rowData(q1) matched to multiple rows of the target data.frame t1. The
#' ## value of `col1` is returned for each row in the rowData of query.
#'
#' ## Extracting variable "col1" from target object. To access columns from
#' ## target we have to prefix the name of the column by `"target_"`.
#' ## Note that only values of `col1` for rows matching at least one row in
#' ## rowData of query are returned and an NA is reported for those without
#' ## matching target rows.
#' mo$target_col1
#'
#' ## The 3rd and 4th query rows do not match any target row, thus `NA` is
#' ## returned.
#'
#' ## `matchedData` can be used to extract all (or selected) columns
#' ## from the object. Same as with `$`, a left join between the columns
#' ## from the query and the target is performed. Below we extract selected
#' ## columns from the object as a DataFrame.
#' res <- matchedData(mo, columns = c("col1", "col2", "target_col1",
#'                                   "target_col2"))
#' res
#' res$col1
#' res$target_col1
#'
#' ## The example `Matched` object contains all rows in the
#' ## `rowData` of the `SummarizedExperiment` and all target rows. Below we
#' ## subset the object keeping only rows that are matched to at least one
#' ## target row.
#' mo_sub <- mo[whichQuery(mo)]
#'
#' ## mo_sub contains now a `SummarizedExperiment` with only 3 rows:
#' nrow(query(mo_sub))
#'
#' ## while the original object contains a `SummarizedExperiment` with all 5
#' ## rows:
#' nrow(query(mo))
#'
#' ## Both objects contain however still the full target object:
#' nrow(target(mo))
#' nrow(target(mo_sub))
#'
#' ## With the `pruneTarget` we can however reduce also the target rows to
#' ## only those that match at least one in the `rowData` of query
#' mo_sub <- pruneTarget(mo_sub)
#' nrow(target(mo_sub))
NULL

setClass(
    "Matched",
    slots = c(
        query = "ANY",
        target = "ANY",
        matches = "data.frame",
        queryAssay = "character",
        targetAssay = "character",
        metadata = "list",
        version = "character"
    ),
    prototype = prototype(
        query = list(),
        target = list(),
        matches = data.frame(query_idx = integer(),
                             target_idx = integer(),
                             score = numeric()),
        queryAssay = character(),
        targetAssay = character(),
        metadata = list(),
        version = "0.1")
)

#' @export
#'
#' @rdname Matched
Matched <- function(query = list(), target = list(),
                    matches = data.frame(query_idx = integer(),
                                         target_idx = integer(),
                                         score = numeric()),
                    queryAssay = character(), targetAssay = character(),
                    metadata = list()) {
    new("Matched", query = query, target = target, matches = matches,
        queryAssay = queryAssay, targetAssay = targetAssay, metadata = metadata)
}

setValidity("Matched", function(object) {
    msg <- .validate_matches_format(object@matches)
    if (length(msg)) return(msg)
    msg <- .validate_matches_content(object@matches, length(object),
                                     .nelements_t(object))
    if (any(c("query", "target") %in% colnames(object@matches)))
        return("\"query\" and \"target\" can't be used as matches column names")
    if (length(msg)) return(msg)
    msg <- .validate_qt(object@query)
    if (length(msg)) return(msg)
    msg <- .validate_qt(object@target)
    if (length(msg)) return(msg)
    msg <- .validate_assay(object@query, object@queryAssay)
    if (length(msg)) return(msg)
    msg <- .validate_assay(object@target, object@targetAssay, "target")
    if (length(msg)) return(msg)
    TRUE
})

#' @export
#'
#' @rdname Matched
setMethod("length", "Matched",
          function(x) .nelements(.objectToMatch(x@query, x@queryAssay)))

.nelements_t <- function(x) .nelements(.objectToMatch(x@target, x@targetAssay))

#' @exportMethod show
#'
#' @importMethodsFrom methods show
#'
#' @rdname Matched
setMethod("show", "Matched", function(object) {
    cat("Object of class", class(object)[1L], "\n")
    cat("Total number of matches:", nrow(object@matches), "\n")
    cat("Number of query objects: ", length(object),
        " (", length(unique(object@matches$query_idx)), " matched)\n", sep = "")
    cat("Number of target objects: ", .nelements_t(object), " (",
        length(unique(object@matches$target_idx)), " matched)\n", sep = "")
})

#' @exportMethod [
#'
#' @rdname Matched
setMethod("[", "Matched", function(x, i, j, ..., drop = FALSE) {
    if (missing(i))
        return(x)
    if (is.logical(i))
        i <- which(i)
    .subset_matches_nodim(x, i)
})

#' @rdname Matched
#'
#' @export
matches <- function(object) {
    object@matches
}

#' @rdname Matched
#'
#' @export
target <- function(object) {
    object@target
}

#' @rdname Matched
#'
#' @importMethodsFrom AnnotationHub query
#'
#' @export
setMethod("query", "Matched", function(x, ...) {
    x@query
})

#' @rdname Matched
#'
#' @export
targetIndex <- function(object) {
    if (!inherits(object, "Matched"))
        stop("'object' is expected to be a or inherit from an object of ",
             "type 'Matched'.")
    matches(object)$target_idx
}

#' @rdname Matched
#'
#' @export
queryIndex <- function(object) {
    if (!inherits(object, "Matched"))
        stop("'object' is expected to be a or inherit from an object of ",
             "type 'Matched'.")
    matches(object)$query_idx
}

#' @rdname Matched
#'
#' @export
whichTarget <- function(object) {
    unique(object@matches$target_idx)
}

#' @rdname Matched
#'
#' @export
whichQuery <- function(object) {
    unique(object@matches$query_idx)
}

#' @importMethodsFrom S4Vectors $
#'
#' @rdname Matched
#'
#' @export
setMethod("$", "Matched", function(x, name) {
  .dollar(.objectToMatch(x@query, x@queryAssay),
          .objectToMatch(x@target, x@targetAssay), x@matches, name)
})

#' @importFrom BiocGenerics colnames
#'
#' @exportMethod colnames
#'
#' @rdname Matched
setMethod("colnames", "Matched", function(x) {
  .colnames(.objectToMatch(x@query, x@queryAssay),
            .objectToMatch(x@target, x@targetAssay), x@matches)
})

#' @rdname Matched
#'
#' @export
scoreVariables <- function(object) {
    matchescols <- colnames(object@matches)
    matchescols[grep("score", matchescols, ignore.case = TRUE)]
}

#' @rdname Matched
setMethod("queryVariables", "Matched", function(object) {
    query <- .objectToMatch(object@query, object@queryAssay)
    cnq <- character()
    if (length(dim(query)) == 2)
        cnq <- colnames(query)
    if (is.null(dim(query)))
        cnq <- "query"
    cnq
})

#' @rdname Matched
setMethod("targetVariables", "Matched", function(object) {
    .cnt(.objectToMatch(object@target, object@targetAssay))
})

#' @importMethodsFrom S4Vectors cbind
#'
#' @importFrom S4Vectors DataFrame
#'
#' @rdname Matched
#'
#' @export
setMethod("matchedData", "Matched", function(object,
                                             columns = colnames(object), ...) {
    .matchedData(.objectToMatch(object@query, object@queryAssay),
                 .objectToMatch(object@target, object@targetAssay),
                 object@matches, columns, ... )
})


.nelements <- function(x) {
    ifelse(is.null(d <- dim(x)), length(x), d[1])
}

#' @importFrom methods is
#'
#' @noRd
.extract_elements <- function(x, i, j, drop = FALSE) {
    if (length(dim(x))) {
        if (missing(j)) j <- seq_len(dim(x)[2])
        res <- x[i, j, drop = drop]
    } else res <- x[i]
    if (is(x, "list") && any(na <- is.na(i)))
        res[na] <- NA
    res
}

#' Subsetting of a matched object with slots query, target and matches.
#'
#' @param i has to be an `integer` vector with the indices of the query elements
#'     to keep.
#'
#' @importFrom methods slot<-
#'
#' @noRd
.subset_matches_nodim <- function(x, i) {
    if (!all(i %in% seq_len(length(x))))
        stop("subscript contains out-of-bounds indices", call. = FALSE)
    slot(x, "query", check = FALSE) <- .subset_qt(x@query, x@queryAssay, i)
    mtches <- x@matches[x@matches$query_idx %in% i, , drop = FALSE]
    if (nrow(mtches)) {
        ## Support handling duplicated indices.
        mtches <- split.data.frame(
            mtches, f = as.factor(mtches$query_idx))[as.character(i)]
        lns <- vapply(mtches, function(z)
            if (length(z)) nrow(z) else 0L, integer(1))
        mtches <- do.call(rbind, mtches[lengths(mtches) > 0])
        rownames(mtches) <- NULL
        mtches$query_idx <- rep(seq_along(i), lns)
    }
    slot(x, "matches", check = FALSE) <- mtches
    x
}

.subset_qt <- function(x, assay, i, j, drop = FALSE) {
    if (is(x, "QFeatures")) {
        if (!assay %in% names(x))
            stop("Invalid assay name.", call. = FALSE)
        x[[assay]] <- .extract_elements(x[[assay]], i, j, drop)
    } else {
        x <- .extract_elements(x, i, j, drop)
    }
    x
}

.fill_index <- function(x, y) {
    sort(c(setdiff(x, y), y))
}

.validate_matches_format <- function(x) {
    msg <- NULL
    if (!is.data.frame(x))
        msg <- c(msg, "'matches' should be a 'data.frame'")
    else {
        if (!all(c("query_idx", "target_idx", "score") %in% colnames(x)))
            return(c(msg, paste0("Not all required column names \"query_idx\",",
                                 " \"target_idx\" and \"score\" found in",
                                 " 'matches'")))
        if (!is.integer(x$target_idx))
            msg <- c(msg,
                     "column \"target_idx\" is expected to be of type integer")
        if (!is.integer(x$query_idx))
            msg <- c(msg,
                     "column \"query_idx\" is expected to be of type integer")
    }
    msg
}

.validate_matches_content <- function(x, nquery, ntarget) {
    msg <- NULL
    if (!all(x$query_idx %in% seq_len(nquery)))
        msg <- c(msg, "indices in \"query_idx\" are out-of-bounds")
    if (!all(x$target_idx %in% seq_len(ntarget)))
        msg <- c(msg, "indices in \"target_idx\" are out-of-bounds")
    msg
}

.validate_qt <- function(x){
    msg <- NULL
    ndim <- length(dim(x))
    if (!(ndim %in% c(0,2)))
        msg <- c(msg,
                 "unsupported dimensions in either \"query\" or \"target\"")
    if (ndim == 2 && !inherits(x, "SummarizedExperiment") &&
        is.null(colnames(x)))
        msg <- c(msg, paste0("either \"query\" or \"target\" have 2",
                             " dimensions but no column names"))
    msg
}

.validate_assay <- function(x, assay, what = "query") {
    if (is(x, "QFeatures")) {
        if (!(len_ass <- length(assay)))
            return(paste0("`", what, "Assay` has to be provided when `", what,
                          "` is `QFeatures`."))
        if (len_ass != 1)
            return(paste0("`", what, "Assay` must be `character(1)`"))
        if (!assay %in% names(x))
            return(paste0("No assay in `", what, "` with name \"", assay, "\""))
    }
}

.cnt <- function(target) {
    ndim <- length(dim(target))
    if (ndim == 2) {
        cnt <- colnames(target)
        if (length(cnt)) cnt <- paste0("target_", cnt)
    } else if (ndim == 0) {
        cnt <- "target"
    } else {
        stop("unsupported dimensions in \"target\"", call. = FALSE)
    }
    cnt
}

.colnames <- function(query, target, matches) {
    cns <- colnames(matches)
    cnq <- NULL
    cnt <- .cnt(target)
    if (length(dim(query)) == 2)
        cnq <- colnames(query)
    if (is.null(dim(query)))
        cnq <- "query"
    c(cnq, cnt, cns[!cns %in% c("query_idx", "target_idx")])
}

.dollar <- function(query, target, matches, name) {
    if (name %in% .colnames(query, target, matches))
    {
        not_mtchd <- setdiff(seq_len(.nelements(query)), matches$query_idx)
        idxs_qry <- c(matches$query_idx, not_mtchd)
        ord <- order(idxs_qry)
        if (name %in% colnames(matches)) {
            idxs_mtch <- c(seq_len(nrow(matches)),
                           rep(NA, length(not_mtchd)))[ord]
            return(matches[idxs_mtch, name])
        }
        if (name %in% .cnt(target)) {
            idxs_trg <- c(matches$target_idx, rep(NA, length(not_mtchd)))[ord]
            .extract_elements(target, idxs_trg, sub("target_", "", name),
                              drop = TRUE)
        }else
            .extract_elements(query, idxs_qry[ord], name, drop = TRUE)
    } else stop("'", name, "' not available", call. = FALSE)
}

.matchedData <- function(query, target, matches, columns, ...) {
    cnms <- .colnames(query, target, matches)
    if (any(!columns %in% cnms))
        stop("column(s) ", paste0(columns[!columns %in% cnms],
                                  collapse = ", "), " not available",
             call. = FALSE)
    not_mtchd <- setdiff(seq_len(.nelements(query)), matches$query_idx)
    idxs_qry <- c(matches$query_idx, not_mtchd)
    ord <- order(idxs_qry)
    from_target <- columns %in% .cnt(target)
    from_matches <- columns %in% colnames(matches)
    from_query <- !(from_target | from_matches)
    res_q <- NULL
    res_t <- NULL
    res_m <- NULL
    if (any(from_query))
        res_q <- .extract_elements(query, idxs_qry[ord], columns[from_query])
    if (any(from_target)) {
        idxs_trg <- c(matches$target_idx, rep(NA, length(not_mtchd)))[ord]
        res_t <- .extract_elements(target, idxs_trg,
                                   sub("target_", "", columns[from_target]))
    }
    if (any(from_matches)) {
        idxs_mtch <- c(seq_len(nrow(matches)), rep(NA, length(not_mtchd)))[ord]
        res_m <- matches[idxs_mtch, columns[from_matches], drop = FALSE]
    }
    if (!is.null(res_q) && is.null(dim(query))) res_q <- I(res_q)
    if (!is.null(res_t) && is.null(dim(target))) res_t <- I(res_t)
    any_qtm <- c(any(from_query), any(from_target), any(from_matches))
    res <- DataFrame(do.call(cbind, list(res_q, res_t, res_m)[any_qtm]))
    colnames(res) <- c(columns[from_query], columns[from_target],
                       columns[from_matches])
    res[, columns, drop = FALSE]
}

#' @importMethodsFrom SummarizedExperiment rowData
#'
#' @noRd
.objectToMatch <- function(x, assayname = character(), colnames = character()) {
    what <- as.character(sys.call()[-1])[1]
    if (is(x, "QFeatures")) {
        msg <- .validate_assay(x, assayname, what)
        if (!is.null(msg)) stop(msg, call. = FALSE)
        else x <- x[[assayname]]
    }
    if (is(x, "SummarizedExperiment"))
        x <- rowData(x)
    if (is(x, "Spectra")) {
        if (any(tmp <- !colnames %in% spectraVariables(x)))
            stop(paste0("Missing spectra variables \"", colnames[tmp],
                        "\" in ", what, collapse = "\n"), call. = FALSE)
        if (length(colnames))
            x <- spectraData(x, colnames)
        else
            x <- spectraData(x)
    }
    if (length(colnames)) {
        if (any(tmp <- !colnames %in% colnames(x)))
            stop(paste0("Missing column \"", colnames[tmp], "\" in ",
                        what, collapse = "\n"), call. = FALSE)
        x <- x[, colnames]
    }
    x
}

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @export
pruneTarget <- function(object) {
    keep <- whichTarget(object)
    object@target <- .subset_qt(object@target, object@targetAssay, keep)
    object@matches$target_idx <- match(object@matches$target_idx, keep)
    validObject(object)
    object
}

.findMatchesIdxs <- function(query, target, matches, queryValue = integer(),
                             targetValue = integer(),
                             queryColname = character(),
                             targetColname = character()) {
    if (length(queryValue) != length(targetValue))
        stop("'queryValue' and 'targetValue' must have the same length",
             call. = FALSE)
    if (length(dim(query)) == 2) {
        if (length(queryColname) == 0)
            stop("\"", queryColname,
                 "\" must be set when 'query' is 2-dimensional", call. = FALSE)
        if (!queryColname %in% colnames(query))
            stop("\"", queryColname, "\" is not a column of 'query'",
                 call. = FALSE)
    }
    targetColname <- sub("target_", "", targetColname)
    if (length(dim(target)) == 2) {
        if (length(targetColname) == 0)
            stop("\"", targetColname,
                 "\" must be set when 'target' is 2-dimensional", call. = FALSE)
        if (!targetColname %in% colnames(target))
            stop("\"", targetColname, "\" is not a column of 'target'",
                 call. = FALSE)
    }
    mq <- .extract_elements(query, matches$query_idx, queryColname,
                            drop = TRUE)
    mt <- .extract_elements(target, matches$target_idx, targetColname,
                            drop = TRUE)
    unlist(sapply(seq_along(queryValue), function(i)
        which(mq == queryValue[i] & mt == targetValue[i])))
}

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @export
setMethod("filterMatches",
          c("Matched", "missing"), function (object, queryValue = integer(),
                                             targetValue = integer(),
                                             queryColname = character(),
                                             targetColname = character(),
                                             index = integer(),
                                             keep = TRUE, ...) {
              param <- SelectMatchesParam(queryValue = queryValue,
                                          targetValue = targetValue,
                                          queryColname = queryColname,
                                          targetColname = targetColname,
                                          index = index, keep = keep)
              filterMatches(object, param, ...)
})

#' @noRd
setClass("SelectMatchesParam",
         slots = c(
             queryValue = "characterOrNumeric",
             targetValue = "characterOrNumeric",
             queryColname = "character",
             targetColname = "character",
             index = "integer",
             keep = "logical"),
         contains = "Param",
         prototype = prototype(
             queryValue = numeric(),
             targetValue = numeric(),
             queryColname = character(),
             targetColname = character(),
             index = integer(),
             keep = TRUE),
         validity = function(object) {
             msg <- NULL
             if(length(object@queryValue) != length(object@targetValue))
                 msg <- c(msg, paste0("'queryValue' and 'targetValue' ",
                                      "must have the same length"))
             if (length(object@queryColname) > 1)
                 msg <- c(msg,
                          "'queryColname' cannot be of length greater than 1")
             if (length(object@targetColname) > 1)
                 msg <- c(msg,
                          "'targetColname' cannot be of length greater than 1")
             if (any(object@index <= 0))
                 msg <- c(msg, "'index' must contain positive integers")
             if (length(object@keep) != 1)
                 msg <- c(msg, "'keep' must be a logical of length 1")
             msg
         })

#' @rdname Matched
#'
#' @importFrom methods new
#'
#' @export
SelectMatchesParam <-
    function(queryValue = numeric(), targetValue = numeric(),
             queryColname = character(), targetColname = character(),
             index = integer(), keep = TRUE) {
        new("SelectMatchesParam",
            queryValue = queryValue, targetValue = targetValue,
            queryColname = queryColname, targetColname = targetColname,
            index = as.integer(index), keep = keep)
    }

#' @noRd
setClass("TopRankedMatchesParam",
         slots = c(
             n = "integer",
             decreasing = "logical"),
         contains = "Param",
         prototype = prototype(
             n = 1L,
             decreasing = FALSE),
         validity = function(object) {
             msg <- NULL
             if (length(object@n) != 1 || object@n <= 0)
                 msg <- "'n' must be a length 1 positive integer"
             msg
         })

#' @rdname Matched
#'
#' @importFrom methods new
#'
#' @export
TopRankedMatchesParam <- function(n = 1L, decreasing = FALSE) {
    new("TopRankedMatchesParam", n = as.integer(n), decreasing = decreasing[1L])
}

#' @noRd
setClass("ScoreThresholdParam",
         slots = c(
             threshold = "numeric",
             above = "logical",
             column = "character"),
         contains = "Param",
         prototype = prototype(
             threshold = 0,
             above = FALSE,
             column = "score")
         )

#' @rdname Matched
#'
#' @importFrom methods new
#'
#' @export
ScoreThresholdParam <- function(threshold = 0, above = FALSE,
                                column = "score") {
    new("ScoreThresholdParam", threshold = threshold[1L], above = above[1L],
        column = column[1L])
}

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @export
setMethod("filterMatches", c("Matched", "SelectMatchesParam"), function (object,
                                                                         param,
                                                                         ...) {
    index <- param@index
    if (length(index) && any(!index %in% seq_len(nrow(object@matches))))
        stop("some indices in \"index\" are out-of-bounds", call. = FALSE)
    if (!length(index) && length(param@queryValue))
        index  <- .findMatchesIdxs(.objectToMatch(object@query,
                                                  object@queryAssay),
                                   .objectToMatch(object@target,
                                                  object@targetAssay),
                                   object@matches, param@queryValue,
                                   param@targetValue, param@queryColname,
                                   param@targetColname)
    if (param@keep) to_keep <- seq_len(nrow(object@matches)) %in% index
    else to_keep <- !seq_len(nrow(object@matches)) %in% index
    object@matches <- object@matches[to_keep, , drop = FALSE]
    object@metadata <- c(object@metadata, param = param)
    validObject(object)
    object
})

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @export
setMethod("filterMatches", c("Matched", "TopRankedMatchesParam"),
          function (object, param, ...) {
              sign <- ifelse(param@decreasing, yes = -1, no = 1)
              rank <- rank(abs(object@matches$score) * sign)
              if ("score_rt" %in% colnames(object@matches))
                  rank <- rank * rank(abs(object@matches$score_rt) * sign)
              seq_len_nm <- seq_len(nrow(object@matches))
              tmp <- split.data.frame(cbind(seq_len_nm, rank),
                                      object@matches$query_idx)
              index <- do.call("c", lapply(tmp, function(x)
                  x[order(x[, 2])[seq_len(min(param@n, nrow(x)))], 1]))
              to_keep <- seq_len_nm %in% index
              object@matches <- object@matches[to_keep, , drop = FALSE]
              object@metadata <- c(object@metadata, param = param)
              validObject(object)
              object
          })

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @export
setMethod("filterMatches", c("Matched", "ScoreThresholdParam"),
          function (object, param, ...) {
              if (!param@column %in% colnames(object@matches))
                  stop("\"", param@column,
                       "\" variable not present in `object`")
              if (param@above)
                  to_keep <- object@matches[, param@column] > param@threshold
              else to_keep <- object@matches[, param@column] < param@threshold
              object@matches <- object@matches[to_keep, , drop = FALSE]
              object@metadata <- c(object@metadata, param = param)
              validObject(object)
              object
          })

#' @noRd
setClass("SingleMatchParam",
         slots = c(
             duplicates = "character",
             column = "character",
             decreasing = "logical"),
         contains = "Param",
         prototype = prototype(
             duplicates = "remove",
             column = "score",
             decreasing = TRUE)
         )

#' @rdname Matched
#'
#' @export
SingleMatchParam <- function(duplicates = c("remove", "closest", "top_ranked"),
                             column = "score", decreasing = TRUE) {
    duplicates <- force(match.arg(duplicates))
    new("SingleMatchParam", duplicates = duplicates, column = column[1L],
        decreasing = decreasing[1L])
}

#' @rdname Matched
#'
#' @export
setMethod(
    "filterMatches", c("Matched", "SingleMatchParam"),
    function (object, param, ...) {
        if (!param@column %in% c(scoreVariables(object),
                                 targetVariables(object)))
            stop("Variable \"", param@column, "\" not found. `column` ",
                 "should be one of 'scoreVariables(object)' or ",
                 "'targetVariables(object)'.")
        object@metadata <- c(object@metadata, param = param)
        if (!nrow(object@matches))
            return(object)
        switch(
            param@duplicates[1L],
            "remove" = {
                s <- split(seq_len(nrow(object@matches)),
                           object@matches$query_idx)
                keep <- unlist(s[lengths(s) == 1L], use.names = FALSE)
                object@matches <- object@matches[keep, , drop = FALSE]
            },
            "closest" = {
                object <- filterMatches(
                    object, TopRankedMatchesParam(n = 1L, decreasing = FALSE))
            },
            "top_ranked" = {
                ## Rank matches by "column"
                if (param@column %in% scoreVariables(object))
                    vals <- cbind(seq_len(nrow(object@matches)),
                                  object@matches$query_idx,
                                  object@matches[, param@column])
                else
                    vals <- cbind(
                        seq_len(nrow(object@matches)),
                        object@matches$query_idx,
                        .extract_elements(
                            .objectToMatch(object@target, object@targetAssay),
                            object@matches$target_idx,
                            sub("target_", "", param@column)))
                    vals <- vals[order(vals[, 3L],
                                       decreasing = param@decreasing), ,
                                 drop = FALSE]
                    keep <- vals[match(unique(object@matches$query_idx),
                                       vals[, 2L]), 1L]
                    object@matches <- object@matches[keep, , drop = FALSE]
            },
            stop("'duplicates' has to be one of \"remove\", \"closest\"",
                 " or \"top_ranked\"."))
        validObject(object)
        object
    })

#' @importFrom MsCoreUtils rbindFill
.addMatches <- function(query, target, matches, queryValue = integer(),
                        targetValue = integer(), queryColname = character(),
                        targetColname = character(),
                        score = rep(NA_real_, length(queryValue)),
                        isIndex = FALSE) {
    if (!is.data.frame(score))
        score <- data.frame(score = score)
    if (!is.data.frame(score))
        stop("'score' needs to be either a 'data.frame' or a numeric",
             call. = FALSE)
    if (length(queryValue) != length(targetValue) ||
        length(queryValue) != nrow(score))
        stop("'queryValue', 'targetValue' and 'score' must have the",
             " same length", call. = FALSE)
    if (isIndex) {
        if (!is.integer(queryValue) || !is.integer(targetValue))
            stop("'queryValue' and 'targetValue' must be integer ",
                 "vectors when 'isIndex = TRUE'", call. = FALSE)
        if (any(!queryValue %in% seq_len(.nelements(query))))
            stop("Provided indices in 'queryValue' are out-of-bounds.",
                 call. = FALSE)
        if (any(!targetValue %in% seq_len(.nelements(target))))
            stop("Provided indices in 'queryValue' are out-of-bounds.",
                 call. = FALSE)
        query_idx <- queryValue
        target_idx <- targetValue
    } else {
        if (length(dim(query)) == 2)
            if (!queryColname %in% colnames(query))
                stop("\"", queryColname,"\" is not a column of 'query'",
                     call. = FALSE)
        if (length(dim(target)) == 2)
            if (!targetColname %in% colnames(target))
                stop("\"", targetColname,"\" is not a column of 'target'",
                     call. = FALSE)
        mq <- match(queryValue, .extract_elements(query, j = queryColname,
                                                  drop = TRUE))
        mt <- match(targetValue, .extract_elements(target, j = targetColname,
                                                   drop = TRUE))
        to_keep <- !is.na(mq) & !is.na(mt)
        query_idx <- mq[to_keep]
        target_idx <- mt[to_keep]
        score <- score[to_keep, ]
    }
    new_matches <- cbind(
        data.frame(query_idx = query_idx, target_idx = target_idx),
        score)
    new_matches <- rbindFill(matches, new_matches)
    ## remove possible matches that were already in matches
    new_matches[!duplicated(new_matches[, c("query_idx", "target_idx")]), ]
}

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @export
setMethod("addMatches", "Matched",
          function(object, queryValue = integer(), targetValue = integer(),
                   queryColname = character(), targetColname = character(),
                   score = rep(NA_real_, length(queryValue)), isIndex = FALSE) {
              object@matches <- .addMatches(.objectToMatch(object@query,
                                                           object@queryAssay),
                                            .objectToMatch(object@target,
                                                           object@targetAssay),
                                            object@matches, queryValue,
                                            targetValue, queryColname,
                                            targetColname, score, isIndex)
              validObject(object)
              object
          })

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @importFrom S4Vectors endoapply
#'
#' @export
setMethod("endoapply", "ANY", function(X, FUN, ...) {
    endoapply(X, FUN, ...)
})

#' @rdname Matched
#'
#' @importFrom methods validObject
#'
#' @export
setMethod("endoapply", "Matched", function(X, FUN, ...) {
    tmp <- lapply(seq_along(X), function(i) FUN(X[i], ...)@matches)
    matches <- do.call(rbind, tmp)
    matches$query_idx <- rep(seq_along(tmp), vapply(tmp, nrow, integer(1)))
    X@matches <- matches
    validObject(X)
    X
})

#' @rdname Matched
#'
#' @importFrom BiocGenerics lapply
#'
#' @export
setMethod("lapply", "Matched", function(X, FUN, ...) {
    lapply(seq_along(X), function(i) FUN(X[i], ...))
})
rformassspectrometry/MetaboAnnotation documentation built on March 12, 2024, 4:09 a.m.