calcScores: Compute Feature Similarity Scores

View source: R/calcScores.R

calcScoresR Documentation

Compute Feature Similarity Scores

Description

Calculates a pairwise similarity (between 0 & 1) between all grouped features in metabCombiner object. The similarity score calculation is described in scorePairs.

Usage

calcScores(
  object,
  A = 75,
  B = 10,
  C = 0.25,
  groups = NULL,
  fit = c("gam", "loess"),
  mzshift = FALSE,
  mzfit = mzfitParam(),
  useAdduct = FALSE,
  adduct = 1.25,
  usePPM = FALSE,
  brackets_ignore = c("(", "[", "{")
)

Arguments

object

metabCombiner object.

A

Numeric weight for penalizing m/z differences.

B

Numeric weight for penalizing differences between fitted & observed retention times

C

Numeric weight for differences in Q (abundance quantiles).

groups

integer. Vector of feature groups to score. If set to NULL (default), will compute scores for all feature groups.

fit

Character. Choice of fitted rt model, "gam" or "loess."

mzshift

Logical. If TRUE, shifts the m/z values (mzx) before scoring.

mzfit

List of parameters for shifting m/z values; see ?mzfitParam

useAdduct

logical. Option to penalize mismatches in (non-empty, non-bracketed) adduct column annotations.

adduct

numeric. If useAdduct is TRUE, divides score of mismatched, non-empty and non-bracked adduct column labels by this value.

usePPM

logical. Option to use relative (as opposed to absolute) m/z differences in score computations.

brackets_ignore

If useAdduct = TRUE, bracketed adduct character strings of these types will be ignored according to this argument

Details

This function updates the rtProj, score, rankX, and rankY columns in the combinedTable report. First, using the RT mapping model computed in the previous steps, rtx values are projected onto rty. Then similarity scores are calculated based on m/z, rt (rtProj vs rty), and Q differences, with multiplicative weight penalties A, B, and C.

If the datasets contain representative set of shared identities (idx = idy), evaluateParams provides some guidance on appropriate A, B, and C values to use. In testing, the best values for A should lie between 50 and 120, according to mass accuracy; if using ppm (usePPM = TRUE), the suggested range is between 0.01 and 0.05. B should be between 5 and 15 depending on fitting accuracy (higher if datasets processed under roughly identical conditions) ; C should vary between 0 and 1, depending on sample similarity. See examples below.

Some input datasets exhibit systematic m/z shifts

If using adduct information (useAdduct = TRUE), the score is divided by the numeric adduct argument if non-empty and non-bracketed adduct values do not match. Be sure that adduct annotations are accurate before using this functionality.

Value

metabCombiner object with updated combinedTable. rtProj column will contain fitted retention times determined from previously computed model; score will contain computed pairwise similarity scores of feature pairs; rankX & rankY are the integer ranks of scores for x & y features in descending order.

See Also

evaluateParams, scorePairs

Examples


data(plasma30)
data(plasma20)

p30 <- metabData(plasma30, samples = "CHEAR")
p20 <- metabData(plasma20, samples = "Red", rtmax = 17.25)
p.comb <- metabCombiner(xdata = p30, ydata = p20, binGap = 0.0075)

p.comb <- selectAnchors(p.comb, tolmz = 0.003, tolQ = 0.3, windy = 0.02)
p.comb <- fit_gam(p.comb, k = 20, iterFilter = 1, family = "gaussian")

#example: moderate m/z deviation, accurate rt fit, high sample similarity
p.comb <- calcScores(p.comb, A = 90, B = 14, C = 0.8, useAdduct = FALSE,
         groups = NULL, fit = "gam", usePPM = FALSE)
cTable <- combinedTable(p.comb)  #to view results


#example 2: high m/z deviation, moderate rt fit, low sample similarity
p.comb <- calcScores(p.comb, A = 50, B = 8, C = 0.2)

#example 3: low m/z deviation, poor rt fit, moderate sample similarity
p.comb <- calcScores(p.comb, A = 120, B = 5, C = 0.5)

#example 4: using ppm for mass deviation; note different A value
p.comb <- calcScores(p.comb, A = 0.05, B = 14, C = 0.5, usePPM = TRUE)

#example 5: limiting to specific m/z groups 1-1000
p.comb <- calcScores(p.comb, A = 90, B = 14, C = 0.5, groups = seq(1,1000))

#example 6: using adduct information
p.comb <- calcScores(p.comb, A = 90, B = 14, C = 0.5, useAdduct = TRUE,
                     adduct = 1.25)


hhabra/Combiner documentation built on June 5, 2024, 10:56 a.m.