merge_replicate_samples: Merge technical replicates prior to downstream analysis
In ftwkoopmans/msdap: Mass Spectrometry Downstream Analysis Pipeline

Description

This function merges replicate measurements of the same biological sample and updates the peptide and sample tables accordingly. For each set of raw files (sample_id values in the sample metadata table) that belong to the same biological sample, all but one sample_id are removed, and the remaining sample_id is assigned the mean peptide values (intensity, retention time, etc.) across all replicates.

Peptide log2 intensity values can be rescaled prior to merging values across samples to prevent potential bias towards samples with higher sample loading (recommended, default setting).
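
The bias that rescaling guards against can be illustrated on toy data with base R. This is a minimal sketch, not MS-DAP's implementation: it assumes that rescaling means shifting each replicate so that all replicates share the same median log2 intensity, and that the median is computed over peptides observed in every replicate. All values and variable names are hypothetical.

```r
# Toy log2 intensities: rows = peptides, columns = two technical replicates.
# rawfile2 has roughly 1 log2 unit higher sample loading; pepD is only seen there.
x = cbind(
  rawfile1 = c(pepA = 20, pepB = 22, pepC = 24, pepD = NA),
  rawfile2 = c(pepA = 21, pepB = 23, pepC = 25, pepD = 27)
)

# Assumption: per-sample median over peptides observed in every replicate
shared = stats::complete.cases(x)
sample_median = apply(x[shared, , drop = FALSE], 2, stats::median)

# Shift each replicate so that all replicates share the same median
x_scaled = sweep(x, 2, sample_median - mean(sample_median), FUN = "-")

# Mean across replicates, ignoring missing values
merged_raw = rowMeans(x, na.rm = TRUE)
merged_scaled = rowMeans(x_scaled, na.rm = TRUE)

print(rbind(merged_raw, merged_scaled))
```

In this toy example the merged values for pepA, pepB and pepC are identical with and without rescaling, but pepD, which was observed only in the replicate with higher loading, ends up 0.5 log2 units higher when no rescaling is applied.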

Importantly, the sample metadata table must contain a column in which all samples that should be merged share the same identifier, with one unique identifier per biological sample (so NOT replicate numbers!). Keep a close eye on the output log (also found in report.pdf) to confirm that the right samples have been merged!

The example below shows a sample metadata table with 2 technical replicates for mouse1, 2 technical replicates for mouse2, and no technical replicates for the mice in group B (their bioid values are left empty).

sample_id  group  bioid
rawfile1   grpA   mouse1
rawfile2   grpA   mouse1
rawfile3   grpA   mouse2
rawfile4   grpA   mouse2
rawfile5   grpB
rawfile6   grpB
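
Before merging, it can help to list which sample_id values share an identifier. The following base R sketch rebuilds the table above as a data.frame and groups the raw files by bioid. The `samples` and `merge_groups` names are illustrative and not part of MS-DAP, and the sketch assumes that empty bioid values are stored as empty strings or NA.

```r
# Sample metadata from the example table (empty bioid = no technical replicates)
samples = data.frame(
  sample_id = paste0("rawfile", 1:6),
  group = c("grpA", "grpA", "grpA", "grpA", "grpB", "grpB"),
  bioid = c("mouse1", "mouse1", "mouse2", "mouse2", "", ""),
  stringsAsFactors = FALSE
)

# Keep rows that have an identifier, then group sample_id by that identifier
has_bioid = !is.na(samples$bioid) & samples$bioid != ""
merge_groups = split(samples$sample_id[has_bioid], samples$bioid[has_bioid])

print(merge_groups)
```

Each list element holds the raw files that carry the same bioid, which is a quick sanity check that the column contains biological sample identifiers rather than replicate numbers.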

Usage

merge_replicate_samples(
  dataset,
  colname,
  minsample = 1L,
  rescale_intensities = TRUE
)

Arguments

`dataset`	dataset with sample metadata attached
`colname`	name of the column in the samples table that identifies samples that belong together and should be merged. Give the same label/ID, unique to one biological sample, to each row in the sample metadata that should be merged. NOT A COLUMN WITH TECHNICAL REPLICATE IDS!
`minsample`	minimum number of technical replicates in which a peptide must be observed. E.g. if set to 2 and there are 3 technical replicates, all peptides that have an intensity value in at least 2 out of 3 replicates are retained. If you provide a value larger than the number of technical replicates available for a biological sample, the limit is capped at that number (e.g. minsample=3 for biological samples with 2 replicates -> retain peptides available in 2/2 replicates). Defaults to 1, which retains all peptides found in any of the technical replicates
`rescale_intensities`	boolean value indicating whether by-sample median normalization should be applied prior to averaging log2 peptide intensities across samples. Strongly recommended unless the imported data was already normalized at peptide level. Applying it to already-normalized input does little harm, so this is enabled by default
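
The capping behaviour described for `minsample` can be sketched in base R as follows. The `retain_peptides` helper is hypothetical and only mirrors the description above; it is not an MS-DAP function, and the toy intensity values are made up.

```r
# Toy log2 intensities for one biological sample with 3 technical replicates
x = cbind(
  rep1 = c(pepA = 20.0, pepB = 22, pepC = NA),
  rep2 = c(pepA = 21.0, pepB = NA, pepC = NA),
  rep3 = c(pepA = 20.5, pepB = 23, pepC = 24)
)

# Hypothetical helper: names of peptides observed in at least `minsample`
# replicates, with the threshold capped at the number of available replicates
retain_peptides = function(x, minsample = 1L) {
  threshold = min(minsample, ncol(x))
  n_observed = rowSums(!is.na(x))
  names(n_observed)[n_observed >= threshold]
}

retain_peptides(x, minsample = 1L)  # pepA, pepB, pepC
retain_peptides(x, minsample = 2L)  # pepA, pepB
retain_peptides(x, minsample = 5L)  # threshold capped at 3: pepA
```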

Examples

## Not run: 
  # first, import your dataset and sample metadata
  dataset = import_dataset_diann(...)
  dataset = import_sample_metadata(...)

  # merge technical replicates
  dataset = merge_replicate_samples(
    dataset,
    colname = "bioid",
    minsample = 2,
    rescale_intensities = TRUE
  )

  # proceed with typical MS-DAP usage
  dataset = analysis_quickstart(...)

## End(Not run)

ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.