readTopDownFiles: Read top-down files.
In sgibb/topdownr: Investigation of Fragmentation Conditions in Top-Down Proteomics

Description

Creates a TopDownSet object. It is the only constructor for this class.

Usage

readTopDownFiles(
  path,
  pattern = ".*",
  type = c("a", "b", "c", "x", "y", "z"),
  modifications = c("Carbamidomethyl", "Acetyl", "Met-loss"),
  customModifications = data.frame(),
  adducts = data.frame(),
  neutralLoss = PSMatch::defaultNeutralLoss(),
  sequenceOrder = c("original", "random", "inverse"),
  tolerance = 5e-06,
  redundantIonMatch = c("remove", "closest"),
  redundantFragmentMatch = c("remove", "closest"),
  dropNonInformativeColumns = TRUE,
  sampleColumns = c("Mz", "AgcTarget", "EtdReagentTarget", "EtdActivation",
    "CidActivation", "HcdActivation", "UvpdActivation"),
  conditions = "ScanDescription",
  verbose = interactive()
)

Arguments

`path`	`character`, path to directory that contains the top-down files.
`pattern`	`character`, a filename pattern, the default `.*` means all files.
`type`	`character`, type of fragments, currently a-c and x-z are supported, see `PSMatch::calculateFragments()` for details.
`modifications`	`character`, Unimod names of the modifications that should be applied. Currently only Acetyl (Unimod:1, protein N-term only), Carbamidomethyl (Unimod:4) and Met-loss (Unimod:765) are supported. Met-loss removes M if it is followed by A, C, G, P, S, T, or V. See http://www.unimod.org/modifications_view.php?editid1=1, http://www.unimod.org/modifications_view.php?editid1=4, and http://www.unimod.org/modifications_view.php?editid1=765 for details. Use `NULL` to disable all modifications.
`customModifications`	`data.frame`, with 4 columns, namely: mass, name, location, variable, see details section.
`adducts`	`data.frame`, with 3 columns, namely: mass, name, to, see details section.
`neutralLoss`	`list`, neutral loss that should be applied, see `PSMatch::calculateFragments()` and `PSMatch::defaultNeutralLoss()` for details.
`sequenceOrder`	`character`, order of the sequence before fragment calculation and matching are done. `"original"` doesn't change anything, `"inverse"` reverses the sequence and `"random"` arranges the amino acid sequence at random.
`tolerance`	`double`, tolerance in ppm, given as a fraction (the default `5e-06` corresponds to 5 ppm), that is used to match the theoretical fragments with the observed ones.
`redundantIonMatch`	`character`, an mz can match one, two or more fragments. If it matches more than one fragment, the match can be `"remove"`d or the match to the `"closest"` fragment can be chosen.
`redundantFragmentMatch`	`character`, one or more mz can match the same fragment. These matches can be `"remove"`d or the match to the `"closest"` mz can be chosen.
`dropNonInformativeColumns`	`logical`, should columns with just one identical value across all runs be removed?
`sampleColumns`	`character`, column names of the `colData()` used to define a sample (technical replicate). This is used to add the `Sample` column (used for easier aggregation, etc.).
`conditions`	`character`/`numeric`, one of: `"ScanDescription"` (default): create condition IDs based on the given "Scan Description" parameter (set automatically by `createExperimentsFragmentOptimisation()`); `"FilterString"`: create condition IDs based on mass labels in the FilterString column (included for backward compatibility, used in `writeMethodXmls()` prior to version 1.5.2 in Dec 2018); or a single `numeric` value giving the number of conditions.
`verbose`	`logical`, verbose output?
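
The `tolerance` argument can be illustrated with a minimal base R sketch. It is not topdownr's internal code: the helper `matchPpm()`, the fragment names and all m/z values are hypothetical, and it assumes the deviation is taken relative to the theoretical m/z.

```r
# Illustrative sketch only; matchPpm() is a hypothetical helper, not a
# topdownr function. Assumption: deviation is relative to the theoretical m/z.
matchPpm <- function(observed, theoretical, tolerance = 5e-06) {
    # absolute difference of every observed peak (rows) to every
    # theoretical fragment (columns) ...
    absErr <- abs(outer(observed, theoretical, "-"))
    # ... divided column-wise by the theoretical m/z
    relErr <- sweep(absErr, 2, theoretical, "/")
    relErr <= tolerance
}

theoretical <- c(b2 = 200.1000, b3 = 300.1500)  # hypothetical fragment m/z
observed <- c(200.1008, 300.1560)               # hypothetical peak m/z

m5 <- matchPpm(observed, theoretical, tolerance = 5e-06)    # 5 ppm
m25 <- matchPpm(observed, theoretical, tolerance = 25e-06)  # 25 ppm
```

In this sketch the first peak deviates by about 4 ppm from `b2` and matches at both settings, while the second peak deviates by about 20 ppm from `b3` and only matches at the wider 25 ppm tolerance.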

Details

readTopDownFiles reads and processes all top-down files, namely:

.fasta (protein sequence)
.mzML (spectra)
.experiments.csv (method/fragmentation conditions)
.txt (scan header information)
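
How a `pattern` narrows down this set of files can be sketched in base R. The file names below are hypothetical (real names in a data directory may differ), and the sketch assumes the pattern is applied as a regular expression to the file names, as `list.files()` would do.

```r
# Hypothetical file names, for illustration only.
files <- c(
    "myoglobin.fasta.gz",
    "myo_1211_ETDReagentTarget_1e6_1.mzML.gz",
    "myo_1211_ETDReagentTarget_1e6_2.mzML.gz",
    "myo_707_ETDReagentTarget_1e6_1.mzML.gz"
)

# Keep the fasta file plus everything from m/z 1211, target 1e6, replicate 1.
pattern <- ".*fasta.gz$|1211_.*1e6_1"
selected <- grep(pattern, files, value = TRUE)
```

Here `selected` keeps only the first two file names. The second replicate and the m/z 707 file are dropped.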

customModifications: in addition to the Unimod modifications available through the modifications argument, customModifications allows user-defined modifications to be applied to the output of PSMatch::calculateFragments(). The customModifications argument takes a data.frame with the mass to add, the name of the modification, the location (either the position of the amino acid or "N-term"/"C-term"), and whether the modification is always present (variable=FALSE) or both the modified and the unmodified amino acid are present (variable=TRUE). For example, acetylation (which is also available via modifications="Acetyl") would be data.frame(mass=42.010565, name="Acetyl", location="N-term", variable=FALSE), and a variable modification (one that may or may not be present) would be data.frame(mass=365.132, name="Custom", location=10, variable=TRUE).

If the customModifications data.frame contains multiple rows, the modifications are applied one at a time, from the first row to the last.
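
A multi-row customModifications table could be built as in the sketch below. The masses and the "Custom" entry are taken from the text above. Position 20 and the name "AcetylK" are hypothetical and chosen only for illustration.

```r
# Sketch of a two-row customModifications table (base R only).
customModifications <- data.frame(
    mass = c(365.132, 42.010565),
    name = c("Custom", "AcetylK"),  # "AcetylK" is a hypothetical name
    location = c(10, 20),           # position 20 is hypothetical
    variable = c(TRUE, FALSE),      # TRUE: modified and unmodified form present
    stringsAsFactors = FALSE
)

# The four expected columns, in the documented order.
stopifnot(identical(
    names(customModifications),
    c("mass", "name", "location", "variable")
))
```

The table would then be passed as customModifications=customModifications to readTopDownFiles(). Row one ("Custom") would be applied before row two.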

adducts: Thermo's Xtract allows some mistakes in deisotoping, mostly it allows +/- C13-C12 and +/- H+. The adducts argument takes a data.frame with the mass to add, the name that should be assigned to these new fragments, and the fragment type to which the modification should be applied, e.g. for H+ on z: data.frame(mass=1.008, name="zpH", to="z").

Please note: the adducts are added to the output of PSMatch::calculateFragments(). This has some limitations, e.g. neutral loss calculation cannot be done in the topdownr package. If neutral loss should be applied to adducts, you have to create additional rows, e.g.: data.frame(mass=c(1.008, 1.008), name=c("cpH", "cpH_"), to=c("c", "c_")).
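
An adducts table covering both deisotoping errors mentioned above could look like the sketch below. The H+ mass of 1.008 and the name "zpH" come from the example above, and 1.003355 is the C13-C12 mass difference. The names "zmH", "zpC13" and "zmC13" are hypothetical labels chosen for illustration.

```r
# Mass shifts: H+ as used in the text, and the C13 - C12 mass difference.
delta <- c(H = 1.008, C13 = 1.003355)

# One row per shifted fragment series; all rows are applied to z fragments.
adducts <- data.frame(
    mass = c(delta[["H"]], -delta[["H"]], delta[["C13"]], -delta[["C13"]]),
    name = c("zpH", "zmH", "zpC13", "zmC13"),  # all but "zpH" are hypothetical
    to = "z",
    stringsAsFactors = FALSE
)
```

Each row adds a new fragment series derived from the z fragments. Covering the neutral loss variants as well would need further rows of the same shape, as in the example above.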

Value

A TopDownSet object.

Examples

if (require("topdownrdata")) {
    # add H+ to z and no neutral loss of water
    tds <- readTopDownFiles(
        topdownrdata::topDownDataPath("myoglobin"),
        ## Use an artificial pattern to load just the fasta
        ## file and files from m/z == 1211, ETD reagent
        ## target 1e6 and first replicate to keep runtime
        ## of the example short
        pattern=".*fasta.gz$|1211_.*1e6_1",
        adducts=data.frame(mass=1.008, name="zpH", to="z"),
        neutralLoss=PSMatch::defaultNeutralLoss(
            disableWaterLoss=c("Cterm", "D", "E", "S", "T")),
        tolerance=25e-6
    )
}

sgibb/topdownr documentation built on June 15, 2025, 4:10 a.m.