readTopDownFiles: Read top-down files.

View source: R/functions-TopDownSet.R

Description

Creates a TopDownSet object. It is the only constructor for this class.

Usage

readTopDownFiles(
  path,
  pattern = ".*",
  type = c("a", "b", "c", "x", "y", "z"),
  modifications = c("Carbamidomethyl", "Acetyl", "Met-loss"),
  customModifications = data.frame(),
  adducts = data.frame(),
  neutralLoss = PSMatch::defaultNeutralLoss(),
  sequenceOrder = c("original", "random", "inverse"),
  tolerance = 5e-06,
  redundantIonMatch = c("remove", "closest"),
  redundantFragmentMatch = c("remove", "closest"),
  dropNonInformativeColumns = TRUE,
  sampleColumns = c("Mz", "AgcTarget", "EtdReagentTarget", "EtdActivation",
    "CidActivation", "HcdActivation", "UvpdActivation"),
  conditions = "ScanDescription",
  verbose = interactive()
)

Arguments

path

character, path to directory that contains the top-down files.

pattern

character, a filename pattern; the default .* matches all files.

type

character, types of fragments; currently a-c and x-z are supported, see PSMatch::calculateFragments() for details.

modifications

character, Unimod names of the modifications that should be applied. Currently only Acetyl (Unimod:1, protein N-term only), Carbamidomethyl (Unimod:4) and Met-loss (Unimod:765) are supported. Met-loss removes M if it is followed by A, C, G, P, S, T, or V. See http://www.unimod.org/modifications_view.php?editid1=1, http://www.unimod.org/modifications_view.php?editid1=4, and http://www.unimod.org/modifications_view.php?editid1=765 for details. Use NULL to disable all modifications.

customModifications

data.frame with 4 columns, namely: mass, name, location, variable; see the Details section.

adducts

data.frame with 3 columns, namely: mass, name, to; see the Details section.

neutralLoss

list, neutral losses that should be applied, see PSMatch::calculateFragments() and PSMatch::defaultNeutralLoss() for details.

sequenceOrder

character, order of the sequence before fragment calculation and matching are done. "original" leaves the sequence unchanged, "inverse" reverses it, and "random" arranges the amino acids in random order.

tolerance

double, tolerance in ppm that is used to match the theoretical fragments to the observed ones. The value is given as a fraction, e.g. the default 5e-06 corresponds to 5 ppm.

redundantIonMatch

character, an m/z could be matched to one, two or more fragments. If it is matched to more than one fragment, the match can either be removed ("remove") or the match to the closest fragment can be kept ("closest").

redundantFragmentMatch

character, one or more m/z could be matched to the same fragment. These matches can either be removed ("remove") or the match to the closest m/z can be kept ("closest").

dropNonInformativeColumns

logical, should columns with just one identical value across all runs be removed?

sampleColumns

character, column names of the colData() used to define a sample (technical replicate). This is used to add the Sample column (used for easier aggregation, etc.).

conditions

character/numeric, one of:

  • "ScanDescription" (default): create condition IDs based on the given "Scan Description" parameter (set automatically by createExperimentsFragmentOptimisation()).

  • "FilterString": create condition IDs based on mass labels in the FilterString column (included for backward compatibility, used in writeMethodXmls() prior to version 1.5.2 in Dec 2018).

  • A single numeric value giving the number of conditions.

verbose

logical, verbose output?
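The interplay of tolerance and redundantIonMatch described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper match_peak(), the fragment names and all m/z values are hypothetical, and computing the deviation relative to the theoretical m/z is an assumption.

```r
# Hypothetical theoretical fragment m/z values and observed peaks.
theoretical <- c(b1 = 500.0000, b2 = 500.0035, y1 = 750.0000)
observed <- c(500.0015, 750.0030, 900.0000)
tolerance <- 5e-06  # 5 ppm expressed as a fraction

# Hypothetical helper: match one observed m/z against all fragments.
match_peak <- function(mz, theoretical, tolerance,
                       redundant = c("remove", "closest")) {
    redundant <- match.arg(redundant)
    # Assumption: relative deviation with respect to the theoretical m/z.
    deviation <- abs(mz - theoretical) / theoretical
    hits <- which(deviation <= tolerance)
    if (length(hits) == 0L) {
        return(NA_character_)
    }
    if (length(hits) > 1L) {
        if (redundant == "remove") {
            # Ambiguous match: drop it completely.
            return(NA_character_)
        }
        # Keep only the fragment with the smallest deviation.
        hits <- hits[which.min(deviation[hits])]
    }
    names(theoretical)[hits]
}

removed <- vapply(observed, match_peak, character(1),
                  theoretical = theoretical, tolerance = tolerance,
                  redundant = "remove")
closest <- vapply(observed, match_peak, character(1),
                  theoretical = theoretical, tolerance = tolerance,
                  redundant = "closest")
print(removed)
print(closest)
```

In this sketch the first peak lies within 5 ppm of both b1 and b2, so it is dropped with "remove" and assigned to b1 with "closest"; the last peak matches no fragment in either mode.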

Details

readTopDownFiles reads and processes all top-down files, namely:

  • .fasta (protein sequence)

  • .mzML (spectra)

  • .experiments.csv (method/fragmentation conditions)

  • .txt (scan header information)

customModifications: in addition to the Unimod modifications available through the modifications argument, customModifications allows applying user-defined modifications to the output of PSMatch::calculateFragments(). The customModifications argument takes a data.frame with the mass to add, the name of the modification, the location (either the position of the amino acid or "N-term"/"C-term"), and whether the modification is always present (variable=FALSE) or both the modified and the unmodified amino acid are present (variable=TRUE). For example, acetylation (which is also available via modifications="Acetyl") is given by data.frame(mass=42.010565, name="Acetyl", location="N-term", variable=FALSE), and a variable modification (one that may or may not be present) by data.frame(mass=365.132, name="Custom", location=10, variable=TRUE).

If the customModifications data.frame contains multiple rows, the modifications are applied one at a time, from the first row to the last.
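A customModifications table with more than one row could be built as in the following sketch. The acetylation row uses the values given above; placing the "Custom" modification at the C-terminus is a hypothetical choice made only to keep the location column of a single type.

```r
# One row per modification; columns follow the documented layout.
customModifications <- data.frame(
    mass = c(42.010565, 365.132),
    name = c("Acetyl", "Custom"),
    # "C-term" for the second row is a hypothetical choice.
    location = c("N-term", "C-term"),
    # FALSE: always present; TRUE: modified and unmodified forms occur.
    variable = c(FALSE, TRUE),
    stringsAsFactors = FALSE
)
print(customModifications)

# The table would then be passed on, e.g.:
# tds <- readTopDownFiles(path, customModifications = customModifications)
```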

adducts: Thermo's Xtract allows some mistakes in deisotoping, mostly +/- C13-C12 and +/- H+. The adducts argument takes a data.frame with the mass to add, the name that should be assigned to the new fragments, and the fragment type to which the adduct should be applied, e.g. for H+ on z: data.frame(mass=1.008, name="zpH", to="z").

Please note: the adducts are added to the output of PSMatch::calculateFragments(). This has some limitations, e.g. the neutral loss calculation cannot be done within the topdownr package. If neutral loss should be applied to adducts you have to create additional rows, e.g.: data.frame(mass=c(1.008, 1.008), name=c("cpH", "cpH_"), to=c("c", "c_")).
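As a sketch, an adducts table covering both deisotoping errors mentioned above for z fragments might look as follows. Only "zpH" and the mass 1.008 come from this page; the other fragment names are hypothetical, and 1.003355 is the mass difference between 13C and 12C in Da.

```r
c13c12 <- 1.003355  # mass difference between 13C and 12C in Da
proton <- 1.008     # H+ value as used elsewhere on this page

adducts <- data.frame(
    mass = c(c13c12, -c13c12, proton, -proton),
    # All names except "zpH" are hypothetical labels.
    name = c("zpC13", "zmC13", "zpH", "zmH"),
    to = "z",  # recycled: every adduct is applied to z fragments
    stringsAsFactors = FALSE
)
print(adducts)
```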

Value

A TopDownSet object.

See Also

PSMatch::calculateFragments(), PSMatch::defaultNeutralLoss()

Examples

if (require("topdownrdata")) {
    # add H+ to z and disable the neutral loss of water
    tds <- readTopDownFiles(
        topdownrdata::topDownDataPath("myoglobin"),
        ## Use an artificial pattern to load just the fasta
        ## file and the files from m/z == 1211, ETD reagent
        ## target 1e6 and first replicate to keep the runtime
        ## of the example short
        pattern=".*fasta.gz$|1211_.*1e6_1",
        adducts=data.frame(mass=1.008, name="zpH", to="z"),
        neutralLoss=PSMatch::defaultNeutralLoss(
            disableWaterLoss=c("Cterm", "D", "E", "S", "T")),
        tolerance=25e-6
    )
}
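To see what the pattern in the example above selects, the regular expression can be tried on a few file names with base R. The file names below are hypothetical and only mimic a plausible naming scheme, and grepl() is used as a stand-in for whatever file listing readTopDownFiles performs internally.

```r
pattern <- ".*fasta.gz$|1211_.*1e6_1"

# Hypothetical file names, not the actual content of topdownrdata.
files <- c(
    "myoglobin.fasta.gz",
    "myo_1211_ETDReagentTarget_1e6_1.mzML.gz",
    "myo_1211_ETDReagentTarget_1e6_2.mzML.gz",
    "myo_707_ETDReagentTarget_1e6_1.mzML.gz",
    "myo_1211_ETDReagentTarget_5e5_1.mzML.gz"
)

# Keep the fasta file and the first replicate of m/z 1211 at target 1e6.
selected <- files[grepl(pattern, files)]
print(selected)
```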

sgibb/topdownr documentation built on Jan. 16, 2024, 12:14 a.m.