topdownrdata-package: Example Data for the topdownr package.

Description

This package contains example files accompanying the topdownr package.

Details

It has just one function, topDownDataPath(), which returns the file path to the five example protein datasets.

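Each dataset has four different categories of files: .fasta (protein sequence), .mzML (spectra), .txt (scan header information) and .experiments.csv (method information).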

In total this package has 341 files: one .fasta file for each of the 5 proteins, plus 20 files of each of the three method/spectra information file types for every protein, except for bovine carbonic anhydrase and the C3a recombinant protein, which have 26 of each.
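
The total can be checked with a few lines of base R. The vector names below are shorthand labels chosen for this sketch, not identifiers used by the package.

```r
# Number of files per method/spectra file type for each protein
# (labels are illustrative shorthand, not package identifiers)
filesPerType <- c(myoglobin=20, carbonicAnhydrase=26, h33=20, h4=20, c3a=26)

nFasta <- length(filesPerType)      # one .fasta file per protein
nOther <- 3 * sum(filesPerType)     # three further file types per protein

total <- nFasta + nOther
total
```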

The topdownr package needs all four file types. The sequence information in the .fasta file is used to calculate the fragmentation in silico. The theoretical fragments are matched against the experimentally observed fragments that are stored in the .mzML files. In the next step the fragmentation data have to be combined with the general information about the spectra and the fragmentation conditions from the .txt scan header and the .experiments.csv method files, respectively.
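
The matching step can be pictured with a small base-R sketch. It is not topdownr's implementation: matchFragments(), the 5 ppm tolerance and all m/z values are assumptions made up for illustration and are not taken from the datasets.

```r
# Hypothetical helper: for each theoretical m/z return the index of the
# closest observed peak, or NA if it is further away than the tolerance.
matchFragments <- function(theoretical, observed, tolerancePpm=5) {
    vapply(theoretical, function(mz) {
        deltaPpm <- abs(observed - mz) / mz * 1e6
        i <- which.min(deltaPpm)
        if (deltaPpm[i] <= tolerancePpm) i else NA_integer_
    }, integer(1))
}

# Made-up theoretical fragments and observed peaks
theoretical <- c(b2=201.1234, b3=314.2075, c2=218.1499)
observed <- c(201.1236, 250.0000, 314.2100)

matched <- matchFragments(theoretical, observed)
matched
```

In this toy input only b2 lies within 5 ppm of an observed peak; b3 is about 8 ppm off and c2 has no nearby peak at all.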

Combined, this information can be used to investigate fragmentation conditions and to find the one (or more) that maximises the overall fragment coverage. Please see a small example at the end of this manual page and a full-featured example analysis in the topdownr analysis vignette: vignette("analysis", package="topdownr").
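
As a rough illustration of what "maximising fragment coverage" means, the following base-R sketch computes the coverage per condition from a logical matrix. The matrix, its bond and condition names and the way coverage is defined here are assumptions for this toy example, not the data structures used by topdownr.

```r
# Rows: backbone bonds, columns: fragmentation conditions (all hypothetical).
# TRUE means the bond is covered by at least one fragment in that condition.
covered <- matrix(
    c(TRUE,  FALSE, TRUE,
      TRUE,  TRUE,  FALSE,
      FALSE, TRUE,  TRUE,
      FALSE, FALSE, TRUE),
    nrow=4, byrow=TRUE,
    dimnames=list(paste0("bond", 1:4), c("condA", "condB", "condC"))
)

# Fraction of covered bonds per condition
coverage <- colMeans(covered)

# Single condition with the highest coverage
best <- names(which.max(coverage))

# Coverage when two conditions are combined
combined <- mean(covered[, "condA"] | covered[, "condC"])
```

Here no single condition covers every bond, but combining two of them does, which is why more than one condition may be of interest.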

The .meth files were created with the following command:

library("topdownr")

writeMethodXmls(defaultMs1Settings(LastMass=1600),
                defaultMs2Settings(),
                ## mass/z adapted to protein of interest (see table)
                ## z is currently not supported by the Thermo software,
                ## setting to 1.
                mz=cbind(mass=c(745.2, 908.0, 1162.0), z=c(1, 1, 1)),
                groupBy=c("replication", "ETDReactionTime"),
                replications=2,
                pattern="method_CA3_%s.xml")
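
For another protein the mz argument would be built the same way. A minimal sketch for horse myoglobin, assuming the first number of each entry in the "M/Z used" table is the m/z to select; mzMyoglobin is a name chosen here for illustration only.

```r
# m/z values for horse myoglobin; z is set to 1 as in the command above
mzMyoglobin <- cbind(mass=c(707.3, 893.1, 1211.7), z=c(1, 1, 1))
mzMyoglobin
```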

General Information

protein name               uniprot accession       product number           modifications        monoisotopic mass observed   monoisotopic mass predicted
horse myoglobin            P68082                  sigma M1882              Met-loss             16940.99                     16940.96
bovine carbonic anhydrase  P00921                  sigma C2522              Met-loss + Acetyl    29006.76                     29006.83
histone H3.3               P84243                  NEB M2507S               Met-loss             15187.49                     15187.46
histone H4                 P62805                  NEB M2504S               Met-loss             11229.33                     11229.34
C3a recombinant protein    P01024 part (672-748)   recombinantly expressed  carbamidomethyl      9814.90                      9814.88

All 5 proteins were infused into a Thermo Orbitrap Fusion Lumos at 600 nl/minute in 50 % acetonitrile 0.1 FS360-20-10-5-6.35CT emitter.

M/Z used

protein name               m/z 1      m/z 2      m/z 3
horse myoglobin            707.3/24   893.1/19   1211.7/14
bovine carbonic anhydrase  745.2/39   908.0/32   1162.0/25
histone H3.3               563.8/27   691.8/22   894.9/17
histone H4                 562.7/20   703.2/16   937.3/12
C3a recombinant protein    745.2/17   908.0/14   1162.0/11
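
Assuming the number after the slash is the charge state z, an approximate neutral mass follows from M = z * (m/z - m_proton). The helper neutralMass() below is hypothetical and only illustrates this arithmetic. The result should be near, but not identical to, the monoisotopic masses listed under General Information, presumably because the selected m/z need not correspond to the monoisotopic peak.

```r
# Hypothetical helper: neutral mass from m/z and charge state,
# assuming protonated ions (proton mass in Da)
neutralMass <- function(mz, z, protonMass=1.007276) {
    z * (mz - protonMass)
}

# Horse myoglobin, first entry of the table: 707.3/24
massMyoglobin <- neutralMass(707.3, 24)

# All three entries for horse myoglobin
massesMyoglobin <- neutralMass(c(707.3, 893.1, 1211.7), c(24, 19, 14))
```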

Author(s)

Pavel Shliaha pavels@bmb.sdu.dk, Sebastian Gibb mail@sebastiangibb.de

References

https://github.com/sgibb/topdownrdata/

See Also

topDownDataPath(), topdownr-package.
Vignettes on the generation of these data, vignette("data-generation", package="topdownr"), and on their analysis, vignette("analysis", package="topdownr").
Website: https://sgibb.github.io/topdownr/

Examples

# List file categories
list.files(topdownrdata::topDownDataPath("myoglobin"))

# List all needed files
list.files(topdownrdata::topDownDataPath("myoglobin"), recursive=TRUE)

# Read files, predict fragments and combine spectra information
library("topdownr")

tds <- readTopDownFiles(
    path=topdownrdata::topDownDataPath("myoglobin"),
    ## Use an artificial pattern to load just the fasta
    ## file and files from m/z == 1211, ETD reagent
    ## target 1e6 and first replicate to keep runtime
    ## of the example short
    pattern=".*fasta.gz$|1211_.*1e6_1"
)

# Show TopDownSet object
tds

# Filter all intensities that don't have at least 10 % of the highest
# intensity per fragment.
tds <- filterIntensity(tds, threshold=0.1)

# Filter all conditions with a CV above 30 % (across technical replicates)
tds <- filterCv(tds, threshold=30)

# Filter all conditions with a large deviation in injection time
tds <- filterInjectionTime(tds, maxDeviation=log2(3), keepTopN=2)

# Filter all conditions where fragments don't replicate
tds <- filterNonReplicatedFragments(tds)

# Normalise by TIC
tds <- normalize(tds)

# Aggregate technical replicates
tds <- aggregate(tds)

# Coerce to NCBSet (N-/C-terminal/Bidirectional) and plot fragment coverage
fragmentationMap(as(tds, "NCBSet"))
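
The idea behind the CV filter used above can be sketched in base R. This is an illustration of the coefficient of variation with a 30 % cut-off on made-up intensities, not the code that filterCv() runs internally; the matrix and its row and column names are assumptions.

```r
# Made-up intensities of two fragments across three technical replicates
intensity <- matrix(
    c(100, 110,  90,
      100, 300,  20),
    nrow=2, byrow=TRUE,
    dimnames=list(c("stable", "unstable"), paste0("rep", 1:3))
)

# Coefficient of variation in percent: sd / mean * 100
cv <- apply(intensity, 1, function(x) sd(x) / mean(x) * 100)

# Keep only rows with a CV of at most 30 %
keep <- cv <= 30
intensity[keep, , drop=FALSE]
```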

sgibb/topdownrdata documentation built on Aug. 9, 2019, 1:11 a.m.