Description Usage Arguments Details Value Author(s) Examples
This function constructs a global matrix, saved as initial_DATA.txt, by collecting and assembling the information from the chromatograms and mass spectra of several GC-MS analyses. It performs basic peak detection if the input files are in ASCII format. For other input files, peak retention times (or retention indices) are retrieved from the chromatograms (peaklist.txt or rteres.txt files) and associated with their respective mass spectra (AIA/ANDI NetCDF, mzXML, mzData and mzML files). Each row of the output matrix represents one peak in one analysis and reports the sample name in the first column, the peak retention time (or retention index) in the second column and the mass spectrum of the peak in the following columns. If the input is in Agilent format, two quantification measures of peak size can be extracted directly from rteres.txt: the corrected area is inserted in column 3 and the percent of the total corrected area in column 4 of initial_DATA.txt. If the input is CDF, one or two quantification measures of peak size can be extracted from columns 6 (quantification1) and 7 (quantification2) of peaklist.txt; the values are reported in columns 3 and 4 of initial_DATA.txt, respectively. Except for ASCII input, the xcms package is required. Copy and paste the following code to install xcms: source("http://bioconductor.org/biocLite.R");biocLite("xcms") (on current Bioconductor releases, biocLite has been superseded by BiocManager::install("xcms")).
MS.DataCreation(DataType="CDF", path="", pathCDF="", mz, N_filt=3, apex=FALSE,
                quant=FALSE)
DataType |
Indicates the type of input files: CDF (default) when each sample folder contains a mass spectrum in AIA/ANDI NetCDF, mzXML, mzData or mzML format, and a peak list stored in a file named peaklist.txt; Agilent when the sample folders were produced by Agilent Technologies instruments (extension .D) and each contains a peak list stored in a file named rteres.txt (all .D folders should be grouped in one folder), with the mass spectra in AIA/ANDI format grouped in a separate folder; ASCII for sample folders as returned by trans.ASCII. |
path |
If |
pathCDF |
If |
mz |
Range of mass fragments delimiting the mass spectrum, e.g. 30:250. If |
N_filt |
Only if |
apex |
|
quant |
If |
After a GC-MS analysis, different types of files are produced by the chromatograph and the mass spectrometer. Each instrument vendor provides specific proprietary data formats that should be converted to a common raw data format such as ANDI NetCDF or mzXML. The most commonly used file formats for mass spectral data, i.e. NetCDF, mzXML and ASCII, are accepted by MS.DataCreation. The proprietary format from Agilent Technologies can also be used directly. The detailed structure of the three types of input is described below:
(i) DataType=CDF. Each GC-MS analysis has its own folder, which contains a mass spectrum in AIA/ANDI NetCDF, mzXML, mzData or mzML format, and a peak list stored in a file named peaklist.txt. The file peaklist.txt should have column headings similar to peak/RT/firstscan/maxscan/lastscan/quantification1/quantification2. The first column contains the peak number, the second the retention time in minutes or seconds, the third the first scan of the peak, the fourth the scan at the apex (maxscan) and the fifth the last scan of the peak. Optionally, column 6 holds a quantitative measure of peak size (quantification1) and column 7 a second quantitative measure of peak size (quantification2) (only maxscan is used if apex=TRUE in MS.clust). The sample name reported in the output matrix is extracted from the name of the AIA/ANDI files. Thus, all AIA/ANDI files should have different names. All analysis folders should be grouped in one folder.
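The column layout described above can be checked before running the function. The following is a minimal sketch, not the package's own validation: check_peaklist is a hypothetical helper, it works on a data frame that has already been read into R (the separator used in peaklist.txt is not stated here), and it checks columns by position because the headings only need to be "similar to" the ones listed.

```r
# Hypothetical helper (not part of MSeasy): sanity checks on a peak list
# that has already been read into a data frame.
check_peaklist <- function(peaklist) {
  problems <- character(0)
  # 5 mandatory columns, plus up to 2 optional quantification columns
  if (ncol(peaklist) < 5 || ncol(peaklist) > 7) {
    return("expected 5 to 7 columns")
  }
  # All columns are expected to hold numbers
  if (!all(vapply(peaklist, is.numeric, logical(1)))) {
    problems <- c(problems, "non-numeric column found")
    return(problems)
  }
  # Columns 3, 4 and 5 are firstscan, maxscan and lastscan
  in_order <- peaklist[[3]] <= peaklist[[4]] & peaklist[[4]] <= peaklist[[5]]
  if (!all(in_order)) {
    problems <- c(problems, "firstscan <= maxscan <= lastscan violated")
  }
  problems
}

# Invented example values, for illustration only
good <- data.frame(peak = 1:2, RT = c(5.2, 7.9),
                   firstscan = c(100, 210), maxscan = c(105, 216),
                   lastscan = c(112, 225))
bad <- good
bad$maxscan[2] <- 300   # apex scan outside the peak limits

print(check_peaklist(good))
print(check_peaklist(bad))
```

An empty character vector means that no problem was found by these particular checks.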
The function first checks for the presence of the AIA/ANDI and peaklist.txt files, verifies that the mz range is consistent and checks the structure of the peaklist.txt files. It then collects the retention time of each peak from peaklist.txt and looks up the corresponding mass spectra in the CDF files. Depending on the apex option, either the mean mass spectrum of each peak is calculated or the mass spectrum at the apex is extracted. The intensity, in counts, of each mass fragment is converted to a percentage of the most intense mass fragment in the spectrum. If quant = TRUE, one or two quantification columns, quantification1 and quantification2, are extracted for each peak from peaklist.txt and placed in columns 3 and 4, respectively, of the output initial_DATA matrix.
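The two ways of summarising a peak and the conversion to relative intensities can be illustrated on a small matrix. This is a sketch under stated assumptions, not the package's internal code: peak_spectrum is a hypothetical helper, the scans are assumed to be available as a numeric matrix with one row per scan and one column per mass fragment, and apex=TRUE is assumed to select the scan given by maxscan.

```r
# Hypothetical helper (not part of MSeasy).
# scans: numeric matrix, one row per scan, one column per mass fragment.
peak_spectrum <- function(scans, firstscan, maxscan, lastscan, apex = FALSE) {
  raw <- if (apex) {
    scans[maxscan, ]                                   # spectrum at the apex
  } else {
    colMeans(scans[firstscan:lastscan, , drop = FALSE])  # mean over the peak
  }
  100 * raw / max(raw)   # percentage of the most intense fragment
}

# Invented intensities (counts) for five scans and three mass fragments
scans <- matrix(c( 10,  0,  5,
                   40, 20, 10,
                  100, 50, 20,
                   60, 10, 10,
                   10,  0,  5),
                nrow = 5, byrow = TRUE,
                dimnames = list(NULL, c("30", "31", "32")))

apex_spec <- peak_spectrum(scans, firstscan = 2, maxscan = 3, lastscan = 4,
                           apex = TRUE)
mean_spec <- peak_spectrum(scans, firstscan = 2, maxscan = 3, lastscan = 4,
                           apex = FALSE)
print(apex_spec)
print(mean_spec)
```

In both cases the most intense fragment of the resulting spectrum equals 100, which makes spectra from peaks of different sizes comparable.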
(ii) DataType=Agilent. For Agilent Technologies instruments (using the default parameters), each GC-MS analysis returns a .D folder that contains a file rteres.txt with summary information on the chromatogram (analogous to a peak list). All the analysis folders should have different names and should be grouped in one folder. The mass spectra should be exported in ANDI NetCDF format. These files can be generated in one step for several selected GC-MS analyses with the Chemstation data analysis software (Menu/File/Export to AIA/ANDI). By default, all CDF files are exported to one folder, which may correspond to pathCDF.
The sample name reported in the output matrix is extracted from the name of the .D folder. Thus, all .D folders should have different names. Each AIA/ANDI file should have the same name as its corresponding .D folder.
The function first checks that every sample folder (.D) within the folder path contains a file rteres.txt and that pathCDF contains all the CDF files needed. If a file is missing, the analysis stops and reports the name of the problematic sample. The analysis should be restarted after the sample has been corrected or removed. The function then collects the retention time of each peak from rteres.txt and looks up the corresponding mass spectra in the CDF files. Depending on the apex option, either the mean mass spectrum of each peak is calculated or the mass spectrum at the apex is extracted. The intensity, in counts, of each mass fragment is converted to a percentage of the most intense mass fragment in the spectrum. If quant = TRUE, the two quantification columns CorrArea (corrected peak area) and PercTot (percent of the total corrected area) are extracted for each peak from rteres.txt and placed in columns 3 and 4, respectively, of the output initial_DATA matrix.
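Because the analysis stops at the first missing file, it can be convenient to list incomplete samples beforehand. The sketch below is an illustration only: find_missing_inputs is a hypothetical helper, and it assumes that a CDF file matches a .D folder when the two names are identical once the .D and .CDF extensions are removed, which may differ from the matching rule used by MS.DataCreation.

```r
# Hypothetical helper (not part of MSeasy): list samples with missing inputs.
# Assumption: "sampleA.D" is matched to "sampleA.CDF" (extensions stripped).
find_missing_inputs <- function(path, pathCDF) {
  d_folders <- list.files(path, pattern = "\\.D$", full.names = TRUE)
  samples <- sub("\\.D$", "", basename(d_folders))
  # .D folders without a rteres.txt file
  no_rteres <- samples[!file.exists(file.path(d_folders, "rteres.txt"))]
  # .D folders without a CDF file of the same name
  cdf_files <- list.files(pathCDF, pattern = "\\.cdf$", ignore.case = TRUE)
  no_cdf <- setdiff(samples, tools::file_path_sans_ext(cdf_files))
  list(no_rteres = no_rteres, no_cdf = no_cdf)
}

# Throwaway folder layout in a temporary directory, for illustration only
root <- tempfile("agilent_demo")
demo_path <- file.path(root, "rteres")
demo_cdf <- file.path(root, "cdf")
dir.create(file.path(demo_path, "sampleA.D"), recursive = TRUE)
dir.create(file.path(demo_path, "sampleB.D"), recursive = TRUE)
dir.create(demo_cdf, recursive = TRUE)
invisible(file.create(file.path(demo_path, "sampleA.D", "rteres.txt")))
invisible(file.create(file.path(demo_cdf, "sampleA.CDF")))

gaps <- find_missing_inputs(demo_path, demo_cdf)
print(gaps)   # sampleB lacks both rteres.txt and a CDF file

unlink(root, recursive = TRUE)   # clean up the demo folders
```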
(iii) DataType=ASCII. If your GC-MS raw data have been converted into the international ASCII format, all files (one per GC-MS analysis) should be grouped in one folder and first passed through the trans.ASCII function. The trans.ASCII function generates a folder output_date_time with translated files compatible with MS.DataCreation. This output_date_time folder may correspond to path. First, the chromatogram is smoothed according to the option N_filt (see the documentation of the function filter, method=convolution). Next, peaks are detected as a succession of three points of increasing intensity directly followed by three points of decreasing intensity (all points should have an intensity higher than 10 kilocounts). The first and last peaks of the chromatogram are removed if incomplete. Finally, depending on the apex option, the function either calculates the mean mass spectrum of each peak or extracts the mass spectrum at the apex, and the intensity (in counts) of each mass fragment is converted to a percentage of the most intense mass fragment in the spectrum.
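The smoothing and detection steps can be sketched as follows. This is one possible reading of the rule, not the package's implementation: detect_peaks is a hypothetical helper, N_filt is assumed to be the width of a centred moving-average window passed to stats::filter, and the apex is taken as the third of the three rising points, which must be followed by three falling points.

```r
# Hypothetical helper (not part of MSeasy): one reading of the detection rule.
detect_peaks <- function(intensity, n_filt = 3, threshold = 10000) {
  # Centred moving average; the ends of the series become NA
  smoothed <- as.numeric(stats::filter(intensity, rep(1 / n_filt, n_filt),
                                       method = "convolution", sides = 2))
  n <- length(smoothed)
  apex <- integer(0)
  if (n < 6) return(apex)
  for (i in 3:(n - 3)) {
    window <- smoothed[(i - 2):(i + 3)]
    # Every point of the candidate peak must exceed the threshold (in counts)
    if (anyNA(window) || any(window <= threshold)) next
    rising  <- all(diff(window[1:3]) > 0)  # three points, increasing
    falling <- all(diff(window[3:6]) < 0)  # apex, then three decreasing points
    if (rising && falling) apex <- c(apex, i)
  }
  apex
}

# Invented total ion chromatogram: flat baseline with one triangular peak
tic <- 1000 * c(20, 20, 20, 20, 20, 30, 40, 50, 60,
                50, 40, 30, 20, 20, 20, 20, 20)
peaks <- detect_peaks(tic, n_filt = 3)
print(peaks)   # index of the detected apex
```

A larger n_filt smooths more strongly, which suppresses noise spikes but can also flatten narrow peaks until they no longer satisfy the rule.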
The output file, called initial_DATA.txt, is saved in a folder called Output_MSDataCreation_resultdate_time. It contains the relative mass spectrum of each peak of all samples. The first column contains the sample name (the name of the folder containing the GC-MS analysis), the second column the peak retention time (or retention index), and the following columns the relative mass spectrum of the peak (within the range of the mass spectrum). If quant = TRUE, the third column contains quantification 1 (corrected area for Agilent), the fourth column contains quantification 2 (percent of the total corrected area for Agilent), and the relative mass spectrum starts in the fifth column.
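Since the position of the first spectrum column depends on quant, downstream code needs to know which setting was used. The sketch below separates the descriptive columns from the spectrum; split_initial_data is a hypothetical helper, and the column names and values of the demo table are invented for illustration.

```r
# Hypothetical helper (not part of MSeasy): separate descriptive columns
# from the relative mass spectrum, given the quant setting used.
split_initial_data <- function(initial_data, quant = FALSE) {
  n_meta <- if (quant) 4 else 2   # sample, RT (+ two quantification columns)
  list(
    meta    = initial_data[, seq_len(n_meta), drop = FALSE],
    spectra = as.matrix(initial_data[, -seq_len(n_meta), drop = FALSE])
  )
}

# Invented table with the layout described above for quant = TRUE
demo <- data.frame(sample = c("s1", "s1"), RT = c(5.2, 7.9),
                   quantification1 = c(1200, 800),
                   quantification2 = c(60, 40),
                   "30" = c(100, 10), "31" = c(50, 100),
                   check.names = FALSE)

parts <- split_initial_data(demo, quant = TRUE)
print(parts$meta)
print(parts$spectra)
```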
MS.DataCreation returns a data matrix called initial_DATA.txt, saved in a folder called Output_MSDataCreation_resultdate_time. It contains one row per peak and per individual, with the sample name, the retention time (or retention index) and the relative mass spectrum. If quant = TRUE, two supplementary columns, quantification1 and quantification2, are added after the retention time column. During the analysis, a temporary file called save_list_temp.rda is automatically generated in the folder Output_MSDataCreation_resultdate_time. It allows temporary information to be recovered if the function stops before completion.
Elodie Courtois, Yann Guitton, Florence Nicole
##not run
## DataType="Agilent"
## require xcms package
## For Agilent Technologies GC-MS files
## two folders are required:one folder with all .D analysis folders,
## each containing a rteres.txt file
## the second folder contains all CDF or mzXML files.
## CDF files have to be downloaded from MSeasy web site
## http://sites.google.com/site/rpackagemseasy/downloads/Agilent_example.zip
## Not run:
url1<-"http://sites.google.com/site/rpackagemseasy/downloads/Agilent_example.zip"
download.file(url=url1, destfile="AgilentCDF.zip")
unzip(zipfile="AgilentCDF.zip", exdir=".")
unlink("AgilentCDF.zip") ##delete the zip files
## Two folders are created in your current working directory : Agilent_CDF and Agilent_rteres
#with pathCDF
library(xcms)
MS.DataCreation(path=file.path(getwd(),"Agilent_rteres"), pathCDF=file.path(getwd(),
"Agilent_CDF"), DataType="Agilent", mz=30:250,apex=FALSE, quant=FALSE)
# without pathCDF
library(xcms)
MS.DataCreation(path=file.path(getwd(),"Agilent_rteres"), DataType="Agilent",
mz=30:250,apex=FALSE, quant=FALSE)
## Browse for the path to the Agilent_CDF folder
## downloaded and unzipped from MSeasy website
unlink(c("Agilent_rteres", "Agilent_CDF"), recursive=TRUE) #remove
##DataType="CDF"
##require xcms package
## Each GC-MS files has one folder containing
## one CDF files and one peak list file named peaklist.txt
## All analysis folders are grouped in one folder
## CDF files and peaklist.txt have to be downloaded from MSeasy web site
## http://sites.google.com/site/rpackagemseasy/downloads/CDF_peaklist_example.zip
url1<-"http://sites.google.com/site/rpackagemseasy/downloads/CDF_peaklist_example.zip"
download.file(url=url1, destfile="ExampleCDF.zip")
unzip(zipfile="ExampleCDF.zip", exdir=".")
##One folder is created in your current working directory CDF_peaklist
unlink("ExampleCDF.zip") ##delete the zip files
#with pathCDF
library(xcms)
MS.DataCreation(pathCDF=file.path(getwd(),"CDF_peaklist"),
DataType="CDF", mz="all",apex=FALSE, quant=FALSE)
# without pathCDF
library(xcms)
MS.DataCreation(DataType="CDF", mz="all",apex=FALSE, quant=FALSE)
## Ask for the CDF_peaklist folder
## downloaded and unzipped from MSeasy website
unlink("CDF_peaklist", recursive=TRUE)
## End(Not run)
##For ASCII GC-MS files
pathASCII<-system.file("doc/ASCII_MSDataCreation",
package="MSeasy")
MS.DataCreation(path=pathASCII,mz=30:250,DataType="ASCII",apex=TRUE, N_filt=3)